Thinking beyond audio: Augmenting headphones for everyday digital interactions

This research was accepted by ACM Designing Interactive Systems (DIS) 2023, where it received a Best Paper Award. The conference is dedicated to advancing the field of user-centered system design.

Headphones are traditionally used to deliver and manage audio experiences through physical controls and a range of sensors. To date, however, these controls and sensors have remained confined to audio input and output functions, such as adjusting the volume or muting the microphone. Imagine if headphones could transcend their role as mere audio devices.

Because headphones rank among the most popular wearables on the market, we have an exciting opportunity to expand their capabilities by combining their existing sensors with supplementary ones, enabling a wide variety of experiences that go beyond traditional audio control. In our paper, “Beyond Audio: Towards a Design Space of Headphones as a Site for Interaction and Sensing,” we share a vision that explores this potential.

By using sensors such as microphones, proximity sensors, motion sensors, inertial measurement units (IMUs), and LiDAR, headphone designers can explore new avenues of input and interaction. Because headphones are worn on the head, they lend themselves to a wide range of applications, such as tracking head movements, body postures, and hand gestures. Furthermore, as wearable devices, headphones have the potential to provide wearers with context-rich information and enable more intuitive and immersive interactions with their devices and environment beyond traditional button-based controls.

Potential scenarios for sensor-enhanced headphones 

To explore this concept further, we propose augmenting headphones with additional sensors and input widgets. These include: 

  • IMUs to sense head orientation
  • Swappable sets of input controls  
  • A range-sensing LiDAR that enables the sensing of hand gestures

By incorporating these capabilities, we envision a wide range of applications in which headphone input acts as a bridge between wearers and their environment, enabling more efficient and context-aware interactions across multiple devices and tasks. For example, headphones could assist people with applications like video games or help manage interruptions during a video call.

Let’s explore some scenarios to illustrate the potential of our headphone design concept. Consider a person engaged in a video call with teammates when they are suddenly interrupted by a colleague who approaches in person. In this situation, our headphones would be equipped to detect contextual cues, such as when the wearer rotates their head away from a video call, signaling a shift in attention. In response, the headphones could automatically blur the video feed and mute the microphone to protect the wearer’s privacy, as shown in Figure 1. This feature could also communicate to other participants that the wearer is temporarily engaged in another conversation or activity. When the wearer returns their attention to the call, the system removes the blur and reactivates the microphone.

Figure 1. These videos illustrate a context-aware privacy control system implemented during a video conference. In this scenario, the wearer temporarily disengages from the video conference to engage in an in-person conversation. After a predefined period, the system detects the wearer’s continued attention directed away from any known device, taking into account the environment context. As a result, privacy measures are triggered, including video blurring, microphone muting, and notifying other participants on the call. Once the wearer re-engages with the screen, their video and microphone settings return to normal, ensuring a seamless experience.
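
To make the timing logic concrete, here is a minimal sketch of the kind of dwell-time trigger such a system might use, assuming head yaw is already estimated from the headphone IMU. The thresholds, the `PrivacyGuard` class, and its `apply` hook are our own illustrative placeholders, not part of the published system.

```python
import time

YAW_TOLERANCE_DEG = 25.0   # head yaw beyond this counts as "looking away"
AWAY_GRACE_SECONDS = 3.0   # the "predefined period" before privacy kicks in

class PrivacyGuard:
    """Tracks head yaw relative to the screen and toggles privacy measures."""

    def __init__(self, screen_yaw_deg: float):
        self.screen_yaw_deg = screen_yaw_deg
        self.away_since = None
        self.privacy_on = False

    def update(self, head_yaw_deg: float, now: float = None):
        now = time.monotonic() if now is None else now
        looking_away = abs(head_yaw_deg - self.screen_yaw_deg) > YAW_TOLERANCE_DEG
        if looking_away:
            if self.away_since is None:
                self.away_since = now
            elif not self.privacy_on and now - self.away_since >= AWAY_GRACE_SECONDS:
                self.privacy_on = True
                self.apply(blur=True, mute=True)    # wearer disengaged
        else:
            self.away_since = None
            if self.privacy_on:
                self.privacy_on = False
                self.apply(blur=False, mute=False)  # wearer re-engaged

    def apply(self, blur: bool, mute: bool):
        # Placeholder: a real system would call the conferencing app's APIs here.
        print(f"blur={blur}, mute={mute}")

guard = PrivacyGuard(screen_yaw_deg=0.0)
guard.update(head_yaw_deg=40.0, now=0.0)   # wearer turns toward the colleague
guard.update(head_yaw_deg=40.0, now=3.5)   # grace period elapsed: blur and mute
guard.update(head_yaw_deg=2.0, now=5.0)    # wearer returns: restore the call
```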

In another privacy-focused scenario, imagine a person simultaneously conversing with multiple teammates in separate video call channels. Our headphone design allows the wearer to control to whom their speech is directed by simply looking at their intended audience, as shown in Figure 2. This directed speech interaction can extend beyond video calls and be applied to other contexts, such as sending targeted voice commands to teammates in a multiplayer video game.

Figure 2. Headphones track the wearer’s head pose, seamlessly facilitating the distribution of video and/or audio across multiple private chats. They effectively communicate the wearer’s availability to other participants, whether in a video conferencing scenario (left) or a gaming scenario (right).
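
One way to implement this routing is to pick the target whose known direction lies closest to the wearer’s current head yaw. The sketch below illustrates the idea; the target names, angles, and selection cone are hypothetical values, not from the paper.

```python
# Hypothetical layout: the yaw direction (degrees) of each call window relative
# to the wearer, and how closely they must face a target to select it.
TARGETS = {"team_call_left": -30.0, "team_call_right": 30.0}
SELECTION_CONE_DEG = 20.0

def route_microphone(head_yaw_deg: float):
    """Return the call that should receive the wearer's speech, or None."""
    best, best_err = None, SELECTION_CONE_DEG
    for name, target_yaw in TARGETS.items():
        err = abs(head_yaw_deg - target_yaw)
        if err <= best_err:
            best, best_err = name, err
    return best

# Unmute only the channel the wearer is facing; mute the rest.
active = route_microphone(head_yaw_deg=-27.0)   # -> "team_call_left"
for name in TARGETS:
    print(name, "unmuted" if name == active else "muted")
```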

In our paper, we also demonstrate how socially recognizable gestures can introduce new forms of audio-visual control instead of relying solely on on-screen controls. For example, wearers could interact with media through gestural actions, such as cupping their ear towards the audio source to increase the volume while simultaneously reducing ambient noise, as shown in Figure 3. These gestures, ingrained in social and cultural contexts, can serve as both control mechanisms and nonverbal communication signals.

Figure 3. Top: Raising the earcup, a commonly used gesture to address in-person interruptions, mutes both the sound and the microphone to ensure privacy. Bottom: Cupping the earcup, a gesture indicating difficulty hearing, increases the system volume.
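
As an illustration, the sketch below distinguishes these two gestures and maps them to audio actions, assuming an earcup contact sensor and a single range reading from an earcup-mounted LiDAR. The sensor interface and thresholds are our assumptions, not the paper’s classifier.

```python
# Hypothetical sensor interface: an on-ear contact flag and one range reading
# (meters) from an earcup-mounted LiDAR. Thresholds are illustrative.
CUP_NEAR_M = 0.08   # a hand hovering within ~8 cm reads as an ear-cupping gesture

def classify_gesture(earcup_on_ear: bool, lidar_range_m: float) -> str:
    if not earcup_on_ear:
        return "earcup_raised"    # wearer lifted the cup to address an interruption
    if lidar_range_m < CUP_NEAR_M:
        return "ear_cupped"       # hand cupped toward the audio source
    return "none"

def handle(gesture: str, state: dict) -> dict:
    if gesture == "earcup_raised":    # mute sound and microphone for privacy
        state.update(speaker_muted=True, mic_muted=True)
    elif gesture == "ear_cupped":     # "I can't hear you": raise the volume
        state["volume"] = min(100, state["volume"] + 10)
    return state

state = {"volume": 50, "speaker_muted": False, "mic_muted": False}
state = handle(classify_gesture(earcup_on_ear=True, lidar_range_m=0.05), state)
print(state)  # {'volume': 60, 'speaker_muted': False, 'mic_muted': False}
```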

Additionally, an IMU lets us estimate the wearer’s head gaze. When combined with the physical locations of computing devices in the wearer’s vicinity, this opens up possibilities for seamless interactions across multiple devices. For instance, during a video call, the wearer can share the screen of the device they are actively focusing on. In this scenario, the wearer shifts their attention from an external monitor to a tablet device. Even though this tablet is not directly connected to the main laptop, our system smoothly transitions the screen sharing for the wearer’s audience in the video call, as shown in Figure 4.

Figure 4. A wearer delivers a presentation using a video conferencing tool. As the wearer looks at different devices, the streamed video dynamically updates to display the relevant source to participants.

Finally, our paper also shows the use of embodied interactions, where the wearer’s body movements animate a digital representation of themselves, such as an avatar in a video call, as shown in Figure 5. This feature can also serve as a gameplay mechanism. Take a racing game, for instance: the wearer’s body movements could control the vehicle’s steering, as shown on the left in Figure 6. Extending this capability, the same movements could let a wearer peek around obstacles in any first-person game, enhancing immersion, as shown on the right in Figure 6.

Figure 5. Left: Headphones use an IMU to monitor and capture natural body movements, which are then translated into corresponding avatar movements. Right: Touch controls integrated into headphones enable wearers to evoke a range of emotions on the avatar, enhancing the user experience.
Figure 6. Leaning while wearing the headphones (with an integrated IMU) has a direct impact on gameplay. On the left, it swerves the car to the side; on the right, it enables the player to duck behind a wall.
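
A mapping like this can be as simple as a dead zone plus saturation applied to the IMU’s roll angle. The following sketch is our illustration of that mapping, with made-up threshold values, rather than the study’s implementation.

```python
DEAD_ZONE_DEG = 5.0    # ignore small, unintentional sway
MAX_LEAN_DEG = 30.0    # lean angle that maps to full steering lock

def lean_to_steering(roll_deg: float) -> float:
    """Map the wearer's lean (IMU roll) to a steering value in [-1, 1]."""
    if abs(roll_deg) < DEAD_ZONE_DEG:
        return 0.0
    magnitude = min(abs(roll_deg), MAX_LEAN_DEG) / MAX_LEAN_DEG
    return magnitude if roll_deg > 0 else -magnitude

print(lean_to_steering(12.0))    # gentle lean -> partial steering (0.4)
print(lean_to_steering(-40.0))   # hard lean left -> full lock (-1.0)
```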

Design space for headphone interactions 

We define a design space for interactive headphones through an exploration of two distinct concepts, which we discuss in depth in our paper.

First, we look at the type of input gesture for the interaction, which we further classify into three categories. The gestural input from the wearer might fall under one or more of these categories, which we outline in more detail below and illustrate in Figure 7.

  • Touch-based gestures that involve tangible inputs on the headphones, such as buttons or knobs, requiring physical contact by the wearer
  • Mid-air gestures, which the wearer makes with their hands in close proximity to the headphones, detected through LiDAR technology
  • Head orientation, indicating the direction of the wearer’s attention
Figure 7. Sensor-enhanced headphones can use touch-based gestures (left), head orientation (middle), or mid-air gestures (right) as types of input.

The second axis of the design space is the context within which the wearer performs an action. Here, design considerations for sensor-enhanced headphones go beyond user intentionality and observed motion. Context awareness enables these headphones to understand the wearer’s activities, the applications they are engaged with, and the devices in their vicinity, as illustrated in Figure 8. This understanding enables the headphones to provide personalized experiences and integrate seamlessly with the wearer’s environment. The four categories that define this context awareness are as follows:

  • Context-free actions, which produce similar results regardless of the active application, the wearer’s activity, or the social or physical environment.  
  • Context that is defined by the application with which the wearer is interacting. For example, are they listening to music, on a video call, or watching a movie?  
  • Context that is defined by the wearer’s body. For example, is the wearer’s gesture close to a body part that has an associated meaning? Eyes might relate to visual functions, ears to audio input, and the mouth to audio output. 
  • Context that is defined by the wearer’s environment. For example, are there other devices or people around the wearer with whom they might want to interact?
Figure 8. The system uses diverse contextual information to enable personalized responses to user input.
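
One lightweight way to realize this two-axis design space in software is a dispatch table keyed by gesture and context, with a context-free fallback. The sketch below is purely illustrative; the gesture names, context labels, and actions are hypothetical.

```python
# Illustrative dispatch table over the two axes: (gesture, context) -> action.
ACTIONS = {
    ("ear_cupped", "music"): "raise volume",
    ("ear_cupped", "video_call"): "boost incoming speech, cut ambient noise",
    ("head_turn_away", "video_call"): "blur video and mute microphone",
    ("head_turn_away", "context_free"): "no action",
}

def resolve(gesture: str, context: str) -> str:
    # Fall back to a context-free interpretation when no specific rule exists.
    return ACTIONS.get((gesture, context),
                       ACTIONS.get((gesture, "context_free"), "no action"))

print(resolve("ear_cupped", "music"))      # raise volume
print(resolve("head_turn_away", "music"))  # no action (context-free fallback)
```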

Looking ahead: Expanding the possibilities of HCI with everyday wearables  

Sensor-enhanced headphones offer a promising avenue for designers to create immersive and context-aware user experiences. By incorporating sensors, these headphones can capture subtle user behaviors, facilitating seamless interactions and enhancing the wearer’s overall experience.  

From safeguarding privacy to providing intuitive control mechanisms, the potential applications for sensor-enhanced headphones are vast and exciting. This exploration with headphones only scratches the surface of what context-aware wearable technology can empower its wearers to achieve. Consider the multitude of wearables we use every day that could benefit from similar sensing and interaction capabilities. For example, imagine a watch that can track your hand movements and detect gestures. By enabling communication between sensor-enhanced wearables, we can establish a cohesive ecosystem for human-computer interaction that spans applications, devices, and social contexts.

Microsoft at ICALP 2023: Deploying cloud capacity robustly against power failures

This research paper was accepted by the 50th EATCS International Colloquium on Automata, Languages and Programming (ICALP 2023), which is dedicated to advancing the field of theoretical computer science.

In the rapidly evolving landscape of cloud computing, escalating demand for cloud resources is placing immense pressure on cloud providers, driving them to consistently invest in new hardware to accommodate datacenters’ expanding capacity needs. Consequently, the ability to power all this hardware has emerged as a key bottleneck: the devices that power datacenters have limited capacity and must be used efficiently. Efficiency is crucial not only to lower operational costs, and in turn consumer prices, but also to support the sustainable use of resources, ensure their long-term availability, and preserve the environment for future generations.

At the same time, it is of the utmost importance to ensure power availability for servers, particularly in the event of a power device failure. Modern datacenter architectures have adopted a strategy to mitigate this risk by avoiding the reliance on a single power source for each server. Instead, each server is powered by two devices. Under normal operations, a server draws half its required power from each device. In the event of a failover, the remaining device steps in to support the server’s full power demand, potentially operating at an increased capacity during the failover period.

Figure 1: This diagram depicts the power allocation of three servers (1, 2, and 3) to the power devices (a, b, and c) that serve them, during both normal operations and in a failover scenario. The height of each power device represents its capacity, and the height of each server within those power devices shows its power consumption. During failover, the servers have fewer devices from which to draw power, resulting in increased energy consumption from the resources available.

Challenges of optimizing power allocation

In our paper, “Online Demand Scheduling with Failovers,” which we’re presenting at the 50th EATCS International Colloquium on Automata, Languages and Programming (ICALP 2023), we explore a simplified model that emphasizes power management to determine how to optimally place servers within a datacenter. Our model contains multiple power devices and new servers (demands), where each power device has a normal capacity of 1 and a larger failover capacity, B. Demands arrive over time, each with a power requirement, and we must irrevocably assign each demand to a pair of power devices with sufficient capacity. The goal is to maximize the total power of the assigned demands until we are forced to reject a demand due to lack of power. This helps us maximize the usage of the available power devices.

One crucial aspect of this model is the presence of uncertainty. We assign capacity for each demand without knowing future needs. This uncertainty adds complexity, as selection of device pairs for each demand must be carefully executed to avoid allocations that could hinder the placement of future demands. Figure 2 provides an illustration.

Figure 2: This example shows four power devices a, b, c, and d, with failover capacity B=1, and six demands that arrive sequentially, with a per-device power requirement of ¼ each. Suppose four demands have arrived so far. The example on the left represents a bad assignment that cannot accept the remaining two demands. This is because if we placed an additional demand, say, on device a, its failover capacity would be exceeded if b failed. On the other hand, a good assignment, depicted on the right, allows for the placement of all six demands.
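
The feasibility condition implied by this model is easy to state in code. The sketch below reflects our reading of the model, not code from the paper: a demand with per-device power requirement r loads each device of its pair by r in normal operation, and the surviving device by 2r after its partner fails.

```python
from collections import defaultdict
from itertools import combinations

def is_feasible(assignment, B):
    """Check an assignment of demands to device pairs.

    assignment: list of (per_device_power, device_a, device_b) triples.
    Each device has normal capacity 1 and failover capacity B.
    """
    normal = defaultdict(float)   # per-device load in normal operation
    pair = defaultdict(float)     # summed per-device power on each device pair
    for r, a, b in assignment:
        normal[a] += r
        normal[b] += r
        pair[frozenset((a, b))] += r

    if any(load > 1 + 1e-9 for load in normal.values()):
        return False
    # If one device of a pair fails, its partner must absorb the failed half
    # of every demand they share, on top of its own normal load.
    for devices, shared in pair.items():
        for d in devices:
            if normal[d] + shared > B + 1e-9:
                return False
    return True

# Figure 2, "good": six demands of per-device power 1/4, one on each distinct pair.
good = [(0.25, a, b) for a, b in combinations("abcd", 2)]
print(is_feasible(good, B=1))   # True: the worst failover load is exactly 1

# Figure 2, "bad": doubled-up pairs leave no room for a fifth demand touching a.
bad = [(0.25, "a", "b")] * 2 + [(0.25, "c", "d")] * 2 + [(0.25, "a", "c")]
print(is_feasible(bad, B=1))    # False: if b fails, a would carry 1.25 > B
```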

The example in Figure 2 suggests that we should spread the demands across device pairs. Otherwise, pairs with large loads could have a big impact on the remaining devices should one device fail. On the other hand, there is a danger in spreading out the demands too much and not leaving enough devices free, as shown in Figure 3.

Figure 3: This example involves four power devices, labeled a, b, c, and d, each with failover capacity B=1. The scenario also includes seven demands that arrive sequentially, with the first six requiring little power per device (epsilon = 0.01 units of power, for example), and the last requiring 0.5 units of power. In (a), when the small demands are distributed across pairs, the final demand cannot be accommodated because the failover capacity is exceeded. However, by grouping the small demands on a single pair of power devices, as shown in (b), all demands can be successfully accommodated.

In analyzing the examples in Figures 2 and 3, it becomes clear that striking the right balance is crucial. This entails effectively distributing demands across pairs to minimize any consequences resulting from failovers while also ensuring the availability of an adequate number of devices to meet future demand. Attaining the optimal allocation avoids prematurely ending up with an unassignable demand.

To tackle this challenge, we developed algorithms that are guaranteed to efficiently utilize the available power resources. In fact, our allocations are provably close to optimal, even without upfront knowledge of the demands. Our algorithms essentially conquer the inherent uncertainty of the problem.

Optimizing for the worst-case scenario

Our first algorithm takes a conservative approach, protecting against the worst-case scenario. It guarantees that, regardless of the sequence of demand requests, it will utilize at least half of the power when compared with an optimal solution that has prior knowledge of all demands. As we show in the paper, this result represents the optimal outcome achievable in the most challenging instances.

To strike a balance between distributing demands across devices and ensuring sufficient device availability, the algorithm groups demands based on their power requirements. Each group of demands is then allocated separately to an appropriately sized collection of devices. The algorithm aims to consolidate similar demands in controlled regions, enabling efficient utilization of failover resources. Notably, we assign at most one demand per pair of devices in each collection, except in the case of minor demands, which are handled separately.

Optimizing for real-world demand

Because the previous algorithm prioritizes robustness against worst-case demand sequences, it may end up using only half of the available power, a loss that is unavoidable in that setting. However, these worst-case scenarios are uncommon in typical datacenter operations. Accordingly, we shifted our focus to a more realistic model where demands arise from an unknown probability distribution. We designed our second algorithm in this stochastic arrival model, demonstrating that as the number of demands and power devices increases, its assignment progressively converges to the optimal omniscient solution, ensuring that no power is wasted.

To achieve this, the algorithm learns from historical data, enabling informed assignment decisions based on past demands. By creating “allocation templates” derived from previous demands, we learn how to allocate future demands. To implement this concept and prove its guarantee, we have developed new tools in probability and optimization that may be valuable in addressing similar problems in the future.

Renovating computer systems securely and progressively with APRON

This research paper was accepted by the 2023 USENIX Annual Technical Conference (ATC), which is dedicated to advancing the field of systems research.

Whether they’re personal computers or cloud instances, it’s crucial to ensure that the computer systems people use every day are reliable and secure. The validity of these systems is critical because if storage devices containing important executables and data become invalid, the entire system is affected. Numerous events can jeopardize the validity of computer systems or the data stored in them: malicious attacks like ransomware, hardware or software errors that corrupt a system, and a lack of regular maintenance, such as patch installation, that leaves a system outdated. While the ideal scenario would be to create a flawless computer system that prevents such invalid states from occurring, achieving this perfection may prove challenging in practice.

Cyber-resilient system and recovery

A cyber-resilient system is a practical approach for addressing invalid system states. Such a system determines whether the system state has been corrupted or preserved by analyzing various internal and external signals. If it confirms any corruption, it recovers the system. In our previous work, which we presented at the 40th IEEE Symposium on Security and Privacy, we demonstrated the feasibility of unconditional system recovery using a very small hardware component. This component forcefully resets the entire system, making it execute trusted tiny code for system boot and recovery when no authenticated deferral request is present.

However, existing recovery mechanisms, including our previous work, primarily focus on when to recover a system rather than how. Consequently, these mechanisms overlook the efficiency and security issues that can arise during system recovery. Typically, these mechanisms incorporate a dedicated recovery environment responsible for executing the recovery task. Upon system reset, if the system is found to be invalid, as illustrated in Figure 1, the recovery environment is invoked. In this scenario, the recovery environment fully restores the system using a reference image downloaded from a reliable source or a separate location where it was securely stored.

Figure 1: System boot with a normal recovery.

Unfortunately, performing a full system recovery leads to prolonged system downtime because the recovery environment is incapable of supporting any other regular task expected from a computer system. In other words, the system remains unavailable during the recovery process. Moreover, choosing to download the reference image only serves to extend overall downtime. Although using the stored image slightly relieves this issue, it introduces security concerns, as the stored image might be outdated. One can argue that a full recovery can be circumvented by inspecting each file or data block for validity and selectively recovering only the affected ones. However, this delta recovery approach is lengthier than a full recovery due to the additional calculations required for determining differences and the inefficient utilization of modern, throughput-oriented block storage devices.

Secure and progressive system renovation

In our paper “APRON: Authenticated and Progressive System Image Renovation,” which we are presenting at the 2023 USENIX Annual Technical Conference (USENIX ATC 2023), we introduce APRON, a novel mechanism for securely renovating a computer system with minimal downtime. APRON differs from conventional recovery mechanisms in a crucial way: it does not fully recover the system within the recovery environment. Instead, it selectively addresses a small set of system components, or the data blocks containing them, that are necessary for booting and system recovery, including the operating system kernel and the APRON kernel module, as shown in Figure 2. Once these components are recovered, the system boots into a partially renovated state and can perform regular tasks, progressively recovering other invalid system components as needed.

Figure 2: System boot with APRON.

This design allows APRON to decrease downtime during system recovery by a factor of up to 28 compared with a normal system recovery, when retrieving portions of the reference image from a remote storage server connected through a 1 Gbps link. In addition, APRON incorporates a background thread dedicated to renovating the remaining invalid system components that might be accessed in the future. This background thread operates with low priority to avoid disrupting important foreground tasks. Throughout both renovation activities, APRON incurs an average runtime overhead of only 9% across a range of real-world applications. Once the renovation process is complete, the runtime overhead disappears.

APRON’s key differentiator is that its kernel module acts as an intermediary between application or kernel threads and the system storage device, allowing it to verify and recover each data block on demand, as shown in Figure 3. When a block is requested, APRON follows a straightforward process: if the requested block is valid, APRON promptly delivers it to the requester; if it is invalid, APRON uses a reference image to fix the block before serving it.

Figure 3: System storage renovation with APRON.
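
The sketch below illustrates this verify-and-recover read path as we understand it from the paper’s description; the block layout, hash table, and reference-image fetch hook are simplified stand-ins for the real mechanism.

```python
import hashlib

BLOCK_SIZE = 4096
ZERO_BLOCK = bytes(BLOCK_SIZE)
ZERO_HASH = hashlib.sha256(ZERO_BLOCK).digest()

def read_block(index, local_disk, expected_hashes, fetch_reference):
    """Serve one block, repairing it from the reference image if invalid."""
    block = local_disk[index]
    if hashlib.sha256(block).digest() == expected_hashes[index]:
        return block                          # valid: serve immediately
    # Invalid: repair before serving. Zero-filled blocks are special-cased so
    # they never need to be fetched, mirroring APRON's zero-block optimization.
    if expected_hashes[index] == ZERO_HASH:
        repaired = ZERO_BLOCK
    else:
        repaired = fetch_reference(index)     # e.g., from a remote storage server
    local_disk[index] = repaired              # persist the repair
    return repaired

# Toy reference image: one kernel block and one zero block.
reference = [b"kernel".ljust(BLOCK_SIZE, b"\0"), ZERO_BLOCK]
hashes = [hashlib.sha256(b).digest() for b in reference]
disk = [b"corrupted".ljust(BLOCK_SIZE, b"\0"), ZERO_BLOCK]
print(read_block(0, disk, hashes, lambda i: reference[i])[:6])  # b'kernel'
```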

To efficiently and securely verify arbitrary data blocks, APRON uses a Merkle hash tree, which cryptographically summarizes every data block of the reference image. APRON further cryptographically authenticates the Merkle tree’s root hash value so that a malicious actor cannot tamper with it. To further improve performance, APRON treats zero blocks (data blocks filled with zeros) as a special case and performs deduplication to avoid repeatedly retrieving equivalent blocks. We discuss the technical details of this process in our paper.
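
To illustrate the data structure, the following sketch builds a Merkle tree over data blocks and verifies a single block against the root hash in logarithmic time. For brevity it keeps the whole tree in memory; APRON would derive the tree from the reference image and authenticate only its root.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Build a Merkle tree bottom-up; levels[0] holds leaf hashes, levels[-1] the root."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]   # duplicate the last hash on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def verify_block(block, index, levels, root):
    """Check one block against the authenticated root in O(log n) hashes."""
    node = h(block)
    for level in levels[:-1]:
        sibling = index ^ 1
        sib = level[sibling] if sibling < len(level) else node  # duplicated node
        node = h(node + sib) if index % 2 == 0 else h(sib + node)
        index //= 2
    return node == root

blocks = [b"block-%d" % i for i in range(3)]
levels = build_tree(blocks)
root = levels[-1][0]
print(verify_block(b"block-1", 1, levels, root))   # True
print(verify_block(b"tampered", 1, levels, root))  # False
```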

Looking forward: Extending APRON to container engines and hypervisors

APRON’s simple and widely applicable core design can easily apply to other use cases requiring efficient and secure image recovery or provisioning. We are currently exploring the possibility of implementing APRON within a container engine or hypervisor to realize an agentless APRON for container layers or virtual disk images. By extending APRON’s capabilities to these environments, we aim to provide an efficient and reliable image recovery and provisioning process without needing to modify container instances or add a guest operating system.

Research Focus: Week of July 3, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

The Best of Both Worlds: Unlocking the Potential of Hybrid Work for Software Engineers

The era of hybrid work has created new challenges and opportunities for developers. Their ability to choose where they work and the scheduling flexibility that comes with remote work can be offset by the loss of social interaction, reduced collaboration efficiency, and difficulty separating work time from personal time. Companies must be equipped to maintain a successful and efficient hybrid workforce by accentuating the positive elements of hybrid work while also addressing the challenges.

In a new study, The Best of Both Worlds: Unlocking the Potential of Hybrid Work for Software Engineers, researchers from Microsoft aim to identify which form of work – fully in office, fully at home, or blended – yields the highest productivity and job satisfaction among developers. They analyzed over 3,400 survey responses collected across 28 companies in seven countries, in partnership with Vista Equity Partners, a leading global asset manager with experience investing in software, data, and technology-enabled organizations.

The study found that developers face many of the same challenges found in other types of hybrid workplaces. The researchers provide recommendations for addressing these challenges and unlocking more productivity while improving employee satisfaction.

NEW INSIGHTS

Prompt Engineering: Improving our ability to communicate with LLMs

Pretrained natural language generation (NLG) models are powerful, but in the absence of contextual information, their responses are necessarily generic. The prompt is the primary mechanism for accessing NLG capabilities. It is an enormously effective and flexible tool, yet to be reliably converted into the expected output, a prompt must convey information in the way the model expects. If the prompt is not accurate and precise, the model is left guessing. Prompt engineering aims to bring more context and specificity to generative AI models, providing enough information in the model instructions that the user gets the exact result they want.

In a recent blog post, Prompt Engineering: Improving our Ability to Communicate with an LLM, researchers from Microsoft explain how they use retrieval augmented generation (RAG) for knowledge grounding, use advanced prompt engineering to properly set context in the input to guide large language models (LLMs), implement a provenance check for responsible AI, and help users deploy scalable NLG services more safely, effectively, and efficiently.
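
As a rough illustration of that pattern, the sketch below retrieves context, builds a grounded prompt, and tags each source with an ID so responses can later be checked for provenance. The retriever and function names are our own toy constructions, not from the post.

```python
def retrieve(query, corpus, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(corpus.items(),
                    key=lambda kv: len(words & set(kv[1].lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, corpus):
    docs = retrieve(query, corpus)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in docs)
    # Asking the model to cite source IDs is what makes a provenance check possible.
    return (f"Answer using only the sources below and cite their IDs.\n"
            f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:")

corpus = {"doc1": "Qlib supports reinforcement learning for trading.",
          "doc2": "Overwatch learns edit sequence patterns in IDEs."}
print(build_prompt("What does Overwatch learn?", corpus))
```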


NEW RESOURCE

Overwatch: Learning patterns in code edit sequences

Integrated development environments (IDEs) provide tool support to automate many source code editing tasks. IDEs typically use only the spatial context, i.e., the location where the developer is editing, to generate candidate edit recommendations. However, spatial context alone is often not sufficient to confidently predict the developer’s next edit, and thus IDEs generate many suggestions at a location. Therefore, IDEs generally do not actively offer suggestions. The developer must click on a specific icon or menu and then select from a large list of potential suggestions. As a consequence, developers often miss the opportunity to use the tool support because they are not aware it exists or forget to use it. To better understand common patterns in developer behavior and produce better edit recommendations, tool builders can use the temporal context, i.e., the edits that a developer was recently performing.

To enable edit recommendations based on temporal context, researchers from Microsoft created Overwatch, a novel technique for learning edit sequence patterns from traces of developers’ edits performed in an IDE. Their experiments show that Overwatch has 78% precision and that it not only completed edits when developers missed the opportunity to use the IDE tool support, but also predicted new edits that have no tool support in the IDE.


UPDATED RESOURCE

Qlib updates harness adaptive market dynamics modeling and reinforcement learning to address key challenges in financial markets

Qlib is an open-source framework built by Microsoft Research that empowers research into AI technologies applicable to the financial industry. Qlib initially supported diverse machine learning modeling paradigms, including supervised learning. Now, a series of recent updates have added support for market dynamics modeling and reinforcement learning, enabling researchers and engineers to tap into more sophisticated learning methods for advanced trading system construction.

These updates broaden Qlib’s capabilities and its value proposition for researchers and engineers, empowering them to explore ideas and implement effective quantitative trading strategies. The updates, available on GitHub, make Qlib the first platform to offer diverse learning paradigms aimed at helping researchers and engineers solve key financial market challenges.

A significant update is the introduction of adaptive concept drift technology for modeling the dynamic nature of financial markets. This feature can help researchers and engineers invent and implement algorithms that can adapt to changes in market trends and behavior over time, which is crucial for maintaining a competitive advantage in trading strategies.

Qlib’s support for reinforcement learning enables a new feature designed to model continuous investment decisions. This feature assists researchers and engineers in optimizing their trading strategies by learning from interactions with the environment to maximize some notion of cumulative reward.

Related research:

DDG-DA: Data Distribution Generation for Predictable Concept Drift Adaptation

Universal Trading for Order Execution with Oracle Policy Distillation

Distributional Graphormer: Toward equilibrium distribution prediction for molecular systems

Structure prediction is a fundamental problem in molecular science because the structure of a molecule determines its properties and functions. In recent years, deep learning methods have made remarkable progress and impact on predicting molecular structures, especially for protein molecules. Methods such as AlphaFold and RoseTTAFold have achieved unprecedented accuracy in predicting the most probable structures for proteins from their amino acid sequences and have been hailed as a game changer in molecular science. However, these methods provide only a single snapshot of a protein structure, and structure prediction alone cannot tell the complete story of how a molecule works.

Proteins are not rigid objects; they are dynamic molecules that can adopt different structures with specific probabilities at equilibrium. Identifying these structures and their probabilities is essential in understanding protein properties and functions, how they interact with other proteins, and the statistical mechanics and thermodynamics of molecular systems. Traditional methods for obtaining these equilibrium distributions, such as molecular dynamics simulations or Monte Carlo sampling (which uses repeated random sampling from a distribution to achieve numerical statistical results), are often computationally expensive and may even become intractable for complex molecules. Therefore, there is a pressing need for novel computational approaches that can accurately and efficiently predict the equilibrium distributions of molecular structures from basic descriptors.

Figure 1. The goal of Distributional Graphormer (DiG). DiG takes the basic descriptor, D, of a molecular system, such as the amino acid sequence for a protein, as input to predict the structures and their probabilities following the equilibrium distribution.

In this blog post, we introduce Distributional Graphormer (DiG), a new deep learning framework for predicting protein structures according to their equilibrium distribution. It aims to address this fundamental challenge and open new opportunities for molecular science. DiG is a significant advancement from single structure prediction to structure ensemble modeling with equilibrium distributions. Its distribution prediction capability bridges the gap between the microscopic structures and the macroscopic properties of molecular systems, which are governed by statistical mechanics and thermodynamics. Nevertheless, this is a tremendous challenge, as it requires modeling complex distributions in high-dimensional space to capture the probabilities of different molecular states.

DiG achieves a novel solution for distribution prediction through an advancement of our previous work, Graphormer, which is a general-purpose graph transformer that can effectively model molecular structures. Graphormer has shown excellent performance in molecular science research, demonstrated by applications in quantum chemistry and molecular dynamics simulations, as reported in our previous blog posts (see here and here for more details). Now, we have advanced Graphormer to create DiG, which has a new and powerful capability: using deep neural networks to directly predict target distribution from basic descriptors of molecules.

DiG tackles this challenging problem. It is based on the idea of simulated annealing, a classic method in thermodynamics and optimization, which has also motivated the recent development of diffusion models that achieved remarkable breakthroughs in AI-generated content (AIGC). Simulated annealing produces a complex distribution by gradually refining a simple distribution through the simulation of an annealing process, allowing it to explore and settle in the most probable states. DiG mimics this process in a deep learning framework for molecular systems. AIGC models are often based on the idea of diffusion models, which are inspired by statistical mechanics and thermodynamics.

DiG is also based on the idea of diffusion models, but we bring this idea back to thermodynamics research, creating a closed loop of inspiration and innovation. We imagine scientists someday will be able to use DiG like an AIGC model for drawing, inputting a simple description, such as an amino acid sequence, and then using DiG to quickly generate realistic and diverse protein structures that follow equilibrium distribution. This will greatly enhance scientists’ productivity and creativity, enabling novel discoveries and applications in fields such as drug design, materials science, and catalysis.

How does DiG work?

Figure 2. DiG’s design and backbone architecture. Graphormer constructs each step of a diffusion process that transforms a simple distribution into the equilibrium distribution, supervised by both simulation samples and the system’s energy function; a physics-informed diffusion pre-training (PIDP) method enables pre-training with only energy functions as input.

DiG is based on the idea of diffusion, transforming a simple distribution into a complex distribution using Graphormer. The simple distribution can be a standard Gaussian, and the complex distribution can be the equilibrium distribution of molecular structures. The transformation is done step by step, with the whole process mimicking simulated annealing.

DiG can be trained using different types of data or information. It can use the energy functions of molecular systems to guide the transformation, minimizing the discrepancy between the energy-based probabilities and the probabilities DiG predicts; this approach leverages prior knowledge of the system and trains DiG without a stringent dependency on data. Alternatively, DiG can learn the distribution from simulation data, such as molecular dynamics trajectories, by maximizing the likelihood of the data under the DiG model.
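
For the data-driven route, the sketch below shows the noise-prediction objective used by standard diffusion models, which is the family of techniques DiG builds on. It is our simplified illustration, not DiG’s actual loss: the Graphormer network is replaced by a trivial placeholder and molecular structures by one-dimensional toy data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "simulation data": samples from an (unknown to the model) equilibrium
# distribution, standing in for molecular dynamics trajectory frames.
data = rng.normal(loc=2.0, scale=0.5, size=(1024, 1))

def noisy_sample(x0, t, T=100):
    """Forward diffusion: blend data toward a standard Gaussian as t -> T."""
    alpha = 1.0 - t / T                       # simple linear schedule
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha) * x0 + np.sqrt(1.0 - alpha) * eps, eps

def denoising_loss(model, x0, t):
    """Noise-prediction objective: the model learns to recover the injected noise."""
    xt, eps = noisy_sample(x0, t)
    return float(np.mean((model(xt, t) - eps) ** 2))

# `model` stands in for the Graphormer network; here, a trivial placeholder.
model = lambda xt, t: np.zeros_like(xt)
print(denoising_loss(model, data, t=50))      # ~1.0 for the zero predictor
```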

DiG generalizes well across many molecular systems, on par with deep learning-based structure prediction methods, because it inherits the advantages of advanced deep learning architectures like Graphormer and applies them to the new and challenging task of distribution prediction. Once trained, DiG can generate molecular structures by reversing the transformation process, starting from a simple distribution and applying the neural networks in reverse order. DiG can also provide a probability estimate for each generated structure by computing the change of probability along the transformation process. DiG is a flexible and general framework that can handle different types of molecular systems and descriptors.
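
To illustrate the reverse process, the sketch below draws samples by annealed Langevin dynamics, a common formulation of reverse diffusion: starting from a simple Gaussian, points are repeatedly nudged along a score function toward high-probability regions. Here the score is known in closed form for a toy Gaussian target; in DiG, the learned Graphormer network plays that role.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(score, T=1000, step=1e-2, n=5):
    """Reverse the diffusion: start from a simple Gaussian and repeatedly nudge
    points toward high-probability regions (annealed Langevin dynamics)."""
    x = rng.standard_normal((n, 1))           # the simple starting distribution
    for _ in range(T):
        noise = rng.standard_normal(x.shape)
        x = x + step * score(x) + np.sqrt(2.0 * step) * noise
    return x

# For a Gaussian target N(2, 0.5^2) the score has a closed form; in DiG, the
# learned network plays this role for molecular structures.
score = lambda x: -(x - 2.0) / 0.25
print(sample(score).round(2))                 # samples concentrate near 2.0
```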

Results

We demonstrate DiG’s performance and potential through several molecular sampling tasks covering a broad range of molecular systems, such as proteins, protein-ligand complexes, and catalyst-adsorbate systems. Our results show that DiG not only generates realistic and diverse molecular structures with high efficiency and low computational costs, but it also provides estimations of state densities, which are crucial for computing macroscopic properties using statistical mechanics. Accordingly, DiG presents a significant advancement in statistically understanding microscopic molecules and predicting their macroscopic properties, creating many exciting research opportunities in molecular science.

One major application of DiG is to sample protein conformations, which are indispensable to understanding their properties and functions. Proteins are dynamic molecules that can adopt diverse structures with different probabilities at equilibrium, and these structures are often related to their biological functions and interactions with other molecules. However, predicting the equilibrium distribution of protein conformations is a long-standing and challenging problem due to the complex and high-dimensional energy landscape that governs probability distribution in the conformation space. In contrast to expensive and inefficient molecular dynamics simulations or Monte Carlo sampling methods, DiG generates diverse and functionally relevant protein structures from amino acid sequences at a high speed and a significantly reduced cost.

Figure 3. This illustration shows DiG’s performance when generating multiple conformations of proteins. On the left, DiG-generated structures of the main protease of SARS-CoV-2 virus are projected into 2D space panned with two TICA coordinates. On the right, structures generated by DiG (thin ribbons) are compared with experimentally determined structures (cylindrical figures) in each case.

DiG can generate multiple conformations from the same protein sequence. The left side of Figure 3 shows DiG-generated structures of the main protease of SARS-CoV-2 virus compared with MD simulations and AlphaFold prediction results. The contours (shown as lines) in the 2D space reveal three clusters sampled by extensive MD simulations. DiG generates highly similar structures in clusters II and III, while structures in cluster I are undersampled. In the right panel, DiG-generated structures are aligned to experimental structures for four proteins, each with two distinguishable conformations corresponding to unique functional states. In the upper left, the Adenylate kinase protein has open and closed states, both well sampled by DiG. Similarly, for the drug transport protein LmrP, DiG also generates structures for both states. Here, note that the closed state is experimentally determined (in the lower-right corner, with PDB ID 6t1z), while the other is the AlphaFold predicted model that is consistent with experimental data. In the case of human B-Raf kinase, the major structural difference is localized in the A-loop region and a nearby helix, which are well captured by DiG. The D-ribose binding protein has two separated domains, which can be packed into two distinct conformations. DiG perfectly generated the straight-up conformation, but it is less accurate in predicting the twisted conformation. Nonetheless, besides the straight-up conformation, DiG generated some conformations that appear to be intermediate states.

Another application of DiG is to sample catalyst-adsorbate systems, which are central to heterogeneous catalysis. Identifying active adsorption sites and stable adsorbate configurations is crucial for understanding and designing catalysts, but it is also quite challenging due to the complex surface-molecular interactions. Traditional methods, such as density functional theory (DFT) calculations and molecular dynamics simulations, are time-consuming and costly, especially for large and complex surfaces. DiG predicts adsorption sites and configurations, as well as their probabilities, from the substrate and adsorbate descriptors. DiG can handle various types of adsorbates, such as single atoms or molecules being adsorbed onto different types of substrates, such as metals or alloys.

Figure 4. Adsorption prediction results of single C, H, and O atoms on catalyst surfaces. The predicted probability distribution on the catalyst surface is compared to the interaction energy between the adsorbate molecules and the catalyst in the middle and bottom rows.

Applying DiG, we predicted the adsorption sites for a variety of catalyst-adsorbate systems and compared the predicted probabilities with energies obtained from DFT calculations. We found that DiG could identify all the stable adsorption sites and generate adsorbate configurations similar to the DFT results with high efficiency and at a low cost. DiG estimates the probabilities of different adsorption configurations in good agreement with DFT energies.

Conclusion

In this blog, we introduced DiG, a deep learning framework that aims to predict the distribution of molecular structures. DiG is a significant advancement from single structure prediction toward ensemble modeling with equilibrium distributions, setting a cornerstone for connecting microscopic structures to macroscopic properties under deep learning frameworks.

DiG involves key ML innovations that lead to expressive generative models, which have been shown to have the capacity to sample multimodal distributions within a given class of molecules. We have demonstrated the flexibility of this approach on different classes of molecules, including proteins, and we have shown that the individual structures generated in this way are chemically realistic. Consequently, DiG enables the development of ML systems that can sample equilibrium distributions of molecules given appropriate training data.

However, we acknowledge that considerably more research is needed to obtain efficient and reliable predictions of equilibrium distributions for arbitrary molecules. We hope that DiG inspires additional research and innovation in this direction, and we look forward to more exciting results and impact from DiG and other related methods in the future.

Collaborators: Holoportation™ communication technology with Spencer Fowers and Kwame Darko

Episode 142 | July 6, 2023 

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a new Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with.

In this episode, host Dr. Gretchen Huizinga welcomes Dr. Spencer Fowers, a member of the Special Projects Technical Staff at Microsoft Research, and Dr. Kwame Darko, a plastic surgeon in the reconstructive plastic surgery and burns center in Ghana’s Korle Bu Teaching Hospital. The two are part of an intercontinental research project to study Holoportation, a Microsoft 3D capture and communication technology, in the medical setting. The goal of the study—led by Korle Bu, NHS Scotland West of Scotland Innovation Hub, and Canniesburn Plastic Surgery and Burns Unit—is to make specialized healthcare more widely available, especially to those in remote or underserved communities. Fowers and Darko break down how the technology behind Holoportation and the telecommunication device being built around it brings patients and doctors together when being in the same room isn’t an easy option and discuss the potential impact of the work.

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

SPENCER FOWERS: I work with a team that does moonshots for a living, so I’m always looking for, what can we shoot for? And our goal really is like, gosh, where can’t we apply this technology? I mean, just anywhere that it is at all difficult to get, you know, medical expertise, we can ease the burden of doctors by making it so they don’t have to travel to provide this specialized care and increase the access to healthcare to these people that normally wouldn’t be able to get access to it.

KWAME DARKO: So yeah, the scope is as far as the mind can imagine it.

GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research Podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.

[MUSIC ENDS]


On this episode, I’m talking to Dr. Spencer Fowers, a Principal Member of the Technical Staff at Microsoft Research, and Dr. Kwame Darko, a plastic surgeon at the National Reconstructive Plastic Surgery and Burns Centre at the Korle Bu Teaching Hospital in Accra, Ghana. Spencer and Kwame are working on 3D telemedicine, a project they hope will increase access to specialized healthcare in rural and underserved communities by using live 3D communication, or Holoportation. We’ll learn much more about that in this episode. But first, let’s meet our collaborators.

Spencer, I’ll start with you. Tell us about the technical staff in the Special Projects division of Microsoft Research. What kind of work do you do, what’s the research model, and what’s your particular role there?

SPENCER FOWERS: Hi, Gretchen. Thanks for having me on here. Yeah, um, so our group at Special Projects was kind of patterned after the Lockheed Martin Skunk Works methodology. You know, we are very much a sort of “try big moonshot projects” type group. Our goal is sort of to focus on any sort of pie-in-the-sky idea that has some sort of a business application. So you can imagine we’ve done things like build the world’s first underwater datacenter or do post-quantum cryptography, things like that. Anything that, uh, is a very ambitious project that we can try to iterate on and see what type of an application we can find for it in the real world. And I’m one of the, as you said, a principal member of the technical staff. That means I’m one of the primary researchers, so I wear a lot of different hats. My job is everything from, you know, managing the project, meeting with people like Kwame and the other surgeons that we’ve worked with, and then interfacing there and finding ways that we can take theoretical research and turn it into applied research, actually find a way that we can bring that theory into reality.

HUIZINGA: You know, that’s a really interesting characterization because normally you think of those things in two different buckets, right? The big moonshot research has got a horizon, a time horizon, that’s pretty far out, and the applied research is get it going so it’s productizable or monetizable fairly quickly, and you’re marrying those two kinds of research models?

FOWERS: Yeah. I mean, we fit kind of a really interesting niche here at Microsoft because we get to operate sort of like a startup, but we have the backing of a very large company, so we get to sort of, yeah, take on these moonshot projects that a, a smaller company might not be able to handle and really attack it with the full resources of, of a company like Microsoft.

HUIZINGA: So it’s a moonshot project, but hurry up, let’s get ’er done. [LAUGHS]

FOWERS: Right, yeah.

HUIZINGA: Well, listen, Kwame, you’re a plastic surgeon at Korle Bu in Ghana. In many circles, that term is associated with nonessential cosmetic surgery, but that’s not what we’re talking about here. What constitutes the bulk of your work, and what prompted you to pursue it in the first place?

KWAME DARKO: All right, thanks, Gretchen, and also thank you for having me on the show. So, um, just as you said, my name is Kwame Darko, and I am a plastic surgeon. I think a lot of my passion to become a plastic surgeon came from the fact that at the time, we didn’t have too many plastic surgeons in Ghana. I mean, at the time that I qualified as a plastic surgeon, I was the eighth person in the country. And at the time, there was a population of 20-something million. Currently, we’re around 33 million, 34 million, and we have … we’re still not up to 30 plastic surgeons. So there was quite a bit of work to be done, and my work scopes from all the way to what everybody tends to [associate] plastic surgery with, the cosmetic stuff, across to burns, across to trauma from people with serious accidents that need some parts of their body reconstructed, to tumors of all sorts. Um, one of my fortes is breast cancer and breast reconstruction, but not limiting to that. We also [do] tumors of the leg. And we also help other surgeons to cover up spaces or defects that may have been created when they’ve taken off some sort of cancer or tumor or whatever it may be. So it’s a wide scope, as well as burn surgery and burn care, as well. So that’s the scope of the kind of work that I do.

HUIZINGA: You know, this wasn’t on my list to ask you, but I’m curious, um, both of you … Spencer, where did you get your training? What … what’s your background?

FOWERS: I actually got my PhD at university in computer engineering, uh, focused on computer vision. So a lot of my academic research was in, uh, you know, embedded systems and low-power systems and how we can get a vision-based stuff to work without using a lot of processing. And it actually fits really well for this application here where we’re trying to find low-cost ways that we can bring really high-end vision stuff, you know, and put it inside a hospital.

HUIZINGA: Yeah. So, Kwame, what about you? Where did you get your training, and did you start out in plastic surgery thinking, “Hey, that’s what I want to be”? Or did you start elsewhere and say, “This is cool”?

DARKO: So my, my background is that I did my medical school training here in Ghana at the medical school in Korle Bu and then started my postgraduate training in surgery. Um, over here, you need to do a number of years in just surgery before you can branch out and do a specific type of surgery. So, after my three, four years in that, I decided to do plastic surgery once again here in Ghana. You spend another up to three years—minimum of three years—training in that, which I did. And then you become a plastic surgeon. But then I went on for a bit of extra training and more exposure from different places around the world. I spent some time in Cape Town in South Africa working in a hospital called Groote Schuur. Um, I’ve also had the opportunity to work in, in Glasgow, where this idea originated from, and various courses in different parts of the world from India, the US, and stuff like that.

HUIZINGA: Wow. You know, I could spend a whole podcast asking you what you’ve seen in your lifetime in terms of trauma and burns and all of that. But I won’t, uh, because let’s talk about how this particular project came about, and I’d like both of your perspectives on it. This is a sort of “how I met your mother” story. As I understand it, there were a lot of people and more than two countries involved. Spencer, how do you remember the meet-up?

FOWERS: Yeah, I mean, Holoportation has been around since 2015, but it was around 2018 that, uh, Steven Lo—he’s a plastic surgeon in Glasgow—he approached us with this idea, saying, “Hey, we want to, we want to use this Holoportation technology in a hospital setting.” At that point, he was already working with Kwame and, and other doctors. They have a partnership between the Canniesburn Plastic Surgery unit in Glasgow and the Korle Bu Teaching Hospital in Accra. And he, he approached us with this idea of saying we want to build a clinic remotely so that people can come and see this. There is, like Kwame mentioned, right, a very drastic lack of surgeons in Ghana for the amount of the population, and so he wanted to find a way that he could provide reconstructive plastic surgery consultation to patients even though they’re very far away. Currently, the, you know, the Canniesburn unit, they do these trips every year, every couple of years, where they fly down to Ghana, perform surgeries. And the way it works is basically the surgeons get on an airplane, they fly down to Ghana, and then, you know, the next day they’re in the hospital, all day long, meeting these people that they’re going to operate on the next day, right? And trying to decide the day before the surgery what they’re going to operate on and what they’re going to do and get the consent from these patients. Is there a better way? Could we actually talk to these patients ahead of time? And 2D video calls just didn’t cut it. It wasn’t good enough, and Kwame can talk more about that. But his idea was, can we use something like this to make a 3D model of a patient, have a live conversation with them in 3D so that the surgeon can evaluate them before they go to Ghana and get an idea of what they’re going to do and be able to explain to the patient what they want to do before the surgery has to happen.

HUIZINGA: Yeah. So Microsoft Research, how did that group get involved?

FOWERS: Well, so we started with this technology back in, you know, 2015. And when he approached us with this idea, we were looking for ways that we could apply Holoportation to different areas, different markets. This came up as like one of those perfect fits for the technology, where we wanted to be able to use the system to image someone, it needed to be a live conversation, not a recording, and so that was, right there … was where we started working with them and designing the cameras that would go into the system they’re using today.

HUIZINGA: Right, right, right. Well, Kwame, in light of, uh, the increase, as we’ve just referred to, in 2D telemedicine, especially during COVID and, and post-COVID, people have gotten pretty used to talking to doctors over a screen as opposed to going in person. But there are drawbacks and shortcomings of 2D in your world. So how does 3D fill in those gaps, and, and what was attractive to you in this particular technology for the application you need?

DARKO: OK. So great, um, just as you’re saying, COVID really did spark the, uh, the spread of 2D telemedicine all over the world. But for myself, as a surgeon and particularly so as a plastic surgeon, we’re trying to think about, how is 2D video going to help me solve my problem or plan towards solving my problem for a patient? And you realize there is a significant shortfall when we’re not just dealing with the human being as a 2D object, uh, but 3D perspective is so important. So one of the most common things we’ve used this system to help us with is when we’re assessing a patient to decide which part of the body we’re going to move and use it to fit in the space that’s going to be created by taking out some form of tumor. And not only taking it out in 3D for us to know that it’s going to fit and be big enough but also demonstrating to the patient so they have a deeper understanding of exactly what is going to go and be used to reconstruct whichever part of their body and what defect is going to be left behind. So as against when you’re just having a straightforward consultation back and forth, answer and response, question and response, in this situation, we get the opportunity and have the ability to actually turn the patient around and then measure out specific problem … parts of the body that we’re going to take off and then transpose that on a different part of the body to make sure that it’s also going to be big enough to switch around and transpose. And when I’m saying transpose, I’m talking about maybe sticking something off from the front part of your thigh and then filling that in with maybe massive parts of your back muscle.

FOWERS: To add on to what Kwame said, you know, for us, for Microsoft Research, when Steven approached us with this, I don’t think we really understood the impact that it could have. You know, we even asked him, why don’t you just use like a cell phone, or why don’t you just use a 2D telemedicine call? Like, why do you need all this technology to do this? And he explained it to us, and we said, OK, like we’re going to take your word for it. It wasn’t until I went over there the first time that it really clicked for me and we had set up the system and he brought in a patient that had had reconstructive plastic surgery. She had had a cancerous tumor that required the amputation of her entire shoulder. So she lost her arm and, you know, this is not something that we think of on a day-to-day basis, but you actually, you can’t wear a shirt if you don’t have a shoulder. And so he was actually taking her elbow and replacing the joint that he was removing with her elbow joint. So he did this entire transpose operation. The stuff that they can do is amazing. But …

HUIZINGA: Right.

FOWERS: … he had done this operation on her probably a year before. And so he was bringing her back in for just the postoperative consult to see how she was doing. He had her in the system, and while she’s sitting in the system, he’s able to rotate the 3D model of her around so that she can see her own back. And he drew on her: “OK, this is where your elbow is now, and this is where we took the material from and what we did.” And during the teleconference, she says, “Oh, that’s what you did. I never knew what you did!” Like … she had had this operation a year ago, never knew what happened to herself because she couldn’t see her own back that way and couldn’t understand it. And it finally clicked to us like, oh my gosh, like, this is why this is important. Like not just because it aids the doctors in planning for surgeries, but the tremendous impact that it has on patient satisfaction with their operation and patient understanding of what’s going to happen.

HUIZINGA: Wow. That’s amazing. Even as you describe that, it’s … ahh … we could go so deep into the strangeness of what they can do with plastic surgery. But let’s talk about technology for a minute. Um, this is a highly visual technology, and we’re just doing a podcast, and we will provide some links in the show notes for people to see this in action, I hope. But in the meantime, Spencer, can you give us a kind of word picture of 3D telemedicine and the technology behind Holoportation? How does it work?

FOWERS: Yeah, the idea behind this technology is, if we can take pictures of a person from multiple angles and we know where those cameras are very, very accurately, we can stitch all those images together to make like a 3D picture of a person. So we’re actually using, for the 3D telemedicine system, we’re using the Azure Kinect. So it’s like Version 3 of the Kinect sensor that was introduced back in the Xbox days. And what that gives us is it gives us not just a color picture like you’re seeing on your normal 2D phone call, but it’s also giving us a depth picture so it can tell how far away you are from the camera. And we take that depth and that color information from 10 different cameras spaced around the room and stitch them all together in real time. So while we’re talking at, you know, normal conversation speed, it’s creating this 3D image of a person that the doctor, in this case, can actually rotate, pan, zoom in, and zoom out and be able to see them from any angle that they want without requiring that patient to get up and move around.

HUIZINGA: Wow. And that speaks to what you just said. The patient can see it as well as the clinician.

FOWERS: Yeah, I mean, you also have this problem with a lot of these patients if they’d had, you know, a leg amputation or something, when we typically talk like we’re talking now on like a, you know, the viewer, the listeners can’t see it, but a typical 2D telemedicine call, you’re looking at me from like my shoulders up. Well, if that person has an amputation of their knee, how do you get it so that you can talk to them in a normal conversation and then look at their knee? You, you just can’t do that on a 2D call. But this system allows them to talk to them and turn and look at their knee and show them—if it’s on their back, wherever it is—what they’re going to do and explain it to them.

HUIZINGA: That’s amazing. Kwame, this project doesn’t just address geographical challenges for remote patients. It also addresses geographical challenges for remote experts. So tell us about the nature and makeup of what you call MDTs­—or multidisciplinary teams—that you collaborate with and how 3D telemedicine impacts the care you’re able to provide because of that.

DARKO: All right. So with an MDT, or multidisciplinary team, just as you said, the focus on medicine these days is to take out individual bias in how we’re going to treat a particular patient, an individual knowledge base. So now what we tend to do is we try and get a group of doctors who would be treating a particular ailment—more often than not, it’s a cancer case—and everybody brings their view on what is best to holistically find a solution to the patient’s … the most ideal remedy for the patient. Now let’s take skin cancer, for example. You’re going to need a plastic surgeon if you’re going to cut it out. You’re going to need a dermatologist who is going to be able to manage it. If it’s that severe, you’re also going to need an oncologist. You may even need a radiologist and, of course, a psychologist and your nursing team, as well. So with an MDT, you’d ideally have members from each of these specialties in a room at a time discussing individual patients and deciding what’s best to do for them. What happens when I don’t have a particular specialty? And what happens when, even though I am the representative of my specialty on this group, I may not have as in-depth knowledge as is needed for this particular patient? What do we do? Do we have access to other brains around the world? Well, with this system, yes, we do. And just as we said earlier, that unlike where this is just a regular let’s say Teams meeting or whatever form of, uh, telemedicine meeting, in this one where we have the 3D edge, we can actually have the patient around in the rig. And as we’re discussing and talking about—and people are giving their ideas—we can swing the patient around and say, well, on this aspect, it would work because this is far away from the ear or closer to the ear, or no, the ear is going to have to go with this; it’s too close. So what do we do? Can we get somebody else to do an ear reconstruction in addition? If it’s, um, something on the back, if we’re taking it all out, is this going to involve the muscle, as well? If so, how are we going to replace the muscle? It’s beyond my scope. But oh! What do you know? We have an expert who’s done this kind of things from, let’s say, Korea or Singapore. And then they would log on and be able to see everything and give their input, as well. So this is another application which just crosses boundary … um, borders and gives us so much more scope to the application of this, uh, this new device.

HUIZINGA: So, so when we’re talking about multidisciplinary teams and, and we look at it from an expert point of view of having all these different disciplines in the room from the medical side, uh, Spencer this collaboration includes technologists, as well as medical professionals, but it also includes patients. You, you talk about what you call a participatory development validation. What is the role of patients in developing this technology?

FOWERS: Well, similar to like that story I was mentioning, right, as we started using this system, the initial goal was to give doctors this better ability to be able to see patients in preparation for surgery. What we found as we started to show this to patients was that it drastically increased their satisfaction from the visits with the doctors because they were able to better understand the operation that was going to be performed. It’s surprising how many times like Kwame and Steven will talk to me and they’ll tell us stories about how like they explain a procedure to a patient about what they’re going to do, and the patient says, “Yeah, OK.” And then they get done and the patient’s like, “Wait, what did you do? Like that doesn’t … I didn’t realize you were going to do that,” you know, because it’s hard for them to understand when you’re just talking about them or whether you’re drawing on a piece of paper. But when you actually have a picture of yourself in front of you that’s live and the doctors indicating on you what’s going to happen and what the surgery is going to be, it drastically increases the patient satisfaction. And so that was actually the direction of the randomized controlled trial that we’re conducting in, in Scotland right now is, what kind of improvement in patient satisfaction does this type of a system provide?

HUIZINGA: Hmm. It’s kind of speaking UX to me, like a patient experience as opposed to a user experience. Um, has it—any of this—fed into sort of feedback loop on technology development, or is it more just on the user side of how I feel about it?

FOWERS: Um, as far as like technology that we use for the system, when we started with Holoportation, we were actually using kind of research-grade cameras and building our own depth cameras and stuff like that, which made for a very expensive system that wasn’t easy to use. That’s why we transitioned over to the Azure Kinect because it’s actually like the highest-resolution depth camera you can get on the market today for this type of information. And so, it’s, it’s really pushed us to find, what can we use that’s more of a compact, you know, all-in-one system so that we can get the data that we need?

HUIZINGA: Right, right, right. Well, Kwame, at about this time, I always ask what could possibly go wrong? But when we talked before, you, you kind of came at this from a cup-half-full outlook because of the nature of what’s already wrong in digital healthcare in general, but particularly for rural and underserved communities, um, and you’ve kind of said what’s wrong is why we’re doing this. So what are some of the problems that are already in the mix, and how does 3D telemedicine mitigate any of them? Things like privacy and connectivity and bandwidth and cost and access and hacking, consent—all of those things that we’re sort of like concerned about writ large?

DARKO: All right. So when I was talking about the cup being half full in terms of these, all of these issues, it’s because these problems already exist. So this technology doesn’t present itself and create a new problem. It’s just going to piggyback off the solutions of what is already in existence. All right? So you, you mentioned most of them anyway. I mean, talking about patient privacy, which is No. 1. Um, all of these things are done on a hospital server. They are not done on a public or an ad hoc server of any sort. So whatever fail-safes there are within the hospital in itself, whichever hospital network we’re using, whether here in Ghana, whether in Glasgow, whether somewhere remotely in India or in the US, doesn’t matter where, it would be piggybacking off a hospital server. So those fail-safes are there already. So if anybody can get into the network and observe or steal data from our system, then it’s because the hospital system isn’t secure, not because it’s our system, in a manner of speaking, is not secure. All right? And then when I was saying that it’s half full, it’s because whatever lapses we have already in 2D telemedicine, this supersedes it. And not only does it supersede the 2D lapses, it goes again and gives significant patient feedback like we were saying earlier, what Spencer also alluded to, is that now you have the ability to show the patient exactly what’s going on. And so in previous aspects where, think about it, even if it’s an in-person consultation where I would draw on a piece of paper and explain to them, “Well, I’m going to do this, this, this, and that,” now I actually have the patient’s own body, which they’re watching at the same time, being spun around and indicating that this actually is the spot I was talking about and this is how big my cut is going to be, and this is what I’m going to move out from here and use to fill in this space. So once again, my inclination on this is that, on our side, we can only get good, as against to looking for problems. The problems, I, I admit, will exist, but not as a separate entity from regular 2D medicine that’s … or 2D videography that we’re already encountering.

HUIZINGA: So you’re not introducing new risks with this. You’re just sort of serving on the other risks.

DARKO: We’re adding to the positives, basically.

HUIZINGA: Right. Yeah, Spencer, in the “what could go wrong” bucket on the other side of it, I’m looking at healthcare and the cost of it, uh, especially when you’re dealing with multiple specialists and complicated surgeries and so on. And I know healthcare policy is not on your research roadmap necessarily, but you have to be thinking about that as you’re, um, as you’re going on how this will ultimately be implemented across cultures and so on. So have you given any thought to how this might play out in different countries, or is this just sort of “we’re going to make the technology and let the policy people and the wonks work it out later?”

FOWERS: [LAUGHS] Yeah, it’s a good question, and I think it’s something that we’re really excited to see how it can benefit. Luckily enough, where we’re doing the tests right now, like in, uh, Glasgow and in Ghana, they already have partnerships and so there’s already standards in place for being able to share doctors and technology across that. But yeah, we’ve definitely looked into like, what kind of an impact does this have? And one of the benefits that we see is using something like 3D telemedicine even to provide greater access for specialty doctors in places like rural or remote United States, where they just don’t have access to those specialists that they need. I mean, you know, Washington state, where I am, has a great example where you’ve got people that live out in Eastern Washington, and if they have some need to go see like a pediatric specialist, they’re going to have to drive all the way into Seattle to go to Seattle Children’s to see that person. What if we can provide a clinic that allows them to, you know, virtually, through 3D telemedicine, interface with that doctor without having to make that drive and all that commute until they know what they need to do. And so we actually look at it as being beneficial because this provides greater access to these specialists, to other regions. So it’s actually improving, improving healthcare reach and accessibility for everyone.

HUIZINGA: Yeah. Kwame, can you speak to accessibility of these experts? I mean, you would want them all on your team for a 3D telemedicine call, but how hard is it to get them all on the same … I mean, it’s hard to get people to come to a meeting, let alone, you know, a big consultation. Does that enter the picture at all or, is that … ?

DARKO: It does. It does. And I think, um, COVID is something, is something else that’s really changed how we do everyday, routine stuff. So for example, we here in Ghana have a weekly departmental meeting and, um—within the plastic surgery department and also within the greater department of surgery, weekly meeting—everything became remote. So all of a sudden, people who may not be able to make the meeting for whatever reason are now logging on. So it’s actually made accessibility to them much, much easier and swifter. I mean, where they are, what they’re doing at the time, we have no idea, but it just means that now we have access to them. So extrapolating this on to us getting in touch with specialists, um, if we schedule our timing right, it actually makes it easier for the specialists to log on. Now earlier we spoke about international MDTs, not just local, but then, have we thought about what would have happened if we did not have this ability to have this online international MDT? We’re talking about somebody getting a plane ticket, sitting on a plane, waiting in airports, airport delays, etcetera, etcetera, and flying over just to see the patient for 30 minutes and make a decision that, “Well, I can or cannot do this operation.” So now this jumps over all of this and makes it much, much easier for us. And now when we move on to the next stage of consultation, after the procedure has been done, when I’m talking about the surgery, now the patient doesn’t need to travel great distances for individual specialist review. Now in the case of plastic surgery, this may cover not only the surgeon but also the physiotherapist. And so, it’s not just before the consultation but also after the consultation.

HUIZINGA: Wow. Spencer, what you’re doing with 3D telemedicine through Holoportation is a prime example of how a technology developed for one thing turned out to have incredible uses for another. So give us just a brief history of the application for 3D communication and how it evolved from where it started to where it is now.

FOWERS: Yeah, I mean, 3D communication, at least from what we’re doing, really started with things like the original Xbox Kinect, right? With a gaming console and a way to kind of interact in a different way with your gaming system. What happened was, Microsoft released that initial Kinect and suddenly found that people weren’t buying the Kinect to play games with it. They were buying to put it on robots and buying to use, you know, for different kind of robotics applications and research applications. And that’s why the second Kinect, when it was released, they had an Xbox version and they actually had a Kinect for Windows version because they were expecting people to buy this sensor to plug it in their computers. And if you look at the form factor now with the Azure Kinect that we have, it’s a much more compact unit. It’s meant specifically for using on computers, and it’s built for robotics and computer vision applications, and so it’s been really neat to see how this thing that was developed as kind of a toy has become something that we now use in industrial applications.

HUIZINGA: Right. Yeah. And this … sort of the thing, the serendipitous nature of research, especially with, you know, big moonshot projects, is like this is going to be great for gaming and it actually turns out to be great for plastic surgery! Who’d have thunk? Um, Kwame, speaking to where this lives in terms of use, where is it now on the spectrum from lab to life, as I like to say?

DARKO: So currently, um, we have a rig in our unit, the plastic surgery unit in the Korle Bu Teaching Hospital. There’s a rig in Glasgow, and there’s a rig over in the Microsoft office. So currently what we’ve been able to do is to run a few tests between Ghana, Seattle, and Glasgow. So basically, we’ve been able to run MDTs and we’ve been able to run patient assessments, pre-op assessments, as well as post-operative assessments, as well. So that’s where we are at the moment. It takes quite a bit of logistics to initiate, but we believe once we’re on a steady roll, we’ll be able to increase our numbers that we’ve been able to do this on. I think currently those we’ve operated and had a pre-op assessment and post-op assessment have been about six or seven patients. And it was great, basically. We’ve done MDTs across with them, as well. So the full spectrum of use has been done: pre-op, MDT, and post-op assessments. So yeah, um, we have quite a bit more to do with numbers and to take out a few glitches, especially with remote access and stuff like that. But yeah, I think we’re, we’re, we’re making good progress.

HUIZINGA: Yeah. Spencer, do you see, or do you know of, hurdles that you’re going to have to jump through to get this into wider application?

FOWERS: For us, from a research setting, one of the things that we’ve been very clear about as we do this is that, while it’s being used in a medical setting, 3D telemedicine is actually just a communication technology, right? It’s a Teams call; it’s a communication device. We’re not actually performing surgery with the system, you know, or it’s not diagnosing or anything. So it’s not actually a medical device as much as it’s a telecommunication device that’s being used in a medical application.

HUIZINGA: Well, as we wrap up, I like to give each of you a chance to paint a picture of your preferred future. If your work is wildly successful, what does healthcare look like in five to 10 years? And maybe that isn’t the time horizon. It could be two to three; it could be 20 years. I don’t know. But how have you made a difference in specialized medicine with this communication tool?

FOWERS: Like going off of what Kwame was saying, right, back in November, when we flew down and were present for that first international MDT, it was really an eye-opening experience. I mean, these doctors, normally, they just get on an airplane, they fly down, and they meet these patients for the first time, probably the day before they’ve had surgery. And this time, they were able to meet them and then be able to spend time before they flew down preparing for the surgery. And then they did the surgeries. They flew back. And normally, they would fly back, they wouldn’t see that patient again. With 3D telemedicine, they jumped back on a phone call and there was the person in 3D, and they were able to talk to them, you know, turn them around, show them where the procedure was, ask them questions, and have this interaction that made it so much better of an experience for them and for the doctors involved. So when I look at kind of the future of where this goes, you know, our vision is, where else do we need this? Right now, it’s been showcased as this amazing way to bring international expertise to one operating theater, you know, with specialists from around the world, as needed. And I think that’s great. And I think we can apply that in so many different locations, right? Rural United States is a great example for us. We hope to expand out what we’re doing in Scotland, to rural areas of Scotland that, you know, it’s very hard for people in the Scottish isles to be able to get to their hospitals. You know, other possible applications … like can we make this system mobile? You can imagine like a clinical unit where this system drives out to remote villages and is able to allow people that can’t make it in to a hospital to get that initial consultation, to know whether they should make the trip or whether they need other work done before they can start surgery. So kind of the sky’s the limit, right? I mean, it’s always good to look at like, what’s … I work with a team that does moonshots for a living, so I’m always looking for what can we shoot for? And our goal really is like, gosh, where can’t we apply this technology? I mean, it just anywhere that it is at all difficult to get, you know, medical expertise, we can ease the burden of doctors by making it so they don’t have to travel to provide this specialized care and increase the access to healthcare to these people that normally wouldn’t be able to get access to it.

HUIZINGA: Kwame, what’s your, what’s your take?

DARKO: So to me, I just want to describe what the current situation is and what I believe the future situation will be. So, the current situation—and like, like, um, Spencer was saying, this just doesn’t apply to Ghana alone; it can apply in some parts of the US and some parts of the UK, as well—where a patient has a problem, is seen by a GP in the local area, has to travel close to 24 hours, sometimes sleep over somewhere, just to get access to a specialist to see what’s going on. The specialist now diagnoses, sees what’s happening, and then runs a barrage of tests and makes a decision, “Well, you’re going to have an operation, and the operation is going to be in, let’s say, four weeks, six weeks.” So what happens? The patient goes, spends another 24 hours-plus going all the way back home, waiting for the operation day or operation period, and then traveling all the way back. You can imagine the time and expense. And if this person can’t travel alone, that means somebody else needs to take a day off work to bring the person back and forth. So now what would happen in the future if everything goes the way we’re planning? We’d have a rig in every, let’s say, district or region. The person just needs to travel, assumedly, an hour or two to the rig. Gets the appointment. Everything is seen in 3D. All the local blood tests and stuff that can be done would be done locally, results sent across. Book a theater date. So the only time that the person really needs to travel is when they’re coming for the actual operation. And once again, if an MDT has to be run on this, on this patient, it will be done. And, um, they would be sitting in their rig remotely in the town or wherever it is. Those of us in the teaching hospitals across the country would also be in our places, and we’d run the MDT to be sure. Postoperatively, if it’s a review of the patient, we’d be able to do that, even if it’s an MDT review, as well, we could do that. And the extra application, which I didn’t highlight too much and I mentioned it, but I didn’t highlight it, is if this person needs to have physiotherapy and we need to make sure that they’re succeeding and doing it properly, we can actually do it through a 3D call and actually see the person walking in motion or wrist movement or hand extension or neck movements, whatever it is. We can do all this in 3D. So yeah, the, the scope is, is as far as the mind can imagine it!

HUIZINGA: You know, I’m even imagining it, and I hate to bring up The Jetsons as, you know, my, my anchor analogy, but, you know, at some point, way back, nobody thought they’d have the technology we have all in our rooms and on our bodies now. Maybe this is just like the beginning of everybody having 3D communication everywhere, and no one has to go to the doctor before they get the operation. [LAUGHS] I don’t know. Spencer Fowers, Kwame Darko. This is indeed a mind-blowing idea that has the potential to be a world-changing technology. Thanks for joining me today to talk about it.

DARKO: Thanks for having us, Gretchen.

FOWERS: Thanks.



Breaking cross-modal boundaries in multimodal AI: Introducing CoDi, composable diffusion for any-to-any generation


Imagine an AI model that can seamlessly generate high-quality content across text, images, video, and audio, all at once. Such a model would more accurately capture the multimodal nature of the world and human comprehension, seamlessly consolidate information from a wide range of sources, and enable strong immersion in human-AI interactions. This could transform the way humans interact with computers on various tasks, including assistive technology, custom learning tools, ambient computing, and content generation.

In a recent paper, “Any-to-Any Generation via Composable Diffusion,” Microsoft Azure Cognitive Services Research and UNC NLP present CoDi, a novel generative model capable of processing and simultaneously generating content across multiple modalities. CoDi allows for the synergistic generation of high-quality, coherent outputs spanning various modalities from assorted combinations of input modalities. CoDi is the latest work of Microsoft’s Project i-Code, which aims to develop integrative and composable multimodal AI. Through extensive experiments, the researchers demonstrate CoDi’s remarkable capabilities.

The challenge of multimodal generative AI

The powerful cross-modal models that have emerged in recent years are mostly capable of generating or processing only a single modality. These models often face limitations in real-world applications, where multiple modalities coexist and interact, and chaining modality-specific generative models together in a multi-step generation setting can be cumbersome and slow.

Moreover, independently generated unimodal streams may not be consistent or aligned when stitched together in post-processing, as is required for synchronized video and audio.

To address these challenges, the researchers propose Composable Diffusion (CoDi), the first model capable of simultaneously processing and generating arbitrary combinations of modalities. CoDi employs a novel composable generation strategy that involves building a shared multimodal space by bridging alignment in the diffusion process, enabling the synchronized generation of intertwined modalities, such as temporally aligned video and audio.

Spotlight: Microsoft Research Podcast

AI Frontiers: AI for health and the future of research with Peter Lee

Peter Lee, head of Microsoft Research, and Ashley Llorens, AI scientist and engineer, discuss the future of AI research and the potential for GPT-4 as a medical copilot.

The power of composable diffusion

A GIF of CoDi generation pipelines. The input modalities are listed vertically on the left side, including the text “Teddy bear on a skateboard, 4k”, a picture of Times Square, and the waveform of rain ambience. Input modalities are fed into the CoDi model, depicted by a rectangular block, and output modalities are listed on the right side. Input modalities, CoDi, and output modalities are connected by lines of different colors representing different generation pipelines. The yellow line depicts that, given an input of rain audio, CoDi can generate the text description “Raining, rain, moderate”. Depicted by red lines, CoDi can take in the image of Times Square together with the rain audio to generate the audio of a rainy street. Finally, depicted by purple lines, the input modalities are the text “Teddy bear on a skateboard, 4k”, the picture of Times Square, and the rain audio; the output is a video with sound. In the video, a Teddy bear is skateboarding in the rain on a street in Times Square, and one can hear the synchronized sounds of skateboarding and rain.
Figure 1: CoDi can generate any combination of modalities from any mixture of input modalities.

Training a model to take any mixture of input modalities and flexibly generate any mixture of outputs presents significant computational and data requirements, as the number of combinations of input and output modalities scales exponentially. Moreover, the scarcity of aligned training data for many groups of modalities makes it infeasible to train on all possible input-output combinations. To address these challenges, the researchers propose to build CoDi in a composable and integrative manner.

They start by training each modality-specific latent diffusion model (LDM) independently; these LDMs are smoothly integrated later for joint generation. This approach ensures exceptional single-modality generation quality using widely available modality-specific training data. To allow CoDi to handle any mixture of inputs, input modalities such as images, video, audio, and language are projected into the same semantic space, so the LDM of each modality can flexibly process any combination of multimodal inputs. Multi-conditioned generation is then achieved by conditioning the diffusers on a weighted sum of the input modalities’ representations.
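To make the weighted-sum conditioning concrete, here is a minimal PyTorch sketch. The encoder modules, shared dimension, and per-modality weights are hypothetical placeholders for illustration, not CoDi’s actual implementation:

```python
# Minimal sketch of multi-conditioned generation via a weighted sum of
# modality representations. All module names and shapes are placeholders.
import torch
import torch.nn as nn

class MultiConditioner(nn.Module):
    def __init__(self, encoders: dict, dim: int):
        super().__init__()
        # One encoder per modality, each projecting its input into the
        # shared semantic space of dimension `dim`.
        self.encoders = nn.ModuleDict(encoders)
        self.dim = dim

    def forward(self, inputs: dict, weights: dict) -> torch.Tensor:
        # Combine whatever mixture of modalities is present.
        cond = torch.zeros(self.dim)
        for name, x in inputs.items():
            cond = cond + weights[name] * self.encoders[name](x)
        return cond  # conditioning vector fed to each modality's diffuser
```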

One of CoDi’s most significant innovations is its ability to handle many-to-many generation strategies, simultaneously generating any mixture of output modalities. To achieve this, CoDi adds a cross-attention module to each diffuser and an environment encoder to project the latent variables of the different LDMs into a shared latent space.

By freezing the parameters of the LDM and training only the cross-attention parameters and the environment encoder, CoDi can seamlessly generate any group of modalities without training on all possible generation modality combinations, reducing the training objectives to a more manageable number.
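In training-loop terms, that scheme might look like the sketch below, where ldms, cross_attention_modules, and env_encoder are illustrative names rather than CoDi’s real module structure:

```python
# Sketch of the parameter-freezing scheme described above: the pretrained
# LDMs are frozen to preserve single-modality quality, and only the new
# cross-attention layers and the environment encoder receive gradients.
def trainable_parameters(ldms, cross_attention_modules, env_encoder):
    for ldm in ldms:
        for p in ldm.parameters():
            p.requires_grad = False  # freeze the pretrained diffusers
    params = list(env_encoder.parameters())
    for m in cross_attention_modules:
        params += list(m.parameters())
    return params

# e.g., optimizer = torch.optim.AdamW(trainable_parameters(ldms, xattn, env), lr=1e-4)
```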

Showcasing CoDi’s capabilities

The research demonstrates a novel capacity for the joint generation of multiple modalities, such as synchronized video and audio, given separate text, audio, and image prompts. Specifically, in the example shown below, the input text prompt is “teddy bear on a skateboard, 4k, high resolution”, the input image prompt is a picture of Times Square, and the input audio prompt is rain. The generated video, shown in Figure 2, is a teddy bear skateboarding in the rain at Times Square. The generated audio contains the sounds of rain, skateboarding, and street noise, all synchronized with the video. This shows that CoDi can consolidate information from multiple input modalities and generate coherent, aligned outputs.

Figure 2: The video shows an example of CoDi generating video + audio from text, image, and audio input. The input modalities are listed vertically on the left side, including the text “Teddy bear on a skateboard, 4k”, a picture of Times Square, and the waveform of rain ambience. The output is a video with sound. In the video, a Teddy bear is skateboarding in the rain on a street in Times Square. One can also hear the synchronized sound of skateboarding and rain.

In addition to its strong joint-modality generation quality, CoDi is also capable of single-to-single modality generation and multi-conditioning generation. It outperforms or matches the unimodal state of the art for single-modality synthesis.

Potential real-world applications and looking forward

CoDi’s development unlocks numerous possibilities for real-world applications requiring multimodal integration. For example, in education, CoDi can generate dynamic, engaging materials catering to diverse learning styles, allowing learners to access information tailored to their preferences while enhancing understanding and knowledge retention. CoDi can also support accessible experiences for people with disabilities, such as providing audio descriptions and visual cues for individuals who are deaf or hard of hearing.

Composable Diffusion marks a significant step towards more engaging and holistic human-computer interactions, establishing a solid foundation for future investigations in generative artificial intelligence.



Unlocking the future of computing: The Analog Iterative Machine’s lightning-fast approach to optimization 



Picture a world where computing is not limited by the binary confines of zeros and ones but is instead free to explore the vast possibilities of continuous-value data. Over the past three years, a team of Microsoft researchers has been developing a new kind of analog optical computer that uses photons and electrons to process continuous-value data, unlike today’s digital computers, which use transistors to crunch through binary data. This innovative machine has the potential to surpass state-of-the-art digital technology and transform computing in the years to come.

The Analog Iterative Machine (AIM) is designed to solve difficult optimization problems, which form the foundation of many industries, such as finance, logistics, transportation, energy, healthcare, and manufacturing. However, traditional digital computers struggle to crack these problems in a timely, energy-efficient and cost-effective manner. This is because the number of possible combinations explodes exponentially as the problem size grows, making it a massive challenge for even the most powerful digital computers. The Traveling Salesman Problem is a classic example. Imagine trying to find the most efficient route for visiting a set of cities just once before returning to the starting point. With only five cities, there are 12 possible routes – but for a 61-city problem, the number of potential routes surpasses the number of atoms in the universe.
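Those counts follow from the standard formula for symmetric tours – an n-city problem has (n − 1)!/2 distinct routes – which a few lines of Python can verify:

```python
# Verify the route counts quoted above: a symmetric tour over n cities
# has (n - 1)! / 2 distinct routes (fix the start city; halve for direction).
import math

def tsp_routes(n: int) -> int:
    return math.factorial(n - 1) // 2

print(tsp_routes(5))   # 12
print(tsp_routes(61))  # ~4.2e81 routes, more than the ~1e80 atoms in the observable universe
```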

AIM addresses two simultaneous trends. First, it sidesteps the diminishing growth of computing capacity per dollar in digital chips – or the unraveling of Moore’s Law. Second, it overcomes the limitations of specialized machines designed for solving optimization problems. Despite over two decades of research and substantial industry investment, such unconventional hardware-based machines have a limited range of practical applications, because they can only address optimization problems with binary values. This painful realization within the optimization community has driven the team to develop AIM, with a design that combines mathematical insights with cutting-edge algorithmic and hardware advancements. The result? An analog optical computer that can solve a much wider range of real-world optimization problems while operating at the speed of light, offering potential speed and efficiency gains of about a hundred times.

Today, AIM is still a research project, but the cross-disciplinary team has recently assembled the world’s first opto-electronic hardware for mixed – continuous and binary – optimization problems. Though presently operating on a limited scale, the initial results are promising, and the team has started scaling up its efforts. This includes a research collaboration with the UK-based multinational bank Barclays to solve an optimization problem critical to the financial markets on the AIM computer. Separate engagements are aimed at gaining more experience in solving industry-specific optimization problems. In June 2023, the team launched an online service that provides an AIM simulator to allow partners to explore the opportunities created by this new kind of computer.

The technology 

Photons possess the remarkable property of not interacting with one another, which has underpinned the internet era by enabling large amounts of data to be transmitted over light across vast distances. However, photons do interact with the matter through which they propagate, allowing for linear operations such as addition and multiplication, which form the basis for optimization applications. For instance, when light falls on the camera sensor in our smartphones, it adds up the incoming photons and generates the equivalent amount of current. Additionally, data transmission over fiber, which brings internet connectivity to homes and businesses, relies on encoding zeroes and ones onto light by programmatically controlling its intensity. This scaling of light through light-matter interaction multiplies the light intensity by a specific value – multiplication in the optical domain. Beyond optical technologies for linear operations, various other electronic components prevalent in everyday technologies can perform the non-linear operations that are also critical for efficient optimization algorithms.

Analog optical computing thus involves constructing a physical system using a combination of analog technologies – both optical and electronic – governed by equations that capture the required computation. This can be very efficient for specific application classes where linear and non-linear operations are dominant. In optimization problems, finding the optimal solution is akin to discovering a needle in an inconceivably vast haystack. The team has developed a new algorithm that is highly efficient at such needle-finding tasks. Crucially, the algorithm’s core operation involves performing hundreds of thousands or even millions of vector-matrix multiplications – the vectors represent the problem variables whose values need to be determined while the matrix encodes the problem itself. These multiplications are executed swiftly and with low energy consumption using commodity optical and electronic technologies, as shown in Figure 1.
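As a schematic of that core loop, consider the sketch below. It uses a generic gradient-descent update on a quadratic objective and is only an illustration of where the vector-matrix products arise, not the team’s published algorithm:

```python
# Schematic only: x holds the problem variables and Q encodes the problem.
# The vector-matrix product is the operation AIM performs optically; the
# update and clipping stand in for the non-linearity applied in analog
# electronics. This is generic gradient descent, not AIM's actual algorithm.
import numpy as np

def iterate(Q: np.ndarray, x: np.ndarray, steps: int = 100_000, lr: float = 1e-3) -> np.ndarray:
    for _ in range(steps):
        grad = Q @ x                # one vector-matrix multiplication per iteration
        x = x - lr * grad           # descend on the quadratic objective 0.5 * x^T Q x
        x = np.clip(x, -1.0, 1.0)   # keep variables within their allowed range
    return x
```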

Figure 1: Illustration of the AIM computer, which implements massively parallel vector-matrix multiplication using commodity optical technologies (in the back) and non-linearity applied using analog electronics (front). The vector is represented using an array of light sources, the matrix is embedded into the modulator array (shown in grayscale) and the result is collected into the camera sensor.
Figure 2: The second-generation AIM computer, with 48 variables, is a rack-mounted appliance.

Thanks to the miniaturization of all these components onto tiny centimeter-scale chips, the entire AIM computer fits into a small rack enclosure – as shown in Figure 2. As light travels incredibly fast – covering a meter in about 5 nanoseconds – each iteration within the AIM computer is significantly faster and consumes less electricity than running the same algorithm on a digital computer. Importantly, since the entire problem is embedded into the modulator matrix inside the computer itself, AIM does not require the problem to be transferred back and forth between storage and compute locations. And unlike synchronous digital computers, AIM’s operation is entirely asynchronous. These architectural choices circumvent key historical bottlenecks for digital computers.

Finally, all technologies used in AIM are already prevalent in consumer products with existing manufacturing ecosystems, which paves the way for a viable computing platform at full scale, if the team can tame all the technical challenges.

The importance of optimization problems

Optimization problems are mathematical challenges that require finding the best possible solution from a set of feasible alternatives. The modern world relies heavily on efficient solutions to these problems – from managing electricity in our power grids and streamlining goods delivery across sea, air, and land, to optimizing internet traffic routing.

Effectively and efficiently solving optimization problems can significantly improve processes and outcomes across many other industries. Take finance, for example, where portfolio optimization involves selecting the ideal combination of assets to maximize returns while minimizing risks. In healthcare, optimizing patient scheduling can enhance resource allocation and minimize waiting times in hospitals.

For many larger problems, even the world’s biggest supercomputer would take years or even centuries to find the optimal solution. A common workaround is heuristic algorithms – problem-solving techniques that provide approximate solutions by employing shortcuts or “rules of thumb.” Although these algorithms might not guarantee the discovery of an optimal solution, they are the most practical and efficient methods for finding near-optimal solutions in reasonable timeframes; one classic example is sketched below. Now, imagine the immense impact of a computer that could deliver more optimal solutions in a significantly shorter timeframe for these critical problems. In some instances, solving these problems in real time could create a domino effect of positive outcomes, revolutionizing entire workflows and industries.
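To make “rules of thumb” concrete, here is the classic nearest-neighbor heuristic for the Traveling Salesman Problem described earlier. It is purely illustrative and is not one of AIM’s algorithms:

```python
# Illustrative heuristic (not AIM's algorithm): the nearest-neighbor rule
# for the Traveling Salesman Problem. It greedily visits the closest
# unvisited city. It is fast but offers no optimality guarantee.
def nearest_neighbor_tour(dist: list[list[float]], start: int = 0) -> list[int]:
    n = len(dist)
    tour, visited = [start], {start}
    while len(tour) < n:
        last = tour[-1]
        nxt = min((c for c in range(n) if c not in visited),
                  key=lambda c: dist[last][c])
        tour.append(nxt)
        visited.add(nxt)
    return tour  # a feasible route, usually close to, but not at, the optimum
```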

QUMO: a world beyond QUBO

For years, researchers, both in industry and academia, have built impressive specialized machines to efficiently solve optimization problems using heuristic algorithms. This includes an array of custom hardware, such as field programmable gate arrays (FPGAs), quantum annealers, and electrical and optical parametric oscillator systems. However, all of them rely on mapping difficult optimization problems to the same binary representation, often referred to as Ising, Max-Cut or QUBO (quadratic unconstrained binary optimization). Unfortunately, none of these efforts have provided a practical alternative to conventional computers. This is because it is very hard to map real-world optimization problems at scale to the binary abstraction, a common theme in the team’s engagement with practitioners across industry and academia.

With AIM, the team has introduced a more expressive mathematical abstraction called QUMO (quadratic unconstrained mixed optimization), which can represent mixed – binary and continuous – variables and is compatible with hardware implementation, making it the “sweet spot” for many practical, heavily constrained optimization problems. Discussions with industry experts indicate that scaling AIM to 10,000 variables would put most of the practical problems discussed earlier within reach. A problem with 10,000 variables that can be directly mapped to the QUMO abstraction would require an AIM computer with 10,000 physical variables. In contrast, existing specialized machines would need to scale to beyond a million physical variables, well beyond the capabilities of the underlying hardware.
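In code terms, a QUMO instance can be viewed as a quadratic objective over a variable vector that mixes binary and continuous entries. The sketch below is an assumed formulation for illustration; the variable ranges and the projection step are this sketch’s choices, not the abstraction’s official definition:

```python
# Sketch of a QUMO-style problem: minimize x^T Q x + c^T x, where entries of
# x flagged in binary_mask are binary {0, 1} and the rest are continuous in
# [0, 1]. The formulation and ranges here are illustrative assumptions.
import numpy as np

def qumo_objective(x: np.ndarray, Q: np.ndarray, c: np.ndarray) -> float:
    return float(x @ Q @ x + c @ x)

def project(x: np.ndarray, binary_mask: np.ndarray) -> np.ndarray:
    x = np.clip(x, 0.0, 1.0)                   # continuous variables stay in [0, 1]
    x[binary_mask] = np.round(x[binary_mask])  # binary variables snap to {0, 1}
    return x
```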

AIM also implements a novel and efficient algorithm for solving such QUMO problems, which relies on an advanced form of gradient descent, a technique that is also popular in machine learning. The algorithm shows highly competitive performance and accuracy across various industrially inspired problem benchmarks; it even discovered new best-ever solutions to four problems. The first-generation AIM computer, built last year, solves QUMO optimization problems that are represented with an accuracy of up to 7 bits. The team, shown in Figure 3, has also demonstrated good quantitative agreement between the simulated and hardware versions of the AIM computer, providing further confidence in the viability of these efficiency gains as the computer is scaled up. This paper gives more details about the AIM architecture, its implementation, evaluation, and scaling roadmap.

Figure 3: AIM’s design involves innovation at the intersection of optical and analog hardware, mathematics and algorithms, and software and system architecture, which is typified in the cross-disciplinary nature of the team working hand-in-hand towards the mission of building a computer that solves practical problems. Photo of the AIM team – Front row (left to right): Doug Kelly, Jiaqi Chu, James Clegg, Babak Rahmani. Back row: Hitesh Ballani, George Mourgias-Alexandris, Daniel Cletheroe, Francesca Parmigiani, Lucinda Pickup, Grace Brennan, Ant Rowstron, Kirill Kalinin, Jonathan Westcott, Christos Gkantsidis. (Greg O’Shea and Jannes Gladrow do not appear in this photo.)

Rethinking optimization with QUMO: A more expressive way of reasoning for experts

AIM’s blueprint of co-designing unconventional hardware with an expressive abstraction and a new algorithm has the potential to spark a new era of optimization techniques, hardware platforms, and automated procedures for mapping problems to the more expressive QUMO abstraction. This exciting journey has already begun, with promising results from mapping problems from diverse domains like finance and healthcare to AIM’s QUMO abstraction. Recent research has shown that increased expressiveness with continuous variables can substantially expand the set of real-world business problems that can be tackled. However, to the team’s knowledge, AIM is the first and only hardware to natively support this abstraction.

As we venture into a new abstraction, we must also adopt new ways of thinking. It is crucial for the team to build a strong community to deeply investigate the benefits of embracing QUMO. We invite people who have previously been deterred by the limitations of binary solvers to consider the new opportunities offered by AIM’s QUMO abstraction. To facilitate this, we are releasing our AIM simulator as a service, allowing selected users to get first-hand experience. The initial users are the team’s collaborators at Princeton University and Cambridge University, who have helped us identify several exciting problems where the AIM computer and its abstraction are a much more natural fit. We are also actively engaging with thought leaders from internal Microsoft divisions and external companies in sectors where optimization is crucial.

Together, we can drive innovation and unlock the true potential of analog optical computing for solving some of the most complex optimization problems across industries.

The post Unlocking the future of computing: The Analog Iterative Machine’s lightning-fast approach to optimization appeared first on Microsoft Research.


Research Focus: Week of June 19, 2023


Microsoft Research Focus 18 | Week of June 19, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESOURCE

Responsible AI Maturity Model

As the use of AI continues to surge, new government regulations are expected. But the organizations that build and use AI technologies needn’t wait to devise best practices for developing and deploying AI systems responsibly. Many companies have adopted responsible AI (RAI) principles as a form of self-regulation. Yet, effectively translating these principles into practice is challenging.

To help organizations identify their current and desired levels of RAI maturity, researchers at Microsoft have developed the Responsible AI Maturity Model (RAI MM). The RAI MM is a framework containing 24 empirically derived dimensions that are key to an organization’s RAI maturity, and a roadmap of maturity progression so organizations and teams can identify where they are and where they could go next.

Derived from interviews and focus groups with over 90 RAI specialists and AI practitioners, the RAI MM can help organizations and teams navigate their RAI journey, even as RAI continues to evolve.

Spotlight: Microsoft Research Podcast

AI Frontiers: AI for health and the future of research with Peter Lee

Peter Lee, head of Microsoft Research, and Ashley Llorens, AI scientist and engineer, discuss the future of AI research and the potential for GPT-4 as a medical copilot.

NEW RESEARCH

FoundWright helps people re-find web content they previously discovered

Re-finding information is a common task; in fact, most online search requests involve re-finding information. However, this can be difficult when people struggle to express what they seek. People may forget exact details of the information they want to re-find, making it hard to craft a query to locate it. People may also struggle to recover information from web repositories, such as bookmarks or history, because these do not capture enough information or support ambiguous queries. As a result, people can feel overwhelmed and cognitively exhausted when faced with a re-finding task.

A new paper from Microsoft researchers, FoundWright: A System to Help People Re-find Pages from Their Web-history, introduces a system to address these problems. FoundWright leverages recent advances in language transformer models to expand people’s ability to express what they seek by defining concepts that can attract documents with semantically similar content. The researchers used FoundWright as a design probe to understand how people create and use concepts; how this expanded ability helps re-finding; and how people engage and collaborate with FoundWright’s machine learning support. The research reveals that this expanded way of expressing re-finding goals complements traditional searching and browsing.
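The paper is the authoritative description of FoundWright’s design; the snippet below is only a hedged sketch of the underlying idea, in which a user-defined concept is embedded with a language transformer model and used to score pages from a web history by semantic similarity. The model name, concept text, and history entries are all placeholders.

```python
from sentence_transformers import SentenceTransformer, util

# Any sentence-embedding model would do for this sketch.
model = SentenceTransformer("all-MiniLM-L6-v2")

concept = "recipes for slow-cooked vegetarian stews"    # user-defined concept
history = [                                             # stand-in web history
    "Ten hearty lentil stew recipes for your slow cooker",
    "Quarterly earnings report for a cloud provider",
    "How to braise root vegetables low and slow",
]

concept_vec = model.encode(concept, convert_to_tensor=True)
page_vecs = model.encode(history, convert_to_tensor=True)
scores = util.cos_sim(concept_vec, page_vecs)[0]        # cosine similarity

# Rank pages by how strongly the concept "attracts" them.
for score, page in sorted(zip(scores.tolist(), history), reverse=True):
    print(f"{score:.2f}  {page}")
```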


NEW RESEARCH

Trace-Guided Inductive Synthesis of Recursive Functional Programs

In recent years, researchers have made significant advances in the synthesis of recursive functional programs, including progress in inductive synthesis of recursive programs from input-output examples. The latter problem, however, continues to pose several challenges.

In a new paper: Trace-Guided Inductive Synthesis of Recursive Functional Programs, which received a distinguished paper award from the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2023), researchers from Microsoft and Purdue University propose a novel trace-guided approach to tackle the challenges of ambiguity and generalization in synthesis of recursive functional programs from examples. This approach augments the search space of programs with recursion traces consisting of sequences of recursive subcalls of programs. It is based on a new version space algebra (VSA) for succinct representation and efficient manipulation of pairs of recursion traces and programs that are consistent with each other. The researchers implement this approach in a tool called SyRup. Evaluating SyRup on benchmarks from prior work demonstrates that it not only requires fewer examples than existing synthesizers to achieve a given success rate, but is also less sensitive to the quality of the examples.

These results indicate that utilizing recursion traces to differentiate satisfying programs with similar sizes is applicable to a wide range of tasks.
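SyRup’s recursion traces and version space algebra are well beyond a blog-sized sketch, but the baseline problem, inductive synthesis from input-output examples, can be illustrated with naive enumeration over a tiny hypothetical grammar. The sketch below searches compositions of three primitives for one consistent with the examples; it is exactly this kind of search that ambiguity makes expensive and that trace guidance helps to structure.

```python
import itertools

# Toy grammar of unary integer functions built from a few primitives.
PRIMS = {
    "x": lambda x: x,
    "x+1": lambda x: x + 1,
    "x*2": lambda x: x * 2,
}

def compose(names):
    def f(x):
        for name in names:          # apply primitives left to right
            x = PRIMS[name](x)
        return x
    return f

def synthesize(examples, max_depth=4):
    """Return the first composition of primitives consistent with all examples."""
    for depth in range(1, max_depth + 1):
        for names in itertools.product(PRIMS, repeat=depth):
            f = compose(names)
            if all(f(i) == o for i, o in examples):
                return names
    return None

# f(x) = 2x + 2 can be written as (x+1) followed by (x*2).
print(synthesize([(0, 2), (1, 4), (3, 8)]))
```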


NEW RESEARCH

Wait-Free Weak Reference Counting

Reference counting is a common approach to memory management. One challenge with reference counting is cycles that prevent objects from being deallocated. Systems such as the C++ and Rust standard libraries introduce two types of reference: strong and weak. A strong reference allows access to the object and prevents the object from being deallocated, while a weak reference only prevents deallocation. A weak reference can be upgraded to provide a strong reference if other strong references to the object exist. Hence, the upgrade operation is partial, and may fail dynamically. The classic implementation of this upgrade operation is not wait-free—it can take arbitrarily long to complete if there is contention on the reference count.
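For intuition, here is a minimal model (a sketch, not the paper’s algorithm) of the classic upgrade just described: read the strong count, fail if it is zero, and otherwise retry a compare-and-swap until it lands. The unbounded retry loop is what makes the classic scheme lock-free but not wait-free.

```python
import threading

class RefCount:
    """Toy model of the classic weak-to-strong upgrade. A lock stands in for
    a hardware compare-and-swap instruction."""
    def __init__(self, strong=1):
        self._strong = strong
        self._lock = threading.Lock()

    def _cas(self, expected, new):
        with self._lock:                 # atomic compare-and-swap
            if self._strong == expected:
                self._strong = new
                return True
            return False

    def upgrade(self):
        """Try to turn a weak reference into a strong one."""
        while True:
            seen = self._strong
            if seen == 0:
                return False             # object already deallocated
            if self._cas(seen, seen + 1):
                return True
            # Another thread changed the count between the read and the CAS.
            # Under sustained contention this loop can retry forever, which
            # is why the classic upgrade is not wait-free.

rc = RefCount(strong=1)
print(rc.upgrade())   # True: the strong count goes 1 -> 2
```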

In a new paper: Wait-Free Weak Reference Counting, researchers from Microsoft propose a wait-free algorithm for weak reference counting, which requires only the primitive wait-free atomic operations “compare and swap” and “fetch and add”. The paper includes a correctness proof of the algorithm using the Starling verification tool, a full implementation in C++, and a demonstration of the best- and worst-case performance using micro-benchmarks.

The new algorithm is faster than the classic algorithm in the best case, but has an overhead in the worst case. The researchers present a more complex algorithm that effectively combines the classic algorithm and the wait-free algorithm, delivering much better performance in the worst case, while maintaining the benefits of the wait-free algorithm.


NEW RESEARCH

Disaggregating Stateful Network Functions

For security, isolation, metering, and other purposes, public clouds today implement complex network functions at every server. Today’s implementations, in software or on FPGAs and ASICs that are attached to each host, are becoming increasingly complex and costly, creating bottlenecks to scalability.

In a new paper: Disaggregating Stateful Network Functions, researchers from Microsoft present a different design that disaggregates network function processing off the host and into shared resource pools, making novel use of appliances that tightly integrate general-purpose ARM cores with high-speed stateful match-processing ASICs. When work is skewed across VMs, such disaggregation can offer better reliability and performance than the state of the art, at a lower per-server cost. The paper, which was published at the 2023 USENIX Symposium on Networked Systems Design and Implementation (NSDI), includes solutions to the consequent challenges and presents results from a production deployment at a large public cloud.
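As a rough mental model (assumed here for illustration, not taken from the paper), think of each appliance as splitting work between a stateful match table, the kind of per-flow lookup an ASIC can do at line rate, and slower general-purpose cores that handle table misses and install new flow state:

```python
# Toy model of the fast/slow-path split on such an appliance: a stateful
# match table (the ASIC's job) answers most packets, and misses fall back to
# general-purpose code (the ARM cores' job) that installs new flow state.
flow_table = {}   # 5-tuple -> action, standing in for the ASIC's match table

def slow_path(five_tuple):
    """Policy code on the general-purpose cores: decide and cache an action."""
    action = "allow" if five_tuple[4] == "tcp" else "drop"   # toy policy
    flow_table[five_tuple] = action                          # install state
    return action

def process(five_tuple):
    """Fast path first; only table misses pay the slow-path cost."""
    return flow_table.get(five_tuple) or slow_path(five_tuple)

pkt = ("10.0.0.1", "10.0.0.2", 12345, 443, "tcp")
print(process(pkt))   # miss: slow path installs state
print(process(pkt))   # hit: served from the match table
```

Pooling this state on shared appliances, rather than pinning it to each host, is what lets skewed workloads borrow capacity from one another.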


NEW RESEARCH

Industrial-Strength Controlled Concurrency Testing for C# Programs with Coyote

Testing programs with concurrency is challenging because their execution is non-deterministic, making bugs hard to find, reproduce, and debug. Non-determinism can cause flaky tests, which may pass or fail without any code changes, creating a significant engineering burden on development teams. As concurrency is essential for building modern multi-threaded or distributed systems, solutions are required to help developers test their concurrent code for correctness.

Testing concurrent programs comes with two main challenges. The first is reproducibility, or control; the second is state-space explosion: a concurrent program, even with a fixed test input, can exhibit an enormous number of possible behaviors.

In a new research paper: Industrial-Strength Controlled Concurrency Testing for C# Programs with Coyote, researchers from Microsoft describe the design and implementation of the open-source tool Coyote for testing concurrent programs written in the C# language. This research won a 2023 Best Software Science Paper award from The European Association of Software Science and Technology (EASST).
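Coyote instruments real C# programs; the toy Python sketch below only illustrates the principle behind controlled concurrency testing: serialize a concurrent test so the scheduler is under test control, then systematically enumerate interleavings. Even two 2-step “increment” tasks are enough to expose, and deterministically reproduce, a lost update.

```python
import itertools

def run(schedule):
    """Replay one interleaving of two 'increment' tasks, each split into a
    read step and a write step over a shared counter."""
    shared = 0
    regs = {0: 0, 1: 0}              # per-task local register
    pc = {0: 0, 1: 0}                # per-task program counter: 0=read, 1=write
    for task in schedule:
        if pc[task] == 0:
            regs[task] = shared      # read the shared counter
        else:
            shared = regs[task] + 1  # write back the stale read, plus one
        pc[task] += 1
    return shared

# Every distinct interleaving of the two 2-step tasks (6 in total).
for s in sorted(set(itertools.permutations([0, 0, 1, 1]))):
    if run(s) != 2:                  # lost update: the classic race
        print("buggy schedule:", s, "->", run(s))
```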

The post Research Focus: Week of June 19, 2023 appeared first on Microsoft Research.


Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi


black and white photos of Microsoft Principal Researcher Dr. Bichlien Nguyen and Dr. David Kwabi, Assistant Professor of Mechanical Engineering at the University of Michigan, next to the Microsoft Research Podcast

Episode 141 | June 22, 2023

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a new Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with.

In this episode, Microsoft Principal Researcher Dr. Bichlien Nguyen and Dr. David Kwabi, Assistant Professor of Mechanical Engineering at the University of Michigan, join host Dr. Gretchen Huizinga to talk about how their respective research interests—and those of their larger teams—are converging to develop renewable energy storage systems. They specifically explore their work in flow batteries and how machine learning can help more effectively search the vast organic chemistry space to identify compounds with properties just right for storing waterpower and other renewables for a not rainy day. The bonus? These new compounds may just help advance carbon capture, too.

Transcript

[MUSIC PLAYS UNDER DIALOGUE]

DAVID KWABI: I’m a mechanical engineer who sort of likes to hang out with chemists.

BICHLIEN NGUYEN: I’m an organic chemist by training, and I dabble in machine learning. Bryan’s a computational chemist who dabbles in flow cell works. Anne is, uh, a purely synthetic chemist who dabbles in almost all of our aspects.

KWABI: There’s really interesting synergies that show up just because there’s people, you know, coming from very different backgrounds.

NGUYEN: Because we have overlap, we, we have lower, I’m going to call it an activation barrier, in terms of the language we speak.

[MUSIC]

GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.

[MUSIC ENDS]


Today I’m talking to Dr. Bichlien Nguyen, a Principal Researcher at Microsoft Research, and Dr. David Kwabi, an Assistant Professor of Mechanical Engineering at the University of Michigan. Bichlien and David are collaborating on a fascinating project under the umbrella of the Microsoft Climate Research Initiative that brings organic chemistry and machine learning together to discover new forms of renewable energy storage. Before we unpack the “computational design and characterization of organic electrolytes for flow batteries and carbon capture,” let’s meet our collaborators.

Bichlien, I’ll start with you. Give us a bit more detail on what you do at Microsoft Research and the broader scope and mission of the Microsoft Climate Research Initiative.

BICHLIEN NGUYEN: Thanks so much, Gretchen, for the introduction. Um, so I guess I’ll start with my background. I have a background in organic electrochemistry, so it’s quite fitting, and as a researcher at Microsoft, really, it’s my job, uh, to come up with the newest technologies and keep abreast of what is happening around me so that I can actually, uh, fuse those different technology streams together and create something that’s, um, really valuable. And so as part of that, uh, the Microsoft Climate Research Initiative was really where a group of researchers came together and said, “How can we use the resources, the computational resources, and expertise at Microsoft to enable new technologies that will allow us to get to carbon negative by the year 2050? How can we do that?” And that, you know, as part of that, um, I just want to throw out that the Microsoft Climate Research Initiative really is focusing on three pillars, right. The three pillars are being carbon accounting, because if you don’t know how much carbon is in the atmosphere, you can’t really, uh, do much to remedy it, right, if you don’t know what’s there. The other one is climate resilience. So how do people get affected by climate change, and how do we overcome that, and how can we help that with technology? And then the third is materials engineering, where, that’s where I sit in the Microsoft Climate Research Initiative, and that’s more of how do we either develop technologies that, um, are used to capture and store carbon, uh, or are used to enable the green energy transition?

HUIZINGA: So do you find yourself spread across those three? You say the last one is really where your focus is, but do you dip your toe in the other areas, as well?

NGUYEN: I love dipping my toe in all the areas because I think they’re all important, right? They’re all important. We have to really understand what the environmental impacts of all the materials, for example, that we’re making are. I mean, it’s so … so carbon accounting is really important. Environmental accounting is very important. Um, and then people are the ones that form the core, right? Why are we, do … why do we do what we do? It’s because we want to make sure that we can enable people and solve their problems.

HUIZINGA: Yeah, when you talk about carbon accounting and why you’re doing it, it makes me think about when you have to go on a diet and the doctor says, “You have to get really honest about what you’re eating. Don’t, don’t fake it.” [LAUGHS] So, David, you’re a professor at the University of Michigan, and you run the eponymous Kwabi Lab there. Tell us about your work in general. What are your research interests? Who do you work with, and what excites you most about what you do?

DAVID KWABI: Yeah, happy to! Thank you for the introduction and, and for having me on, on here today. So, um, uh, so as you said, I run the Kwabi Lab here at the University of Michigan, and the sort of headline in terms of what we’re interested in doing is that we like to design and study batteries that can store lots of renewable electricity, uh, on the grid, so, so that’s kind of our mission. Um, that’s not quite all of what we do, but it’s, uh, it’s how I like to describe it. Um, and the motivation, of course, is … comes back to what Bichlien just mentioned, is this need for us to transition from carbon-intensive, uh, ways of producing energy to renewables. And the thing about renewables is that they’re intermittent, so solar and wind aren’t there all the time. You need to find a way to store all that energy and store it cheaply for us to really make, make a dent, um, in carbon emissions from energy production. So we work on, on building systems or energy storage systems that can meet that goal, that can accomplish that task.

HUIZINGA: Both of you talked about having larger teams that support the work you’re doing or collaborate with you two as collaborators. Do you want to talk about the size and scope of those teams or, um, you know, this collaboration across collaboration?

KWABI: Yeah, so I could start with that. So my group, like you said, we’re in the mechanical engineering department, so we really are, um, we call ourselves electrochemical engineers, and electrochemistry is the science of batteries, but it’s a science of lots of other things besides that, but the interesting thing about energy storage systems or batteries in general is that you need to build and put these systems together, but they’re made of lots of different materials. And so what we like to do in my group is build and put together these systems and then essentially figure out how they perform, right. Uh, try to explore performance limits as a function of different chemistries and system configurations and so on. But the hope then is that this can inform materials chemists and computationalists in terms of what materials we want to make next, sort of, so, so there’s a lot of need for collaboration and interdisciplinary knowledge to, to make progress here.

HUIZINGA: Yeah. Bichlien, how about you in terms of the umbrella that you’re under at Microsoft Research?

NGUYEN: There are so many different disciplines, um, within Microsoft Research, but also with the team that we’re working, you know, with David. So we have actually two other collaborators from two different, I guess, departments. There’s the chemical engineering department, which Bryan Goldsmith is part of, and Anne McNeil, I believe, is part of the chemistry department. And you know, for this particular project, on flow battery electrolytes for energy storage, um, we do need a multidisciplinary team, right? We, we need to go from the atomistic, you know, simulation level all the way to the full system level. And I think that’s, that’s where, you know, that’s important.

HUIZINGA: Now that we’re on the topic of this collaboration, let’s talk about how it came about. I like to call this “how I met your mother.” Um, what was the initial felt need for the project, and who called who and said, “Let’s do some research on renewable climate-friendly energy solutions?” Bichlien, why don’t you go ahead and take the lead on this?

NGUYEN: Yeah. Um, so I’m pretty sure what happened—and David, correct me if I’m wrong—

[LAUGHTER]

HUIZINGA: Pretty sure … !

NGUYEN: I’m pretty sure, but not 100 percent sure—is that, um, while we were formulating how to … uh, what, what topics we wanted to target for the Microsoft Climate Research Initiative, we began talking to many different universities as a way to learn from them, to see what areas of interest and what areas they think are, uh, really important for the future. And one of those universities was the University of Michigan, and I believe David was one of few PIs on that initial Teams meeting. And David gave, I believe … David, was it like a 10-minute presentation? Very quick, right? Um, but it sparked this moment of, “Wow, I think we could accelerate this.”

HUIZINGA: David, how do you remember it?

KWABI: Yeah, I think I remember it. [LAUGHS] This is so almost like a, like a marriage. Like, how did you guys meet? Um, and then, and then the stories have to align in some way or …

HUIZINGA: Yeah, who liked who first?

KWABI: Yeah, exactly. Um, but yeah, I think, I think that’s what I recall. So basically, I’m part of … here at the university, I’m part of this group called the Global CO2 Initiative, uh, which is basically, uh, an institute here at the university that convenes research related to CO2 capture, CO2 utilization, um, and I believe the Microsoft team set up a meeting with the Global CO2 Initiative, which I joined, uh, in my capacity as a member, and I gave a little 10-minute talk, which apparently was interesting enough for a second look, so, um, that, that’s how the collaboration started. There was a follow-up meeting after that, and here we are today.

HUIZINGA: Well, it sounds like you’re compelling, so let’s get into it, David. Now would be a good time to talk about, uh, more detail on this research. I won’t call it Flow Batteries for Dummies, but assume we’re not experts. So what are flow batteries, what problems do they solve, and how are you going after your big research goals methodologically?

KWABI: OK, so one way to think about flow batteries is to, to think first about pumped hydro storage is, is how I like to introduce it. So, so a flow battery is just like a battery, the, the sort of battery that you have in your cell phone or laptop computer or electric vehicle, but it’s a … it has a very different architecture. Um, and to explain that architecture, I like to talk about pumped hydro. So pumped hydro is I think a technology many of us probably appreciate or know about. You have two reservoirs that, that hold water—so upper and lower reservoirs—and when you have extra electricity, or, or excess electricity, you can pump water up a mountain, if you like, from one reservoir to another. And when you need that electricity, water just flows down, spins some turbines, and produces electricity. You’re turning gravitational potential energy into electrical energy, or electricity, is the idea. And the nice thing about pumped hydro, um, is that if you want to store more energy, you just need a bigger reservoir. So you just need more water essentially, um, in, in the two reservoirs to get to longer and longer durations of storage, um, and so then this is nice because more and more water is actually … is cheap. So the, the marginal cost of your … every stored, um, unit of energy is quite low. This isn’t really the case for the source of batteries we have in our cell phones and laptop computers. So if we’re talking about grid storage, you want something like this, something that decouples energy and power, so we have essentially a low cost per unit of electricity. So, so flow batteries essentially mimic pumped hydro because instead of turning gravitational potential energy into electricity, you’re actually changing or turning chemical potential energy, if you like, into electricity. So essentially what you’re doing is just storing energy in, um, in the form of electrons that are sort of attached to molecules. So you have an electron at a really high energy that is like flowing onto another molecule that has a low energy. That’s essentially the transformation that you’re trying to do in a, in a flow battery. But that’s the analogy I like to, I like to draw. It’s sort of a high- and low-reservation, uh, reservoirs you have. High and low chemical, uh, potential energy.

HUIZINGA: So what do these do better than the other batteries that you mentioned that we’re already using for energy storage?

KWABI: So the other batteries don’t have this decoupling. So in the flow battery, you have the, the energy being stored in these tanks, and the larger the tank, the more the energy. If you want to turn that energy into, uh, chemical energy into electricity, you, you, you run it through a reactor. So the reactor can stay the same size, but the tank gets bigger and bigger and you store more energy. In a laptop battery, you don’t have that. If you want more energy, you just want more battery, and that, the cost of that is the same. So there isn’t this big cost advantage, um, that comes from decoupling the energy capacity from the power capacity.

NGUYEN: David, would you, would you also say that, um, with, you know, redox organic flow batteries, you also kind of decouple the source, right, of the, the material, the battery material, so it’s no longer, for example, a rare earth metal or precious metal.

KWABI: Absolutely. So that’s, that’s then the thing. So when you … so you’ve got … you know, imagine these large systems, these giant tanks with, with molecules that store electricity. The question then is what molecules do you choose? Because if it’s like really expensive, then your electricity is also very expensive. Um, the particular type of battery we’re looking at uses organic molecules to store that electricity, and the hope is that these organic molecules can be made very cheaply from very abundant materials, and in the end, that means that this then translates to a really low cost of electricity.

HUIZINGA: Bichlien, I’m glad you brought that up because that is a great comparison in terms of the rare earth stuff, especially lithium mining right now is a huge cost, or tax, on the environment, and the more electric we have, the more lithium we need, so there’s other solutions that you guys are, are digging around for. Were you going to say something else, Bichlien?

NGUYEN: I was just going to say, I mean, another reason why, um, we thought David’s work was so interesting is because, you know, we’re looking at, um, energy, um, storage for renewables, and so to get to this green energy economy, we’ll need a ton of renewables, and then we’ll need a ton of ways to store the energy because renewables are, you know, they’re intermittent. I mean, sometimes the rain rains all the time, [LAUGHS] and sometimes it doesn’t. It’s really dry. Um, I don’t know why I say rain, but I assume … I probably …

HUIZINGA: Because you’re in Seattle, that’s why!

NGUYEN: That’s true. But like the sun shines; it doesn’t shine. Um, yeah, the wind blows, and sometimes it doesn’t.

HUIZINGA: Or doesn’t. Yeah. … Well, let’s talk about molecules, David, um, and getting a little bit more granular here, or maybe I should say atomic. You’re specifically looking for aqueous-soluble redox-active organic molecules, and you’ve noted that they’re really hard to find, um, these molecules that meet all the performance requirements for real-world applications. In other words, you have to swipe left a lot before you get to a good [LAUGHS] match, continuing with the marriage analogy. … So what are the properties necessary that you’re looking for, and why are they so hard to find?

KWABI: So the “aqueous soluble” just means soluble in water. You want the molecule to be able to dissolve into water at really high concentrations. So that’s one, um, property. You need it to last a really long time because the hope is that these flow battery installations are going to be there for decades. You need it to store electrons at the right energy. So, uh, I mentioned you have two tanks: one tank will store electrons at high energy; the other at low energy. So you need those energy levels to be set just right in a sense, so you want a high-voltage battery, essentially. You also want the molecule to be set that it doesn’t leak from one tank to the other through the reactor that’s in the middle of the two tanks, right. Otherwise, you’re essentially losing material, which is not, uh, desirable. And you want the molecule to be cheap. So that, that’s really important, obviously, because if we’re going to do this at, um, really large scale, and we want this to be low cost, that we want something that’s abundant and cheap. And finding a molecule that satisfies all of these requirements at the same time is really difficult. Um, you can find molecules that satisfy three or four or two, but finding something that hits all the, all the criteria is really hard, as is finding a good partner. [LAUGHS]

HUIZINGA: Well, and even in, in other areas, you hear the phrase cheap, fast, good—pick two, right? So, um, yeah, finding them is hard, but have you identified some or one, or I mean where are you on this search?

KWABI: Right now, the state-of-the-art charge-storing molecule, if you like, is based on a rare earth … rare element called vanadium. So the most developed flow batteries now use vanadium, um, to store electricity. But, uh, vanadium is pretty rare in the Earth’s crust. It’s unclear if we start to scale this technology, um, to levels that would really make an impact on climate, it’s not clear there’s enough vanadium, uh, to, to do the job. So it fulfills a bunch of, a bunch of the criteria that I just mentioned, but not, not the cheap one, which is pretty important. We’re hoping, you know, with this project, that with organic molecules, we can find examples or particular compounds that really can fulfill all of these requirements, and, um, we’re excited because organic chemistry gives us, uh … there’s a wide design space with organic molecules, and you’re starting from abundant elements, and, you know, the hope is that we can really get to something that, that, you know, we can swipe left on … is it swipe left or right? I’m not sure.

NGUYEN: I have no idea.

HUIZINGA: Swipe left means …

[LAUGHTER]

HUIZINGA: I looked it up. I, I’ve been married for a really long time, so I did look it up, and it is left if you’re not interested and right if you are, apparently on Tinder, but, uh, not to beat that horse …

KWABI: You want to swipe right eventually.

HUIZINGA: Yes. Which leads me to, uh, Bichlien. What does machine learning have to do with natural sciences like organic chemistry? Why does computation play a role here, particularly generative models, for climate change science?

NGUYEN: Yeah so what, you know, the past decade or two, um, in computer science and machine learning have taught us is that ML is really good at pattern recognition, right, being able to, um, take complex datasets and pull out the most type … you know, relevant, uh, trends and information, and, and it’s good at classifying … you know, used as a classification tool. Uh, and what we know about nature is that nature is full of patterns, right. We see repeating patterns all the time in nature, at many different scales. And when we think, for example, of all of the combinations of carbons, carbon organic molecules, that you could make, you see around 10^60; it’s estimated to be 10^60. Um, and those are all connected somehow, you know, in this large, you know, space, this large, um, distribution. And we want to, for example, as David mentioned, we want to check the boxes on all these properties. So what is really powerful, we believe, is that generative models will allow us to sample this, this organic chemistry space and allow us to condition the outputs of these models on these properties that David wants to checkmark. And so in a way, it’s allowing us to do more efficient searching. And I like to think about it as like you’re trying to find a needle, right, in the ocean, and the ocean’s pretty vast; needles are really small. And instead of having the size of the ocean as your search space, maybe you have the size of a bathtub and so that we can narrow down the search space and then be able to test and validate some of the, the molecules that come out.

HUIZINGA: So do these models then eliminate a lot of the options, making the pool smaller? Is that how that works to make it a bathtub instead of an ocean?

NGUYEN: I, I wouldn’t say eliminates, but it definitely tells you where you should be … it helps focus where you’re searching.

HUIZINGA: Right, right, right. Well, David, you and I talked briefly and exchanged some email on “The Elements” song by Tom Lehrer, and it’s, it’s a guy who basically sings the entire periodic chart of the elements, really fast, to the piano. But at the end, he mentions the fact that there’s a lot that haven’t been discovered. There’s, there’s blanks in the chart. And so I wonder if, you know, this, this search for molecules, um, it just feels like is there just so much more out there to be discovered?

KWABI: I don’t know if there’s more elements to be discovered, per se, but certainly there’s ways of combining them in ways that produce …

HUIZINGA: Aaahhhhh …

KWABI: … new compounds or compounds with properties that, that we’re looking for, for example, in this project. So, um, that’s, I think, one of the things that’s really exciting about, uh, about this particular endeavor we’re, we’re, we’re engaged in. So, um, one of the ways that people have traditionally thought about finding new molecules for flow batteries is, you know, you go into the lab or you go online and order a chemical that you think is going to be promising [LAUGHS] … some people I know have done this, uh, myself included … but you, you order a chemical that you think is promising, you throw it in the flow battery, and then you figure out if it works or not, right. And if it doesn’t work, you move on to the next compound, or you, um …

NGUYEN: You tweak it!

KWABI: … if it does work, you publish it. Yeah, exactly—you tweak it, for example. Um, but one of the, one of the questions that we get to ask in this project is, well, rather than think about starting from a molecule and then deciding or figuring out whether it works, we, we actually start from the criteria that we’re looking for and then figure out if we can intelligently design, um, a molecule based on the criteria. Um, so it’s, it’s, uh, I think a more promising way of going about discovering new molecules. And, as Bichlien’s already alluded to, with organic chemistry, the possibilities are endless. We’ve seen this already in, like, the pharmaceutical industry for example, um, and lots of other industries where people think about, uh, this combinatorial problem of, how do I get the right structure, the right compound, that solves the problem of, you know, killing this virus or whatever it is. We’re hoping to do something similar for, uh, for flow batteries.

HUIZINGA: Yeah, in fact, as I mentioned at the very beginning of the show, you titled your proposal “The computational design and characterization of organic electrolytes for flow batteries,” so it’s kind of combining all of that together. David, sometimes research has surprising secondary uses. You start out looking for one thing and it turns out to be useful for something else. Talk about the dual purposes of your work, particularly how flow batteries both store energy and work as a sort of carbon capture version of the Ghostbusters’ Ecto-Containment Unit. [LAUGHS]

KWABI: Sure. Yeah, so this is where I sort of confess and say I wasn’t completely up front in the beginning when I said all we do is energy storage, but, um, another, um, application we’re very interested in is carbon capture in my group. And with regard to flow batteries, it turns out that you, you actually can take the same architecture that you use for a flow battery and actually use it to, to capture carbon, CO2 in particular. So the way this would work is, um, it turns out that some of the molecules that we’ve been talking about, some of the organic molecules, when you push an electron onto them—so you’re storing energy and you push an electron onto them—it turns out that some of these molecules also absorb hydrogen ions from water so those two processes sort of happen together. You push an electron onto the molecule, and then it picks up a hydrogen ion from water. Um, and if you remember anything about something from your chemistry classes in high school, that changes the pH of water. If you remove protons, uh, from water, that makes water more basic, or more alkaline. And alkaline electrolytes or alkaline water actually absorbs or reacts with CO2 to make bicarbonate. So that’s a chemical reaction that can serve as a mode, or a mechanism, for removing CO2 from, from the environment, so it could be air, or it could be, uh, flue gas or, you know, exhaust gas from a power plant. So imagine you, you run this process, you push the electron onto the molecule, you change the pH of the solution, you remove CO2 … that can then … you can actually concentrate that CO2 and then run the opposite reaction. So you pull the electron off the molecule; that then dumps protons back into solution, and then you can release all this pure CO2 all of a sudden. So, so now what you can do is take a flow battery that stores energy, but also, uh, use it to separate CO2, separate and concentrate CO2 from a, from a gaseous source. So this is, um, some work that we’ve been pursuing sort of in parallel with our work on energy storage, and the hope is that we can find molecules that, in principle, maybe could do both—could do the energy storage and also help with, uh, with CO2 separation.

HUIZINGA: Bichlien, is that part of the story that was attractive to Microsoft in terms of both storage for energy and getting rid of CO2 in the environment?

NGUYEN: Yeah, absolutely. Absolutely. Of course, the properties of, of, you know, both CO2 capture and the energy storage components are sometimes somewhat, uh—David, correct me if I’m wrong—kind of divergent. It’s, it’s hard to optimize for one and have the other one optimized, too. So it’s really a balance of, of things, and we’re targeting, just right now, for this project, our joint project, the, the energy storage aspect.

HUIZINGA: Yeah. On that note, and either one of you can take this, what do you do with it? I mean, when I, when I used the Ghostbusters’ Ecto-Containment Unit, I was being direct. I mean, you got to put it somewhere once you capture it, whether it’s storing it for good use or getting rid of it for bad use. So how are you managing that?

KWABI: Great question, so … Bichlien, were you going to … were you going to go?

NGUYEN: Oh, I mean, yeah, I was going to say that there are many ways, um, for I’ll call it CO2 abatement, um, once you have it. Um, there are people who are interested in storing it underground, so, uh, mineralizing it in basalt formations, rock formations. There are folks, um, like me, who are interested in, you know, developing new catalysts so that we can convert CO2 to different renewable feedstocks that can be used in different materials like different plastics, different, um, you know, essentially new fuels, things of that nature. And then there’s, you know, commercial applications for pure streams of CO2, as well. Uh, yeah, so I, I would say there’s a variety of things you can do with CO2.

HUIZINGA: What’s happening now? I mean, where does that generally … David we, I, I want to say, we talked about this issue, um, when we met before on some of the downsides of what’s current.

KWABI: Yeah, so currently, um, so there’s, as Bichlien has mentioned, there’s a number of things you could do with it. But right now, of all the sort of large projects that have been set up, large pilot plants for CO2 capture that have been set up, I think the main one is enhanced oil recovery, which is a little bit controversial, um, because what you’re doing with the CO2 there is you’re pumping it underground into an oil field that has become sort of less productive over time. And the goal there is to try to coax a little bit more oil, um, out of this field. So, so you pump the CO2 underground, it mixes in with the oil, and then you … that, that sort of comes back up to the surface, and you separate the CO2 from the oil, and you can, you can go off and, um, use the oil for whatever you use it for. So, so the economically attractive thing there is, there’s, uh, there’s, there’s going to be some sort of payoff. There’s a reason, a commercial incentive, for separating the CO2, uh, but of course the problem is you’re removing oil from the … you’re, you’re extracting more oil that’s going to end up with … in more CO2 emissions. So, um, there are, in principle, many potential options, but there aren’t very many that have both the sort of commercial … uh, where there’s sort of a commercial impact and there’s also sort of the scale to take care of the, you know, the gigatons of CO2 that we’re going to have to draw down, basically, so … .

NGUYEN: Yeah. And I, I think, I mean, you know, to David’s point, that’s true—that, that is what’s happening, you know, today because it provides value, right? The issue, I think, with CO2 capture and storage is that while there’s global utility, there’s no monetary value to it right now. Um, and so it makes it a challenge in terms of being able to industrialize, you know, industries to take care of the CO2. But I, I, I think, you know, as part of the MCRI initiative, you know, we’re very interested in both the carbon capture and the utilization aspect, um, and utilization would mean utilizing the CO2 in productive ways for long-term storage, so think about maybe using CO2, um, converting it electrochemically, for example, into, uh, different monomers. Those monomers maybe could be used in new plastics for long-term storage. Uh, maybe those are recyclable plastics. Maybe they’re plastics that are easily biodegradable. But, you know, one of the issues with using, or manufacturing, is there’s always going to be energy associated with manufacturing. And so that’s why we care a lot about renewables and, and the green energy transition. And, and that’s why, uh, you know, we’re collaborating with David and his team as, as part of that. It’s really full circle. We have to really think about it on a systems level, and the collaboration with David is, is one part of that system.

HUIZINGA: Well, that leads beautifully, and that’s probably an odd choice of words for this question, but it seems like “solving for X” in climate change is a no-lose proposition. It’s a good thing to do. But I always ask what could possibly go wrong, and in this case, I’m thinking about other solutions, some of which you’ve already mentioned, that had seemed environmentally friendly at first, but turned out to have unforeseen environmental impacts of their own. So even as you’re exploring new solutions to renewable energy sources, how are you making sure, or how are you mitigating, harming the environment while you’re trying to save it?

KWABI: That’s a great question. So it’s, it’s something that I think isn’t traditionally, at least in my field, isn’t traditionally sort of part of the “solve for X” when people are thinking about coming up with a new technology or new way of storing renewable electricity. So, you know, in our particular case, one of the things that’s really exciting about the project we’re working on is we’re looking at molecules that are fairly already quite ubiquitous, so, so they’re already being used in the food and textile industry, for example, derivatives of the molecules we’re using. So, you know, thinking about the materials you’re using and the synthetic routes that are necessary to produce them is sort of a pitfall that one can easily sort of get into if you don’t start thinking about this question at the very beginning, right? You might come up with a technology that’s, um, appealing and that works really well, performance-wise, but might not be very recyclable or might have some difficulties in terms of extraction and so on and so forth. So lithium-ion batteries, for example, come to mind. I think you were alluding to this earlier, that, you know, they’re a great technology for electric vehicles, but mining cobalt, extracting cobalt, comes with a lot of, um, just negative impacts in terms of child labor and so on in the Congo, et cetera. So how, how do we, you know, think about, you know, materials that don’t … that sort of avoid this? And I’ll, I’ll just highlight as one of our team members … so Anne McNeil, who’s in the chemistry department here, thinks quite a lot about this, and that’s appropriate because she’s sort of the synthetic chemist on the team. She’s the one who’s thinking a lot about, you know, given we have this molecule we want to make, what’s the most eco-friendly, sustainable route to making that molecule with materials that don’t require, you know, pillaging and polluting the earth to do it in a sense, right. And also materials … and also making it in a way that, you know, at end of life, it can be potentially recycled, right.

HUIZINGA: Right.

KWABI: So thinking about sustainable routes to making these molecules and potential sort of ways of recycling them are things that, um, we’re, we’re trying to, in some sense, to take into consideration. And by we, I mean Anne, specifically, is thinking quite seriously about …

NGUYEN: David … David, can I put words in your mouth?

KWABI: But, yeah. … Yeah, sure, go ahead.

NGUYEN: Um, you’re, you’re thinking of sustainability as being a first design principle for …

KWABI: Yes! I would take those words! Exactly.

NGUYEN: OK. [LAUGHS] Yeah, I mean, that’s really important. I, I agree and second what David said.

HUIZINGA: Bichlien, when we talked earlier, the term co-optimization came up, and I want to dig in here a little bit because whenever there’s a collaboration, each discipline can learn something from the other, but you can also learn about your own in the process. So what are some of the benefits you’ve experienced working across the sciences here for this project? Um, could you provide any specific insights or learnings from this project?

NGUYEN: I mean, I, I think maybe a naive … something that maybe seems naive is that we definitely have to work together in all three disciplines because what we’re also learning from David and Bryan is that there are different experimental and computational timelines that sometimes don’t agree, and sometimes do agree, and we really have to, uh, you know, work together in order to create a unified, I’m not going to call it a roadmap, but a unified research plan that works for everyone. For example, um, it takes much longer to run an experiment to synthesize a molecule … I, I think it takes much longer to synthesize a molecule than, for example, to run a, uh, flow cell, um, experiment. And then on the computational side, you could probably run it, you know, at night, on a weekend, you know, have it done relatively soon, generate molecules. And one of those that we’re, you know, understanding is that the human feedback and the computational feedback, um, it takes a lot of balancing to make sure that we’re on the same track.

HUIZINGA: What do you think, David?

KWABI: Yeah, I think that’s definitely accurate, um, figuring out how we can work together in a way that sort of acknowledges these timelines is really important. And I think … I’m a big believer in the fact that people from somewhat different backgrounds working together, the diversity of background, actually helps to bring about, you know, really great innovative solutions to things. And there’s various ways that this has sort of shown up in our, in own work, I think, and in our, in our discussions. Like, you know, we’re currently working on a particular sort of molecular structure for, uh, for a compound that we think will be promising at storing electricity, and the way we, we came about with it is that my, my group, you know, we ran a flow cell and we saw some data that seemed to suggest that the molecule was decomposing in a certain way, and then Anne’s group, or one of Anne’s students, proposed a mechanism for what might be happening. And then Jake, who works with Bichlien, also … and then thought about, “Well, what, what about this other structure?” So that sort of … and then that’s now informing some of the calculations that are going on, uh, with Bryan. So there’s really interesting synergies that show up just because there’s people working from, you know, coming from very different backgrounds. Like I’m a mechanical engineer who sort of likes to hang out with chemists and, um, there’s actual chemists and then there’s, you know …

NGUYEN: But, David, I think …

KWABI: … the people who do computation, and so on …

NGUYEN: I think you’re absolutely right here in terms of the overlap, too, right? Because in a, in a way, um, I’m an organic chemist by training, and I dabble in machine learning. You’re a mechanical engineer who dabbles in chemistry. Uh, Bryan’s a computational chemist who dabbles in flow cell works. Uh, Anne is, uh, you know, a purely synthetic chemist who dabbles in, you know, almost all of our aspects. Because we have overlap, we have lower, I’m going to call it an activation barrier, [LAUGHS] in terms of the language we speak. I think that is something that, you know, we have to speak the same language, um, so that we can understand each other. And sometimes that can be really challenging, but oftentimes, it’s, it’s not.

HUIZINGA: Yeah, David, all successful research projects begin in the mind and make their way to the market. Um, where does this research sit on that spectrum from lab to life, and how fast is it moving as far as you’re concerned?

KWABI: Do you mean the research, uh, in general or this project?

HUIZINGA: This project, specifically.

KWABI: OK, so I’d say we’re, we’re still quite early at this stage. So there’s a system of classification called Technology Readiness Level, and I would say we’re probably on the low end of the scale, I don’t know, maybe like a 1 or 2.

NGUYEN: We just started six months ago!

KWABI: We just started six months ago! So …

[LAUGHTER]

HUIZINGA: OK, that’s early. Wait, how many levels are there? If there’s 1 or 2, what’s the high end?

KWABI: I think we go up to an 8 or so, an 8 or a 9. Um, so, so we’re quite early; we just started. But the, the nice thing about this field is that things can move really quickly. So in a year or two, who knows where we’ll be? Maybe four or five, but things are still early. There’s a lot of fundamental research right now that’s happening …

HUIZINGA: Which is so cool.

KWABI: Proof of concept. Which is necessary, I think, before you can get to the, the point where you’re, um, you’re spinning out a company or, or moving up to larger scales.

HUIZINGA: Right. Which lives very comfortably in the academic world. Bichlien, Microsoft Research is sort of a third space where they allow for some horizon on that scale in terms of how long it’s going to take this to be something that could be financially viable for Microsoft. Is that just not a factor right now? It’s just like, let’s go, let’s solve this problem because this is super-important?

NGUYEN: I guess I’ll say that it takes roughly 20 years or so to get a proof of concept into market at an industrial scale. So, I’m … what I’m hoping that with this collaboration, and with others, is that we can shorten the time for discovery so that we understand the fundamentals and we have a good baseline of what we think can be achieved so that we can go to, for example, a pilot scale, like a test scale, outside of the laboratory, not full industrial scale, but just a pilot scale much faster than we would if we had to hand iterate every single molecule.

HUIZINGA: So the generative models play a huge role in that shortening of the time frame …

NGUYEN: Yes, yes. That’s what we …

KWABI: Yeah, I think …

NGUYEN: Go ahead, David.

KWABI: Yeah. I think the idea of having a platform … so, so rather than, you know, you found this wonderful, precious molecule that you’re going to make a lot of, um … you know, having a platform that can generate molecules, right, I think is, you know, proving that this actually works gives you a lot more shots on goal, basically. And I think that, you know, if we’re able to show that, in the next year or two, that there’s, there’s a proof of concept that this can go forward, then um, then, in principle, we have many more chemistries to work with and play with, than the …

NGUYEN: Yeah, and, um, we might also be able to, you know, with, with this platform, discover molecules that have that dual purpose, right, of both energy storage and carbon capture.

HUIZINGA: Well, as we wrap up, I’d love to know in your fantastical ideal preferred future, what does your work look like … now, I’m going to say five to 10 years, but, Bichlien, you just said 20 years, [LAUGHS] so maybe I’m on the short end of it here. In the “future,” um, how have you changed the landscape of eco-friendly, cost-effective energy solutions?

KWABI: That’s a, that’s a big question. I, I tend to think in more two–, three–year timelines sometimes. [LAUGHS] But I think in, in, in, you know, in like five, 10 years, if this research leads to a company that’s sort of thriving and demonstrating that flow batteries can really make an impact in terms of low-cost energy storage, that would have been a great place to land. I mean that and the demonstration that you, you know, with artificial intelligence, you can create this platform that can, uh, custom design molecules that fulfill these criteria. I think that would be, um, that would be a fantastic outcome.

HUIZINGA: Bichlien, what about you?

NGUYEN: So I think in one to two years, but I also think about the 10-to-20-year timeline, and what I’m hoping for is, again, to demonstrate the value of AI in order to enable a carbon negative economy so that we can all benefit from it. It sounds very … a polished answer, but I, I really think there are going to be accelerations in this space that’s enabled by these new technologies that are coming out.

HUIZINGA: Hmm.

NGUYEN: And I hope so! We have to save the planet!

KWABI: There’s a lot more to AI than ChatGPT and, [LAUGHS] you know, language models and so on, I think …

HUIZINGA: That’s a perfect way to close the show. So … Bichlien Nguyen and David Kwabi, thank you so much for coming on. It’s been delightful—and informative!

NGUYEN: Thanks, Gretchen.

KWABI: Thank you very much.

The post Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi appeared first on Microsoft Research.
