Injecting fairness into machine-learning models

If a machine-learning model is trained using an unbalanced dataset, such as one that contains far more images of people with lighter skin than people with darker skin, there is a serious risk that the model’s predictions will be unfair when it is deployed in the real world.

But this is only one part of the problem. MIT researchers have found that machine-learning models that are popular for image recognition tasks actually encode bias when trained on unbalanced data. This bias within the model is impossible to fix later on, even with state-of-the-art fairness-boosting techniques, and even when retraining the model with a balanced dataset.      

So, the researchers came up with a technique to introduce fairness directly into the model’s internal representation itself. This enables the model to produce fair outputs even if it is trained on unfair data, which is especially important because there are very few well-balanced datasets for machine learning.

The solution they developed not only leads to models that make more balanced predictions, but also improves their performance on downstream tasks like facial recognition and animal species classification.

“In machine learning, it is common to blame the data for bias in models. But we don’t always have balanced data. So, we need to come up with methods that actually fix the problem with imbalanced data,” says lead author Natalie Dullerud, a graduate student in the Healthy ML Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT.

Dullerud’s co-authors include Kimia Hamidieh, a graduate student in the Healthy ML Group; Karsten Roth, a former visiting researcher who is now a graduate student at the University of Tübingen; Nicolas Papernot, an assistant professor in the University of Toronto’s Department of Electrical Engineering and Computer Science; and senior author Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group. The research will be presented at the International Conference on Learning Representations.

Defining fairness

The machine-learning technique the researchers studied is known as deep metric learning, which is a broad form of representation learning. In deep metric learning, a neural network learns the similarity between objects by mapping similar photos close together and dissimilar photos far apart. During training, this neural network maps images in an “embedding space” where a similarity metric between photos corresponds to the distance between them.

For example, if a deep metric learning model is being used to classify bird species, it will map photos of golden finches together in one part of the embedding space and cardinals together in another part of the embedding space. Once trained, the model can effectively measure the similarity of new images it hasn’t seen before. It would learn to cluster images of an unseen bird species close together, but farther from cardinals or golden finches within the embedding space.
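To make the mechanics concrete, here is a minimal sketch of how such a model scores similarity as distance in an embedding space; the tiny network and random image tensors below are placeholders for illustration, not the researchers’ model.

```python
import torch
import torch.nn.functional as F

# Placeholder embedding network: any CNN that maps an image to a vector.
embed = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, stride=2, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 64),
)

def similarity(img_a, img_b):
    """Similarity = negative distance between L2-normalized embeddings."""
    za = F.normalize(embed(img_a), dim=-1)
    zb = F.normalize(embed(img_b), dim=-1)
    return -torch.cdist(za, zb)  # closer in embedding space => more similar

# Two batches of 224x224 RGB images (stand-ins for finch and cardinal photos).
finches = torch.randn(4, 3, 224, 224)
cardinals = torch.randn(4, 3, 224, 224)
print(similarity(finches, cardinals).shape)  # pairwise similarities, shape (4, 4)
```

During training, a metric-learning loss would pull embeddings of the same species together and push different species apart; this snippet only shows how similarity is read off the learned space.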

The similarity metrics the model learns are very robust, which is why deep metric learning is so often employed for facial recognition, Dullerud says. But she and her colleagues wondered how to determine if a similarity metric is biased.

“We know that data reflect the biases of processes in society. This means we have to shift our focus to designing methods that are better suited to reality,” says Ghassemi.

The researchers defined two ways a similarity metric can be unfair, using facial recognition as an example. First, the metric is unfair if it is more likely to embed images of darker-skinned faces close to one another, even when they show different people, than it is to do so for images of lighter-skinned faces. Second, it is unfair if the features it learns for measuring similarity work better for the majority group than for the minority group.
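One hypothetical way to quantify the first kind of unfairness (a simple check, not the paper’s exact metric) is to compare how closely a model embeds distinct individuals within each group:

```python
import numpy as np

def mean_pairwise_distance(embeddings):
    """Average Euclidean distance over all pairs of (different-person) embeddings."""
    n = len(embeddings)
    dists = [np.linalg.norm(embeddings[i] - embeddings[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

# Placeholder embeddings of *different* individuals from each group.
rng = np.random.default_rng(0)
darker_group = rng.normal(size=(50, 64))
lighter_group = rng.normal(size=(50, 64))

gap = mean_pairwise_distance(lighter_group) - mean_pairwise_distance(darker_group)
# A large positive gap would mean distinct darker-skinned individuals are embedded
# closer together than distinct lighter-skinned individuals -- the first kind of
# unfairness described above.
print(f"inter-identity distance gap: {gap:.3f}")
```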

The researchers ran a number of experiments on models with unfair similarity metrics and were unable to overcome the bias the model had learned in its embedding space.

“This is quite scary because it is a very common practice for companies to release these embedding models and then people finetune them for some downstream classification task. But no matter what you do downstream, you simply can’t fix the fairness problems that were induced in the embedding space,” Dullerud says.

Even if a user retrains the model on a balanced dataset for the downstream task, which is the best-case scenario for fixing the fairness problem, there are still performance gaps of at least 20 percent, she says.

The only way to solve this problem is to ensure the embedding space is fair to begin with.

Learning separate metrics

The researchers’ solution, called Partial Attribute Decorrelation (PARADE), involves training the model to learn a separate similarity metric for a sensitive attribute, like skin tone, and then decorrelating the skin tone similarity metric from the targeted similarity metric. If the model is learning the similarity metrics of different human faces, it will learn to map similar faces close together and dissimilar faces far apart using features other than skin tone.

Any number of sensitive attributes can be decorrelated from the targeted similarity metric in this way. And because the similarity metric for the sensitive attribute is learned in a separate embedding space, it is discarded after training so only the targeted similarity metric remains in the model.
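A simplified sketch of that decorrelation idea follows; the two heads, the shapes, and the correlation penalty are illustrative assumptions rather than the authors’ exact PARADE implementation.

```python
import torch
import torch.nn.functional as F

def pairwise_sims(z):
    """Cosine-similarity matrix for a batch of embeddings."""
    z = F.normalize(z, dim=-1)
    return z @ z.T

def decorrelation_penalty(target_z, sensitive_z):
    """Penalize correlation between the target similarity metric and the
    similarity metric learned for the sensitive attribute (e.g., skin tone)."""
    s_t = pairwise_sims(target_z).flatten()
    s_a = pairwise_sims(sensitive_z).flatten()
    s_t = s_t - s_t.mean()
    s_a = s_a - s_a.mean()
    corr = (s_t * s_a).sum() / (s_t.norm() * s_a.norm() + 1e-8)
    return corr ** 2

# Hypothetical setup: two heads over shared backbone features produce the two embeddings.
batch = torch.randn(32, 512)                 # backbone features for a batch of faces
target_head = torch.nn.Linear(512, 128)      # identity metric (kept after training)
sensitive_head = torch.nn.Linear(512, 128)   # skin-tone metric (discarded after training)

loss = decorrelation_penalty(target_head(batch), sensitive_head(batch))
# In training, this penalty would be added to the usual metric-learning loss;
# its weight controls how much decorrelation is applied.
print(loss.item())
```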

Their method is applicable to many situations because the user can control the amount of decorrelation between similarity metrics. For instance, if the model will be diagnosing breast cancer from mammogram images, a clinician likely wants some information about biological sex to remain in the final embedding space because it is much more likely that women will have breast cancer than men, Dullerud explains.

They tested their method on two tasks, facial recognition and classifying bird species, and found that it reduced performance gaps caused by bias, both in the embedding space and in the downstream task, regardless of the dataset they used.

Moving forward, Dullerud is interested in studying how to force a deep metric learning model to learn good features in the first place.

“How do you properly audit fairness? That is an open question right now. How can you tell that a model is going to be fair, or that it is only going to be fair in certain situations, and what are those situations? Those are questions I am really interested in moving forward,” she says.


Using artificial intelligence to find anomalies hiding in massive datasets

Identifying a malfunction in the nation’s power grid can be like trying to find a needle in an enormous haystack. Hundreds of thousands of interrelated sensors spread across the U.S. capture data on electric current, voltage, and other critical information in real time, often taking multiple recordings per second.

Researchers at the MIT-IBM Watson AI Lab have devised a computationally efficient method that can automatically pinpoint anomalies in those data streams in real time. They demonstrated that their artificial intelligence method, which learns to model the interconnectedness of the power grid, is much better at detecting these glitches than some other popular techniques.

Because the machine-learning model they developed does not require annotated data on power grid anomalies for training, it would be easier to apply in real-world situations where high-quality, labeled datasets are often hard to come by. The model is also flexible and can be applied to other situations where a vast number of interconnected sensors collect and report data, like traffic monitoring systems. It could, for example, identify traffic bottlenecks or reveal how traffic jams cascade.

“In the case of a power grid, people have tried to capture the data using statistics and then define detection rules with domain knowledge to say that, for example, if the voltage surges by a certain percentage, then the grid operator should be alerted. Such rule-based systems, even empowered by statistical data analysis, require a lot of labor and expertise. We show that we can automate this process and also learn patterns from the data using advanced machine-learning techniques,” says senior author Jie Chen, a research staff member and manager of the MIT-IBM Watson AI Lab.

The co-author is Enyan Dai, an MIT-IBM Watson AI Lab intern and graduate student at the Pennsylvania State University. This research will be presented at the International Conference on Learning Representations.

Probing probabilities

The researchers began by defining an anomaly as an event that has a low probability of occurring, like a sudden spike in voltage. They treat the power grid data as a probability distribution, so if they can estimate the probability densities, they can identify the low-density values in the dataset. Those data points which are least likely to occur correspond to anomalies.

Estimating those probabilities is no easy task, especially since each sample captures multiple time series, and each time series is a set of multidimensional data points recorded over time. Plus, the sensors that capture all that data depend on one another: they are connected in a certain configuration, and one sensor can sometimes affect others.

To learn the complex conditional probability distribution of the data, the researchers used a special type of deep-learning model called a normalizing flow, which is particularly effective at estimating the probability density of a sample.

They augmented that normalizing flow model using a type of graph, known as a Bayesian network, which can learn the complex, causal relationship structure between different sensors. This graph structure enables the researchers to see patterns in the data and estimate anomalies more accurately, Chen explains.

“The sensors are interacting with each other, and they have causal relationships and depend on each other. So, we have to be able to inject this dependency information into the way that we compute the probabilities,” he says.

This Bayesian network factorizes, or breaks down, the joint probability of the multiple time series data into less complex, conditional probabilities that are much easier to parameterize, learn, and evaluate. This allows the researchers to estimate the likelihood of observing certain sensor readings, and to identify those readings that have a low probability of occurring, meaning they are anomalies.
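As a toy illustration of that factorization, the joint log-probability of a reading can be written as a sum of per-sensor conditional log-probabilities, and readings whose total falls below a threshold are flagged. The parent sets and Gaussian conditionals below are made-up stand-ins; the actual model learns the graph and uses normalizing flows for the conditionals.

```python
import numpy as np
from scipy.stats import norm

# Toy learned graph: each sensor's parents (who influences whom).
parents = {"voltage": [], "current": ["voltage"], "frequency": ["voltage", "current"]}

def conditional_logpdf(name, value, parent_values):
    """Stand-in for a learned conditional density; here, a Gaussian centered
    on the mean of the parent readings."""
    mean = np.mean(parent_values) if parent_values else 0.0
    return norm.logpdf(value, loc=mean, scale=1.0)

def log_prob(reading):
    """Factorized joint log-probability: sum over sensors of log p(x_i | parents(x_i))."""
    return sum(
        conditional_logpdf(name, reading[name], [reading[p] for p in parents[name]])
        for name in parents
    )

threshold = -10.0  # in practice, chosen from validation data
reading = {"voltage": 0.1, "current": 0.2, "frequency": 5.0}  # sudden frequency spike
print("anomaly" if log_prob(reading) < threshold else "normal", log_prob(reading))
```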

Their method is especially powerful because this complex graph structure does not need to be defined in advance — the model can learn the graph on its own, in an unsupervised manner.

A powerful technique

They tested this framework by seeing how well it could identify anomalies in power grid data, traffic data, and water system data. The datasets they used for testing contained anomalies that had been identified by humans, so the researchers were able to compare the anomalies their model identified with real glitches in each system.

Their model outperformed all the baselines by detecting a higher percentage of true anomalies in each dataset.

“For the baselines, a lot of them don’t incorporate graph structure. That perfectly corroborates our hypothesis. Figuring out the dependency relationships between the different nodes in the graph is definitely helping us,” Chen says.

Their methodology is also flexible. Armed with a large, unlabeled dataset, they can tune the model to make effective anomaly predictions in other situations, like traffic patterns.

Once the model is deployed, it would continue to learn from a steady stream of new sensor data, adapting to possible drift of the data distribution and maintaining accuracy over time, says Chen.

Though this particular project is close to its end, he looks forward to applying the lessons he learned to other areas of deep-learning research, particularly on graphs.

Chen and his colleagues could use this approach to develop models that map other complex, conditional relationships. They also want to explore how they can efficiently learn these models when the graphs become enormous, perhaps with millions or billions of interconnected nodes. And rather than finding anomalies, they could also use this approach to improve the accuracy of forecasts based on datasets or streamline other classification techniques.

This work was funded by the MIT-IBM Watson AI Lab and the U.S. Department of Energy.


Deep-learning technique predicts clinical treatment outcomes

When it comes to treatment strategies for critically ill patients, clinicians want to be able to consider all their options and the timing of administration, and make the optimal decision for their patients. While clinician experience and study have helped them to be successful in this effort, not all patients are the same, and treatment decisions at this crucial time could mean the difference between patient improvement and quick deterioration. It would therefore be helpful for doctors to be able to take a patient’s known health status and previous treatments and use that information to predict the patient’s health outcome under different treatment scenarios, in order to pick the best path.

Now, a deep-learning technique, called G-Net, from researchers at MIT and IBM provides a window into causal counterfactual prediction, affording physicians the opportunity to explore how a patient might fare under different treatment plans. The foundation of G-Net is the g-computation algorithm, a causal inference method that estimates the effect of dynamic exposures in the presence of measured confounding variables — ones that may influence both treatments and outcomes. Unlike previous implementations of the g-computation framework, which have used linear modeling approaches, G-Net uses recurrent neural networks (RNNs), which have node connections that allow them to better model temporal sequences with complex and nonlinear dynamics, like those found in physiological and clinical time series data. In this way, physicians can develop alternative plans based on patient history and test them before making a decision.

“Our ultimate goal is to develop a machine learning technique that would allow doctors to explore various ‘What if’ scenarios and treatment options,” says Li-wei Lehman, MIT research scientist in the MIT Institute for Medical Engineering and Science and an MIT-IBM Watson AI Lab project lead. “A lot of work has been done in terms of deep learning for counterfactual prediction but [it’s] been focusing on a point exposure setting,” or a static, time-varying treatment strategy, which doesn’t allow for adjustment of treatments as patient history changes. However, her team’s new prediction approach provides for treatment plan flexibility and chances for treatment alteration over time as patient covariate history and past treatments change. “G-Net is the first deep-learning approach based on g-computation that can predict both the population-level and individual-level treatment effects under dynamic and time varying treatment strategies.”

The research, which was recently published in the Proceedings of Machine Learning Research, was co-authored by Rui Li MEng ’20, Stephanie Hu MEng ’21, former MIT postdoc Mingyu Lu MD, graduate student Yuria Utsumi, IBM research staff member Prithwish Chakraborty, IBM Research director of Hybrid Cloud Services Daby Sow, IBM data scientist Piyush Madan, IBM research scientist Mohamed Ghalwash, and IBM research scientist Zach Shahn.

Tracking disease progression

To build, validate, and test G-Net’s predictive abilities, the researchers considered the circulatory system in septic patients in the ICU. During critical care, doctors need to make trade-offs and judgement calls, such as ensuring the organs are receiving adequate blood supply without overworking the heart. For this, they could give intravenous fluids to patients to increase blood pressure; however, too much can cause edema. Alternatively, physicians can administer vasopressors, which act to contract blood vessels and raise blood pressure.

To mimic this and demonstrate a proof of concept for G-Net, the team used CVSim, a mechanistic model of a human cardiovascular system that’s governed by 28 input variables characterizing the system’s current state, such as arterial pressure, central venous pressure, total blood volume, and total peripheral resistance. They modified it to simulate various disease processes (e.g., sepsis or blood loss) and effects of interventions (e.g., fluids and vasopressors). The researchers used CVSim to generate observational patient data for training and for “ground truth” comparison against counterfactual prediction. In their G-Net architecture, the researchers ran two RNNs to handle and predict variables that are continuous, meaning they can take on a range of values, like blood pressure, and categorical variables, which have discrete values, like the presence or absence of pulmonary edema. The researchers simulated the health trajectories of thousands of “patients” exhibiting symptoms under one treatment regime, let’s say A, for 66 timesteps, and used them to train and validate their model.

Testing G-Net’s prediction capability, the team generated two counterfactual datasets. Each contained roughly 1,000 known patient health trajectories, which were created from CVSim using the same “patient” condition as the starting point under treatment A. Then, at timestep 33, treatment changed to plan B or C, depending on the dataset. The team then generated 100 prediction trajectories for each of these 1,000 patients, whose treatment and medical history were known up until timestep 33, when the new treatment was administered. In these cases, the predictions agreed well with the “ground-truth” observations for individual patients and averaged population-level trajectories.
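In rough sketch form, the counterfactual test amounts to replaying a patient’s observed history up to the switch point and then rolling the model forward under the new treatment rule. The single GRU, variable names, and rollout loop below are illustrative assumptions, not the published G-Net code.

```python
import torch

# Hypothetical recurrent simulator: predicts next covariates from (covariates, treatment).
class CovariateRNN(torch.nn.Module):
    def __init__(self, n_cov=28, n_treat=2, hidden=64):
        super().__init__()
        self.rnn = torch.nn.GRU(n_cov + n_treat, hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, n_cov)

    def step(self, cov, treat, h=None):
        out, h = self.rnn(torch.cat([cov, treat], dim=-1).unsqueeze(1), h)
        return self.head(out[:, -1]), h

def counterfactual_rollout(model, history_cov, history_treat, new_policy, horizon=33):
    """Condition on observed history (e.g., timesteps 1-33 under treatment A),
    then simulate forward under a different treatment strategy."""
    h, cov = None, None
    for t in range(history_cov.shape[1]):                  # replay observed history
        cov, h = model.step(history_cov[:, t], history_treat[:, t], h)
    trajectory = []
    for t in range(horizon):                               # roll out under plan B or C
        treat = new_policy(cov)
        cov, h = model.step(cov, treat, h)
        trajectory.append(cov)
    return torch.stack(trajectory, dim=1)

model = CovariateRNN()
hist_cov = torch.randn(8, 33, 28)                           # 8 simulated patients, 33 steps
hist_treat = torch.zeros(8, 33, 2); hist_treat[..., 0] = 1  # treatment A so far
plan_b = lambda cov: torch.tensor([[0.0, 1.0]]).repeat(cov.shape[0], 1)
print(counterfactual_rollout(model, hist_cov, hist_treat, plan_b).shape)  # (8, 33, 28)
```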

A cut above the rest

Since the g-computation framework is flexible, the researchers wanted to examine G-Net’s prediction using different nonlinear models — in this case, long short-term memory (LSTM) models, which are a type of RNN that can learn from previous data patterns or sequences — against the more classical linear models and a multilayer perceptron (MLP), a type of neural network that can make predictions using a nonlinear approach. Following a similar setup as before, the team found that the error between the known and predicted cases was smallest in the LSTM models compared to the others. Since G-Net is able to model the temporal patterns of the patient’s ICU history and past treatment, whereas a linear model and MLP cannot, it was better able to predict the patient’s outcome.

The team also compared G-Net’s prediction in a static, time-varying treatment setting against two state-of-the-art deep-learning-based counterfactual prediction approaches, a recurrent marginal structural network (rMSN) and a counterfactual recurrent neural network (CRN), as well as a linear model and an MLP. For this, they investigated a model for tumor growth under no treatment, radiation, chemotherapy, and both radiation and chemotherapy scenarios. “Imagine a scenario where there’s a patient with cancer, and an example of a static regime would be if you only give a fixed dosage of chemotherapy, radiation, or any kind of drug, and wait until the end of your trajectory,” comments Lu. For these investigations, the researchers generated simulated observational data using tumor volume as the primary influence dictating treatment plans and demonstrated that G-Net outperformed the other models. One potential reason is that g-computation is known to be more statistically efficient than rMSN and CRN when models are correctly specified.

While G-Net has done well with simulated data, more needs to be done before it can be applied to real patients. Since neural networks can be thought of as “black boxes” for prediction results, the researchers are beginning to investigate the uncertainty in the model to help ensure safety. In contrast to these approaches that recommend an “optimal” treatment plan without any clinician involvement, “as a decision support tool, I believe that G-Net would be more interpretable, since the clinicians would input treatment strategies themselves,” says Lehman, and “G-Net will allow them to be able to explore different hypotheses.” Further, the team has moved on to using real data from ICU patients with sepsis, bringing it one step closer to implementation in hospitals.

“I think it is pretty important and exciting for real-world applications,” says Hu. “It’d be helpful to have some way to predict whether or not a treatment might work or what the effects might be — a quicker iteration process for developing these hypotheses for what to try, before actually trying to implement them in a years-long, potentially very involved and very invasive type of clinical trial.”

This research was funded by the MIT-IBM Watson AI Lab.


More sensitive X-ray imaging

Scintillators are materials that emit light when bombarded with high-energy particles or X-rays. In medical or dental X-ray systems, they convert incoming X-ray radiation into visible light that can then be captured using film or photosensors. They’re also used for night-vision systems and for research, such as in particle detectors or electron microscopes.

Researchers at MIT have now shown how one could improve the efficiency of scintillators by at least tenfold, and perhaps even a hundredfold, by changing the material’s surface to create certain nanoscale configurations, such as arrays of wave-like ridges. While past attempts to develop more efficient scintillators have focused on finding new materials, the new approach could in principle work with any of the existing materials.

Though it will require more time and effort to integrate their scintillators into existing X-ray machines, the team believes that this method might lead to improvements in medical diagnostic X-rays or CT scans, to reduce dose exposure and improve image quality. In other applications, such as X-ray inspection of manufactured parts for quality control, the new scintillators could enable inspections with higher accuracy or at faster speeds.

The findings are described today in the journal Science, in a paper by MIT doctoral students Charles Roques-Carmes and Nicholas Rivera; MIT professors Marin Soljacic, Steven Johnson, and John Joannopoulos; and 10 others.

While scintillators have been in use for some 70 years, much of the research in the field has focused on developing new materials that produce brighter or faster light emissions. The new approach instead applies advances in nanotechnology to existing materials. By creating patterns in scintillator materials at a length scale comparable to the wavelengths of the light being emitted, the team found that it was possible to dramatically change the material’s optical properties.

To make what they coined “nanophotonic scintillators,” Roques-Carmes says, “you can directly make patterns inside the scintillators, or you can glue on another material that would have holes on the nanoscale. The specifics depend on the exact structure and material.” For this research, the team took a scintillator and made holes spaced apart by roughly one optical wavelength, or about 500 nanometers (billionths of a meter).

“The key to what we’re doing is a general theory and framework we have developed,” Rivera says. This allows the researchers to calculate the scintillation levels that would be produced by any arbitrary configuration of nanophotonic structures. The scintillation process itself involves a series of steps, making it complicated to unravel. The framework the team developed involves integrating three different types of physics, Roques-Carmes says. Using this system they have found a good match between their predictions and the results of their subsequent experiments.

The experiments showed a tenfold improvement in emission from the treated scintillator. “So, this is something that might translate into applications for medical imaging, which are optical photon-starved, meaning the conversion of X-rays to optical light limits the image quality. [In medical imaging,] you do not want to irradiate your patients with too much of the X-rays, especially for routine screening, and especially for young patients as well,” Roques-Carmes says.

“We believe that this will open a new field of research in nanophotonics,” he adds. “You can use a lot of the existing work and research that has been done in the field of nanophotonics to improve significantly on existing materials that scintillate.”

“The research presented in this paper is hugely significant,” says Rajiv Gupta, chief of neuroradiology at Massachusetts General Hospital and an associate professor at Harvard Medical School, who was not associated with this work. “Nearly all detectors used in the $100 billion [medical X-ray] industry are indirect detectors,” which is the type of detector the new findings apply to, he says. “Everything that I use in my clinical practice today is based on this principle. This paper improves the efficiency of this process by 10 times. If this claim is even partially true, say the improvement is two times instead of 10 times, it would be transformative for the field!”

Soljacic says that while their experiments proved a tenfold improvement in emission could be achieved in particular systems, by further fine-tuning the design of the nanoscale patterning, “we also show that you can get up to 100 times [improvement] in certain scintillator systems, and we believe we also have a path toward making it even better.”

Soljacic points out that in other areas of nanophotonics, a field that deals with how light interacts with materials that are structured at the nanometer scale, the development of computational simulations has enabled rapid, substantial improvements, for example in the development of solar cells and LEDs. The new models this team developed for scintillating materials could facilitate similar leaps in this technology, he says.

Nanophotonics techniques “give you the ultimate power of tailoring and enhancing the behavior of light,” Soljacic says. “But until now, this promise, this ability to do this with scintillation was unreachable because modeling the scintillation was very challenging. Now, this work for the first time opens up this field of scintillation, fully opens it, for the application of nanophotonics techniques.” More generally, the team believes that the combination of nanophotonics and scintillators might ultimately enable higher resolution, reduced X-ray dose, and energy-resolved X-ray imaging.

This work is “very original and excellent,” says Eli Yablonovitch, a professor of Electrical Engineering and Computer Sciences at the University of California at Berkeley, who was not associated with this research. “New scintillator concepts are very important in medical imaging and in basic research.”

While the concept still needs to be proven in a practical device, Yablonovitch adds, “After years of research on photonic crystals in optical communication and other fields, it’s long overdue that photonic crystals should be applied to scintillators, which are of great practical importance yet have been overlooked” until this work.

The research team included Ali Ghorashi, Steven Kooi, Yi Yang, Zin Lin, Justin Beroz, Aviram Massuda, Jamison Sloan, and Nicolas Romeo at MIT; Yang Yu at Raith America, Inc.; and Ido Kaminer at Technion in Israel. The work was supported, in part, by the U.S. Army Research Office and the U.S. Army Research Laboratory through the Institute for Soldier Nanotechnologies, by the Air Force Office of Scientific Research, and by a Mathworks Engineering Fellowship.


Can machine-learning models overcome biased datasets?

Artificial intelligence systems may be able to complete tasks quickly, but that doesn’t mean they always do so fairly. If the datasets used to train machine-learning models contain biased data, it is likely the system could exhibit that same bias when it makes decisions in practice.

For instance, if a dataset contains mostly images of white men, then a facial-recognition model trained with these data may be less accurate for women or people with different skin tones.

A group of researchers at MIT, in collaboration with researchers at Harvard University and Fujitsu Ltd., sought to understand when and how a machine-learning model is capable of overcoming this kind of dataset bias. They used an approach from neuroscience to study how training data affects whether an artificial neural network can learn to recognize objects it has not seen before. A neural network is a machine-learning model that mimics the human brain in the way it contains layers of interconnected nodes, or “neurons,” that process data.

The new results show that diversity in training data has a major influence on whether a neural network is able to overcome bias, but at the same time dataset diversity can degrade the network’s performance. They also show that how a neural network is trained, and the specific types of neurons that emerge during the training process, can play a major role in whether it is able to overcome a biased dataset.

“A neural network can overcome dataset bias, which is encouraging. But the main takeaway here is that we need to take into account data diversity. We need to stop thinking that if you just collect a ton of raw data, that is going to get you somewhere. We need to be very careful about how we design datasets in the first place,” says Xavier Boix, a research scientist in the Department of Brain and Cognitive Sciences (BCS) and the Center for Brains, Minds, and Machines (CBMM), and senior author of the paper.  

Co-authors include former MIT graduate students Timothy Henry, Jamell Dozier, Helen Ho, Nishchal Bhandari, and Spandan Madan, a corresponding author who is currently pursuing a PhD at Harvard; Tomotake Sasaki, a former visiting scientist now a senior researcher at Fujitsu Research; Frédo Durand, a professor of electrical engineering and computer science at MIT and a member of the Computer Science and Artificial Intelligence Laboratory; and Hanspeter Pfister, the An Wang Professor of Computer Science at the Harvard School of Engineering and Applied Sciences. The research appears today in Nature Machine Intelligence.

Thinking like a neuroscientist

Boix and his colleagues approached the problem of dataset bias by thinking like neuroscientists. In neuroscience, Boix explains, it is common to use controlled datasets in experiments, meaning a dataset in which the researchers know as much as possible about the information it contains.

The team built datasets that contained images of different objects in varied poses, and carefully controlled the combinations so some datasets had more diversity than others. In this case, a dataset had less diversity if it contained more images that showed objects from only one viewpoint. A more diverse dataset had more images showing objects from multiple viewpoints. Each dataset contained the same number of images.

The researchers used these carefully constructed datasets to train a neural network for image classification, and then studied how well it was able to identify objects from viewpoints the network did not see during training (known as an out-of-distribution combination). 

For example, if researchers are training a model to classify cars in images, they want the model to learn what different cars look like. But if every Ford Thunderbird in the training dataset is shown from the front, when the trained model is given an image of a Ford Thunderbird shot from the side, it may misclassify it, even if it was trained on millions of car photos.
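A hedged sketch of how such a controlled split might be assembled (the category and viewpoint labels are placeholders, not the datasets the team built):

```python
import itertools
import random

categories = ["sedan", "pickup", "thunderbird"]   # placeholder object classes
viewpoints = ["front", "side", "rear", "top"]

def make_split(viewpoints_per_category, n_images=1200, seed=0):
    """Every category appears in training, but from only a limited number of
    viewpoints; held-out (category, viewpoint) pairs form the OOD test set."""
    rng = random.Random(seed)
    seen = []
    for cat in categories:
        for vp in rng.sample(viewpoints, viewpoints_per_category):
            seen.append((cat, vp))
    unseen = [c for c in itertools.product(categories, viewpoints) if c not in seen]
    train = [rng.choice(seen) for _ in range(n_images)]  # same image budget per dataset
    return train, unseen

low_div_train, low_div_ood = make_split(viewpoints_per_category=1)
high_div_train, high_div_ood = make_split(viewpoints_per_category=3)
print(len(low_div_ood), len(high_div_ood))  # 9 vs. 3 unseen combinations
```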

The researchers found that if the dataset is more diverse — if more images show objects from different viewpoints — the network is better able to generalize to new images or viewpoints. Data diversity is key to overcoming bias, Boix says.

“But it is not like more data diversity is always better; there is a tension here. When the neural network gets better at recognizing new things it hasn’t seen, then it will become harder for it to recognize things it has already seen,” he says.

Testing training methods

The researchers also studied methods for training the neural network.

In machine learning, it is common to train a network to perform multiple tasks at the same time. The idea is that if a relationship exists between the tasks, the network will learn to perform each one better if it learns them together.

But the researchers found the opposite to be true — a model trained separately for each task was able to overcome bias far better than a model trained for both tasks together.
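The two training setups being compared can be sketched roughly as follows; the layer sizes and task heads are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Setup 1: a single network trained jointly on both tasks (object category + viewpoint).
class JointModel(nn.Module):
    def __init__(self, n_categories=10, n_viewpoints=4):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
        self.category_head = nn.Linear(256, n_categories)
        self.viewpoint_head = nn.Linear(256, n_viewpoints)

    def forward(self, x):
        z = self.backbone(x)
        return self.category_head(z), self.viewpoint_head(z)

# Setup 2: two separate networks, each trained on a single task.
def single_task_model(n_out):
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(), nn.Linear(256, n_out))

category_model, viewpoint_model = single_task_model(10), single_task_model(4)

x = torch.randn(16, 3, 64, 64)  # a batch of placeholder images
cat_logits, view_logits = JointModel()(x)
# In the experiments described above, the separately trained models overcame
# dataset bias better than the jointly trained one.
print(cat_logits.shape, view_logits.shape, category_model(x).shape, viewpoint_model(x).shape)
```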

“The results were really striking. In fact, the first time we did this experiment, we thought it was a bug. It took us several weeks to realize it was a real result because it was so unexpected,” he says.

They dove deeper inside the neural networks to understand why this occurs.

They found that neuron specialization seems to play a major role. When the neural network is trained to recognize objects in images, it appears that two types of neurons emerge — one that specializes in recognizing the object category and another that specializes in recognizing the viewpoint.

When the network is trained to perform tasks separately, those specialized neurons are more prominent, Boix explains. But if a network is trained to do both tasks simultaneously, some neurons become diluted and don’t specialize for one task. These unspecialized neurons are more likely to get confused, he says.

“But the next question now is, how did these neurons get there? You train the neural network and they emerge from the learning process. No one told the network to include these types of neurons in its architecture. That is the fascinating thing,” he says.

That is one area the researchers hope to explore with future work. They want to see if they can force a neural network to develop neurons with this specialization. They also want to apply their approach to more complex tasks, such as objects with complicated textures or varied illuminations.

Boix is encouraged that a neural network can learn to overcome bias, and he is hopeful their work can inspire others to be more thoughtful about the datasets they are using in AI applications.

This work was supported, in part, by the National Science Foundation, a Google Faculty Research Award, the Toyota Research Institute, the Center for Brains, Minds, and Machines, Fujitsu Research, and the MIT-Sensetime Alliance on Artificial Intelligence.


Toward a stronger defense of personal data

A heart attack patient, recently discharged from the hospital, is using a smartwatch to help monitor his electrocardiogram signals. The smartwatch may seem secure, but the neural network processing that health information is using private data that could still be stolen by a malicious agent through a side-channel attack.

A side-channel attack seeks to gather secret information by indirectly exploiting a system or its hardware. In one type of side-channel attack, a savvy hacker could monitor fluctuations in the device’s power consumption while the neural network is operating to extract protected information that “leaks” out of the device.

“In the movies, when people want to open locked safes, they listen to the clicks of the lock as they turn it. That reveals that probably turning the lock in this direction will help them proceed further. That is what a side-channel attack is. It is just exploiting unintended information and using it to predict what is going on inside the device,” says Saurav Maji, a graduate student in MIT’s Department of Electrical Engineering and Computer Science (EECS) and lead author of a paper that tackles this issue.

Current methods that can prevent some side-channel attacks are notoriously power-intensive, so they often aren’t feasible for internet-of-things (IoT) devices like smartwatches, which rely on lower-power computation.

Now, Maji and his collaborators have built an integrated circuit chip that can defend against power side-channel attacks while using much less energy than a common security technique. The chip, smaller than a thumbnail, could be incorporated into a smartwatch, smartphone, or tablet to perform secure machine learning computations on sensor values.

“The goal of this project is to build an integrated circuit that does machine learning on the edge, so that it is still low-power but can protect against these side channel attacks so we don’t lose the privacy of these models,” says Anantha Chandrakasan, the dean of the MIT School of Engineering, Vannevar Bush Professor of Electrical Engineering and Computer Science, and senior author of the paper. “People have not paid much attention to security of these machine-learning algorithms, and this proposed hardware is effectively addressing this space.”

Co-authors include Utsav Banerjee, a former EECS graduate student who is now an assistant professor in the Department of Electronic Systems Engineering at the Indian Institute of Science, and Samuel Fuller, an MIT visiting scientist and distinguished research scientist at Analog Devices. The research is being presented at the International Solid-State Circuits Conference.

Computing at random

The chip the team developed is based on a special type of computation known as threshold computing. Rather than having a neural network operate on actual data, the data are first split into unique, random components. The network operates on those random components individually, in a random order, before accumulating the final result.

Using this method, the information leakage from the device is random every time, so it does not reveal any actual side-channel information, Maji says. But this approach is more computationally expensive since the neural network now must run more operations, and it also requires more memory to store the jumbled information.
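The core idea, operating on random shares so that any single leaked intermediate value looks like noise, can be sketched with additive secret sharing of a neuron’s dot product. This toy is an illustration of the principle, not the chip’s actual threshold-computing arithmetic.

```python
import random

def split_into_shares(x, n_shares=3, modulus=2**16):
    """Split each value into random components that sum back to the original."""
    shares = [[random.randrange(modulus) for _ in range(n_shares - 1)] for _ in x]
    for xi, s in zip(x, shares):
        s.append((xi - sum(s)) % modulus)
    return shares

def masked_dot(x_shares, w, modulus=2**16):
    """Compute a neuron's dot product share-by-share, in a random order.
    Each partial result on its own is statistically independent of the real inputs."""
    partials = []
    for j in range(len(x_shares[0])):
        partial = sum(w_i * s[j] for w_i, s in zip(w, x_shares)) % modulus
        partials.append(partial)
    random.shuffle(partials)               # random execution order
    return sum(partials) % modulus         # accumulate only at the end

x = [3, 1, 4, 1, 5]                        # e.g., sensor values
w = [2, 7, 1, 8, 2]                        # neuron weights
assert masked_dot(split_into_shares(x), w) == sum(a * b for a, b in zip(x, w)) % 2**16
```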

So, the researchers optimized the process by using a function that reduces the amount of multiplication the neural network needs to process data, which slashes the required computing power. They also protect the neural network itself by encrypting the model’s parameters. By grouping the parameters in chunks before encrypting them, they provide more security while reducing the amount of memory needed on the chip.

“By using this special function, we can perform this operation while skipping some steps with lesser impacts, which allows us to reduce the overhead. We can reduce the cost, but it comes with other costs in terms of neural network accuracy. So, we have to make a judicious choice of the algorithm and architectures that we choose,” Maji says.

Existing secure computation methods like homomorphic encryption offer strong security guarantees, but they incur huge overheads in area and power, which limits their use in many applications. The researchers’ proposed method, which aims to provide the same type of security, was able to achieve three orders of magnitude lower energy use. By streamlining the chip architecture, the researchers were also able to use less space on a silicon chip than similar security hardware, an important factor when implementing a chip on personal-sized devices.

“Security matters”

While providing significant security against power side-channel attacks, the researchers’ chip requires 5.5 times more power and 1.6 times more silicon area than a baseline insecure implementation.

“We’re at the point where security matters. We have to be willing to trade off some amount of energy consumption to make a more secure computation. This is not a free lunch. Future research could focus on how to reduce the amount of overhead in order to make this computation more secure,” Chandrakasan says.

They compared their chip to a default implementation which had no security hardware. In the default implementation, they were able to recover hidden information after collecting about 1,000 power waveforms (representations of power usage over time) from the device. With the new hardware, even after collecting 2 million waveforms, they still could not recover the data.

They also tested their chip with biomedical signal data to ensure it would work in a real-world implementation. The chip is flexible and can be programmed to any signal a user wants to analyze, Maji explains.

“Security adds a new dimension to the design of IoT nodes, on top of designing for performance, power, and energy consumption. This ASIC [application-specific integrated circuit] nicely demonstrates that designing for security, in this case by adding a masking scheme, does not need to be seen as an expensive add-on,” says Ingrid Verbauwhede, a professor in the computer security and industrial cryptography research group of the electrical engineering department at the Catholic University of Leuven, who was not involved with this research. “The authors show that by selecting masking friendly computational units, integrating security during design, even including the randomness generator, a secure neural network accelerator is feasible in the context of an IoT,” she adds.

In the future, the researchers hope to apply their approach to electromagnetic side-channel attacks. These attacks are harder to defend, since a hacker does not need the physical device to collect hidden information.

This work was funded by Analog Devices, Inc. Chip fabrication support was provided by the Taiwan Semiconductor Manufacturing Company University Shuttle Program.


Research advances technology of AI assistance for anesthesiologists

A new study by researchers at MIT and Massachusetts General Hospital (MGH) suggests the day may be approaching when advanced artificial intelligence systems could assist anesthesiologists in the operating room.

In a special edition of Artificial Intelligence in Medicine, the team of neuroscientists, engineers, and physicians demonstrated a machine learning algorithm for continuously automating dosing of the anesthetic drug propofol. Using an application of deep reinforcement learning, in which the software’s neural networks simultaneously learned how its dosing choices maintain unconsciousness and how to critique the efficacy of its own actions, the algorithm outperformed more traditional software in sophisticated, physiology-based simulations of patients. It also closely matched the performance of real anesthesiologists when showing what it would do to maintain unconsciousness given recorded data from nine real surgeries.

The algorithm’s advances increase the feasibility for computers to maintain patient unconsciousness with no more drug than is needed, thereby freeing up anesthesiologists for all the other responsibilities they have in the operating room, including making sure patients remain immobile, experience no pain, remain physiologically stable, and receive adequate oxygen, say co-lead authors Gabe Schamberg and Marcus Badgeley.

“One can think of our goal as being analogous to an airplane’s autopilot, where the captain is always in the cockpit paying attention,” says Schamberg, a former MIT postdoc who is also the study’s corresponding author. “Anesthesiologists have to simultaneously monitor numerous aspects of a patient’s physiological state, and so it makes sense to automate those aspects of patient care that we understand well.”

Senior author Emery N. Brown, a neuroscientist at The Picower Institute for Learning and Memory and Institute for Medical Engineering and Science at MIT and an anesthesiologist at MGH, says the algorithm’s potential to help optimize drug dosing could improve patient care.

“Algorithms such as this one allow anesthesiologists to maintain more careful, near-continuous vigilance over the patient during general anesthesia,” says Brown, the Edward Hood Taplin Professor of Computational Neuroscience and Health Sciences and Technology at MIT.

Both actor and critic

The research team designed a machine learning approach that would not only learn how to dose propofol to maintain patient unconsciousness, but also how to do so in a way that would optimize the amount of drug administered. They accomplished this by endowing the software with two related neural networks: an “actor” with the responsibility to decide how much drug to dose at every given moment, and a “critic” whose job was to help the actor behave in a manner that maximizes “rewards” specified by the programmer. For instance, the researchers experimented with training the algorithm using three different rewards: one that penalized only overdosing, one that questioned providing any dose, and one that imposed no penalties.
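As a rough illustration, the three reward designs could look something like the following; the functional forms and weights are assumptions for this sketch, not the study’s exact reward definitions.

```python
def reward_no_penalty(consciousness_error, dose):
    """Reward only for keeping the patient at the target unconsciousness level."""
    return -abs(consciousness_error)

def reward_overdose_only(consciousness_error, dose, max_safe_dose=2.0):
    """Additionally penalize doses above a safety ceiling."""
    overdose = max(0.0, dose - max_safe_dose)
    return -abs(consciousness_error) - 10.0 * overdose

def reward_dose_penalty(consciousness_error, dose, weight=0.1):
    """'Question' every unit of drug given, pushing dosing toward the minimum
    needed to maintain unconsciousness."""
    return -abs(consciousness_error) - weight * dose

# The critic learns to estimate the long-run value of these rewards, and the
# actor adjusts its dosing policy to maximize that estimate.
for r in (reward_no_penalty, reward_overdose_only, reward_dose_penalty):
    print(r.__name__, round(r(consciousness_error=0.05, dose=1.5), 3))
```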

In every case, they trained the algorithm with simulations of patients that employed advanced models of both pharmacokinetics, or how quickly propofol doses reach the relevant regions of the brain after they are administered, and pharmacodynamics, or how the drug actually alters consciousness when it reaches its destination. Patient unconsciousness levels, meanwhile, were reflected in measures of brain waves, as they can be in real operating rooms. By running hundreds of rounds of simulation with a range of values for these conditions, both the actor and the critic could learn how to perform their roles for a variety of kinds of patients.

The most effective reward system turned out to be the “dose penalty” one in which the critic questioned every dose the actor gave, constantly chiding the actor to keep dosing to a necessary minimum to maintain unconsciousness. Without any dosing penalty the system sometimes dosed too much, and with only an overdose penalty it sometimes gave too little. The “dose penalty” model learned more quickly and produced less error than the other value models and the traditional standard software, a “proportional integral derivative” controller.

An able advisor

After training and testing the algorithm with simulations, Schamberg and Badgeley put the “dose penalty” version to a more real-world test by feeding it patient consciousness data recorded from real cases in the operating room.  The testing demonstrated both the strengths and limits of the algorithm.

During most tests, the algorithm’s dosing choices closely matched those of the attending anesthesiologists after unconsciousness had been induced and before it was no longer necessary. The algorithm, however, adjusted dosing as frequently as every five seconds, while the anesthesiologists (who all had plenty of other things to do) typically did so only every 20-30 minutes, Badgeley notes.

As the tests showed, the algorithm is not optimized for inducing unconsciousness in the first place, the researchers acknowledge. The software also doesn’t know of its own accord when surgery is over, they add, but it’s a straightforward matter for the anesthesiologist to manage that process.

One of the most important challenges any AI system is likely to continue to face, Schamberg says, is whether the data it is being fed about patient unconsciousness is perfectly accurate. Another active area of research in the Brown lab at MIT and MGH is in improving the interpretation of data sources, such as brain wave signals, to improve the quality of patient monitoring data under anesthesia.

In addition to Schamberg, Badgeley, and Brown, the paper’s other authors are Benyamin Meschede-Krasa and Ohyoon Kwon.

The JPB Foundation and the National Institutes of Health funded the study.


An explorer in the sprawling universe of possible chemical combinations

The direct conversion of methane gas to liquid methanol at the site where it is extracted from the Earth holds enormous potential for addressing a number of significant environmental problems. Developing a catalyst for that conversion has been a critical focus for Associate Professor Heather Kulik and the lab she directs at MIT.

As important as that research is, however, it is just one example of the innumerable possibilities of Kulik’s work. Ultimately, her focus is far broader, the scope of her exploration infinitely more vast.

“All of our research is dedicated toward the same practical goal,” she says. “Namely, we aim to be able to predict and understand using computational tools why catalysts or materials behave the way they do so that we can overcome limitations in present understanding or existing materials.”

Simply put, Kulik wants to apply novel simulation and machine-learning technologies she and her lab have developed to rapidly investigate the sprawling world of possible chemical combinations. In the process, the team is mapping out how chemical structures relate to chemical properties, in order to create new materials tailored to particular applications.

“Once you realize the sheer scale of how many materials we could or should be studying to solve outstanding problems, you realize the only way to make a dent is to do things at a larger and faster scale than has ever been done before,” Kulik says. “Thanks to both machine-learning models and heterogeneous computing that has accelerated first-principles modeling, we are now able to start asking and answering questions that we could never have addressed before.”

Despite Kulik’s many awards and consistent recognition for her research, the New Jersey native was not always destined to be a scientist. Her parents were not particularly interested in math and science and, although she was mathematically precocious and did arithmetic as a toddler and college-level classes in middle school, she pursued other interests into her teens, including creative writing, graphic design, art, and photography.

Majoring in chemical engineering at the Cooper Union, Kulik says she wanted to occupy her mind, do something useful, and “make an okay living.” Chemical engineering was one of the highest-paying professions for undergraduates, she says.

The first thing she remembers hearing about graduate school was from a teaching assistant in her undergraduate physics class, who explained that being in academia meant “not having a real job until you’re at least 30” and working long hours.

“I thought that sounded like a terrible idea!” Kulik says.

Luckily, some of her classroom experiences at the Cooper Union, as well as encouragement from her quantum mechanics professor, Robert Topper, led her toward research.

“While I wanted to be useful, I kept being drawn to these fundamental questions of how knowing where the atoms and electrons were located explained the world around us,” she says. “Ultimately, I obtained my PhD in computational materials science to become a scientist who works with electrons every day for that reason. Since what I do hardly ever feels like a chore, I now have a greater appreciation for the fact that this path allowed me to ‘not have a real job.’”

Kulik credits MIT professor of chemistry and biology Cathy Drennan, whom Kulik collaborated with during graduate school, with “helping me see past the short-term barriers that come up in academia” and “showing me what a career in science could look like.” She also mentions Nicola Marzari, her PhD advisor, then an associate professor in MIT’s Department of Materials Science and Engineering, and her postdoc advisor at Stanford University, Todd Martinez, “who gave me a glimpse of what an independent career might look like.”

Kulik works hard to pass on her ethics and her ideas about work-life balance to students in her lab, and she teaches them to rely on each other, referring to the group as a “tight-knit community all with the same goals.” Twice a month, she holds meetings at which she encourages students to share how they have come up with solutions when working through research problems. “We can each see and learn from different problem-solving strategies others in the group have tried and help each other out along the way.”

She also encourages a light atmosphere. The lab’s web page says its members “embrace very #random (but probably fairly uncool) jokes in our Slack channels. We are computational researchers after all!”

“We like to keep it lighthearted,” Kulik says.

Nonetheless, Kulik and her lab have achieved major breakthroughs, including changing the approach to computational chemistry to make the way multiscale simulations are set up more systematic, while exponentially accelerating the process of materials discovery. Over the years, the lab has developed and honed an open-source code called molSimplify, which researchers can use to build and simulate new compounds. Combined with machine-learning models, the automated method enabled by the software has led to “structure-property maps” that explain why materials behave as they do, in a more comprehensive manner than was ever before possible.

For her efforts, Kulik has won grants from the MIT Energy Initiative, a Burroughs Wellcome Fund Career Award at the Scientific Interface, the American Chemical Society OpenEye Outstanding Junior Faculty Award, an Office of Naval Research Young Investigator Award, a DARPA Young Faculty Award and Director’s Fellowship, the AAAS Marion Milligan Mason Award, the Physical Chemistry B Lectureship, and a CAREER award from the National Science Foundation, among others. This year, she was named a Sloan Research Fellow and was granted tenure.

When not hard at work on her next accomplishment, Kulik enjoys listening to music and taking walks around Cambridge and Boston, where she lives in the Beacon Hill neighborhood with her partner, who was a fellow graduate student at MIT.

Each year for the past three to four years, Kulik has spent at least two weeks on a wintertime vacation in a sunny climate.

“I reflect on what I’ve been doing at work as well as what my priorities might be both in life and in work in the upcoming year,” she says. “This helps to inform any decisions I make about how to prioritize my time and efforts each year and helps me to make sure I’ve put everything in perspective.”


The downside of machine learning in health care

While working toward her dissertation in computer science at MIT, Marzyeh Ghassemi wrote several papers on how machine-learning techniques from artificial intelligence could be applied to clinical data in order to predict patient outcomes. “It wasn’t until the end of my PhD work that one of my committee members asked: ‘Did you ever check to see how well your model worked across different groups of people?’”

That question was eye-opening for Ghassemi, who had previously assessed the performance of models in aggregate, across all patients. Upon a closer look, she saw that models often worked differently — specifically worse — for populations including Black women, a revelation that took her by surprise. “I hadn’t made the connection beforehand that health disparities would translate directly to model disparities,” she says. “And given that I am a visible minority woman-identifying computer scientist at MIT, I am reasonably certain that many others weren’t aware of this either.”

In a paper published Jan. 14 in the journal Patterns, Ghassemi — who earned her doctorate in 2017 and is now an assistant professor in the Department of Electrical Engineering and Computer Science and the MIT Institute for Medical Engineering and Science (IMES) — and her coauthor, Elaine Okanyene Nsoesie of Boston University, offer a cautionary note about the prospects for AI in medicine. “If used carefully, this technology could improve performance in health care and potentially reduce inequities,” Ghassemi says. “But if we’re not actually careful, technology could worsen care.”

It all comes down to data, given that the AI tools in question train themselves by processing and analyzing vast quantities of data. But the data they are given are produced by humans, who are fallible and whose judgments may be clouded by the fact that they interact differently with patients depending on their age, gender, and race, without even knowing it.

Furthermore, there is still great uncertainty about medical conditions themselves. “Doctors trained at the same medical school for 10 years can, and often do, disagree about a patient’s diagnosis,” Ghassemi says. That’s different from the applications where existing machine-learning algorithms excel — like object-recognition tasks — because practically everyone in the world will agree that a dog is, in fact, a dog.

Machine-learning algorithms have also fared well in mastering games like chess and Go, where both the rules and the “win conditions” are clearly defined. Physicians, however, don’t always concur on the rules for treating patients, and even the win condition of being “healthy” is not widely agreed upon. “Doctors know what it means to be sick,” Ghassemi explains, “and we have the most data for people when they are sickest. But we don’t get much data from people when they are healthy because they’re less likely to see doctors then.”

Even mechanical devices can contribute to flawed data and disparities in treatment. Pulse oximeters, for example, which have been calibrated predominantly on light-skinned individuals, do not accurately measure blood oxygen levels for people with darker skin. And these deficiencies are most acute when oxygen levels are low — precisely when accurate readings are most urgent. Similarly, women face increased risks during “metal-on-metal” hip replacements, Ghassemi and Nsoesie write, “due in part to anatomic differences that aren’t taken into account in implant design.” Facts like these could be buried within the data fed to computer models whose output will be undermined as a result.

Coming from computers, the product of machine-learning algorithms offers “the sheen of objectivity,” according to Ghassemi. But that can be deceptive and dangerous, because it’s harder to ferret out the faulty data supplied en masse to a computer than it is to discount the recommendations of a single possibly inept (and maybe even racist) doctor. “The problem is not machine learning itself,” she insists. “It’s people. Human caregivers generate bad data sometimes because they are not perfect.”

Nevertheless, she still believes that machine learning can offer benefits in health care in terms of more efficient and fairer recommendations and practices. One key to realizing the promise of machine learning in health care is to improve the quality of data, which is no easy task. “Imagine if we could take data from doctors that have the best performance and share that with other doctors that have less training and experience,” Ghassemi says. “We really need to collect this data and audit it.”

The challenge here is that the collection of data is not incentivized or rewarded, she notes. “It’s not easy to get a grant for that, or ask students to spend time on it. And data providers might say, ‘Why should I give my data out for free when I can sell it to a company for millions?’ But researchers should be able to access data without having to deal with questions like: ‘What paper will I get my name on in exchange for giving you access to data that sits at my institution?’

“The only way to get better health care is to get better data,” Ghassemi says, “and the only way to get better data is to incentivize its release.”

It’s not only a question of collecting data. There’s also the matter of who will collect it and vet it. Ghassemi recommends assembling diverse groups of researchers — clinicians, statisticians, medical ethicists, and computer scientists — to first gather diverse patient data and then “focus on developing fair and equitable improvements in health care that can be deployed in not just one advanced medical setting, but in a wide range of medical settings.”

The objective of the Patterns paper is not to discourage technologists from bringing their expertise in machine learning to the medical world, she says. “They just need to be cognizant of the gaps that appear in treatment and other complexities that ought to be considered before giving their stamp of approval to a particular computer model.”

Read More

2021-22 Takeda Fellows: Leaning on AI to advance medicine for humans

In fall 2020, MIT’s School of Engineering and Takeda Pharmaceutical Company Limited launched the MIT-Takeda Program, a collaboration to support members of the MIT community working at the intersection of artificial intelligence and human health. Housed at the Abdul Latif Jameel Clinic for Machine Learning in Health, the collaboration aims to use artificial intelligence to both benefit human health and aid in drug development. Combining technology with cutting-edge health research, the program’s participants hope to improve health outcomes across the world.

Thus far, the partnership has supported joint research efforts focused on topics such as automated inspection in sterile pharmaceutical manufacturing and machine learning for liver phenotyping.

Every year, the program also funds graduate fellowships to support students pursuing research on a broad range of issues tied to health and AI. This year’s Takeda fellows, described below, are working on research involving electronic health record algorithms, remote sensing data as it relates to environmental health risk, and neural networks for the development of antibiotics.

Monica Agrawal

Agrawal is a PhD student in the Department of Electrical Engineering and Computer Science (EECS). Her research focuses on the development of machine learning algorithms that could unlock the potential of electronic health records to power personalized, real-world studies of comparative effectiveness. She is tackling the issue from three interconnected angles: understanding the basic building blocks of clinical text, enabling the structuring of clinical timelines with only minimal labeled data, and redesigning clinical documentation to incentivize high-quality structured data at the time of creation. Agrawal earned both a BS and an MS in computer science from Stanford University.

Peng Cao

A PhD student in EECS, Peng Cao is developing a new approach to monitoring oxygen saturation by analyzing the radio frequency signals that bounce off a person’s body. To this end, she is extracting respiration signals from the radio signals and then training a neural network to infer oxygen levels from them. Peng earned a BS in computer science from Peking University in China.
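
As a rough illustration of the second stage of that pipeline, the sketch below (which is not Cao's published model) defines a small one-dimensional convolutional network that maps a fixed-length respiration waveform, assumed to have already been extracted from the radio signals, to an estimated oxygen-saturation value; the window length, architecture, and training setup are all assumptions.

    # Illustrative sketch only: a small 1D CNN that regresses an
    # oxygen-saturation value from a fixed-length respiration window.
    # Window length, architecture, and labels are assumptions.
    import torch
    import torch.nn as nn

    WINDOW = 1024  # samples per respiration window (assumed)

    class SpO2Regressor(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
                nn.MaxPool1d(4),
                nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            self.head = nn.Linear(32, 1)  # predicted SpO2 (%)

        def forward(self, x):  # x: (batch, 1, WINDOW)
            return self.head(self.features(x).squeeze(-1))

    # One training step on stand-in data, just to show the regression setup.
    model = SpO2Regressor()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    waveforms = torch.randn(8, 1, WINDOW)   # placeholder respiration windows
    targets = torch.full((8, 1), 97.0)      # placeholder SpO2 labels (%)
    loss = nn.functional.mse_loss(model(waveforms), targets)
    loss.backward()
    optimizer.step()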

Bianca Lepe

A PhD student in biological engineering, Bianca Lepe is working to benchmark existing vaccine candidates and define next-generation vaccine candidates for tuberculosis. She is using publicly available data combined with machine learning algorithms to identify the Mtb proteins that are well-suited as subunit vaccine antigens across the diversity of human leukocyte antigen alleles. Lepe earned a BS in biological engineering and business from Caltech; an MS in systems and synthetic biology from the University of Edinburgh in Scotland; and an MPhil in technology policy from the University of Cambridge in England.

Caroline McCue

Caroline McCue is a PhD student in mechanical engineering who is developing a system that could simplify and speed up the process of cell passaging. More specifically, she is designing and testing a platform that triggers cell detachment in response to simple external stimuli, such as a change in voltage or in mechanical properties. She plans to test the efficacy of this platform by applying machine learning to quantify the adhesion of Chinese hamster ovary cells to these surfaces. McCue earned a BS in mechanical engineering from the University of Maryland.

Somesh Mohapatra

A PhD student in the Department of Materials Science and Engineering, Somesh Mohapatra is also pursuing an MBA at the MIT Sloan School of Management as part of the Leaders for Global Operations Program. His doctoral research, in close collaboration with experimentalists at MIT, focuses on designing biomacromolecules using interpretable machine learning and simulations. Specifically, Mohapatra leverages macromolecule graph representations to develop machine learning models for quantitative prediction, optimization, and attribution methods. He then applies these tools to elucidate design principles and to improve performance and synthetic accessibility of functional macromolecules, ranging from peptides and glycans to electrolytes and thermosets. Mohapatra earned his BTech in metallurgical and materials engineering from the Indian Institute of Technology Roorkee in India.

Luke Murray

Luke Murray is a PhD student in EECS. He is developing MedKnowts, a system that combines machine learning and human-computer interaction techniques to reduce the effort required to synthesize knowledge for medical decision-making and to author high-quality, structured clinical documentation. MedKnowts unifies these two currently splintered workflows by providing a seamless interface that reimagines documentation as a natural byproduct of clinical reasoning rather than as a compliance requirement. Murray earned his BS in computer science from Brown University.

Ufuoma Ovienmhada

Ufuoma Ovienmhada SM ’20 is a PhD student in aeronautics and astronautics. Her research applies a mixed-methods approach (community-centered design, systems engineering, and machine learning) to satellite remote sensing data to create tools that evaluate how human health risk relates to environmental hazards. Ovienmhada earned her BS in mechanical engineering from Stanford University and her SM in media arts and sciences from MIT.

Lagnajit Pattanaik

Lagnajit “Lucky” Pattanaik is a PhD student in chemical engineering. He seeks to shift the paradigm of predictive organic chemistry from qualitative to quantitative. More specifically, his research is focused on the development of machine learning techniques for predicting 3D structures of molecules and reactions, including transition state geometries and the geometrical conformations that molecules take in solution. He earned a BS in chemical engineering from Ohio State University.

Na Sun 

A PhD student in EECS, Na Sun is working in the emerging field of neuro-immuno-genomics. More specifically, she is developing machine learning methods to better understand the interactions between two extremely complex systems: the human brain and its dozens of cell types, and the human immune system and the dozens of biological processes that it integrates across cognition, pathogen response, diet-exercise-obesity, and synaptic pruning. Sun earned her BS in life sciences from Linyi University in China and an MS in developmental biology from the University of Chinese Academy of Sciences in China.

Jacqueline Valeri

Jacqueline Valeri is a PhD student in biological engineering who uses neural networks for antibiotic discovery. Her efforts include the recycling of compounds from existing compound libraries and the computationally assisted design of novel therapeutics. She is also excited by broader applications of machine learning and artificial intelligence in the fields of health care and biomedicine. Valeri earned her BSE and MSE in bioengineering from the University of Pennsylvania.

Clinton Wang

A PhD student in EECS, Clinton Wang SM ’20 has developed a new type of conditional generative adversarial network based on spatial-intensity transforms. It achieves high image fidelity, is robust to artifacts in training data, and generalizes to held-out clinical sites. Wang now aims to extend his model to even more challenging applications, including visualizing transformations of focal pathologies, such as lesions, where it could serve as a powerful tool for characterizing biomarkers of malignancy and treatment response. Wang earned a BS in biomedical engineering from Yale University and an SM in electrical engineering and computer science from MIT.

Read More