Ask a Techspert: What’s a neural network?

Back in the day, there was a surefire way to tell humans and computers apart: You’d present a picture of a four-legged friend and ask if it was a cat or dog. A computer couldn’t identify felines from canines, but we humans could answer with doggone confidence. 

That all changed about a decade ago thanks to leaps in computer vision and machine learning – specifically,  major advancements in neural networks, which can train computers to learn in a way similar to humans. Today, if you give a computer enough images of cats and dogs and label which is which, it can learn to tell them apart purr-fectly. 

But how exactly do neural networks help computers do this? And what else can — or can’t — they do? To answer these questions and more, I sat down with Google Research’s Maithra Raghu, a research scientist who spends her days helping computer scientists better understand neural networks. Her research helped the Google Health team discover new ways to apply deep learning to assist doctors and their patients.

So, the big question: What’s a neural network?

To understand neural networks, we need to first go back to the basics and understand how they fit into the bigger picture of artificial intelligence (AI). Imagine a Russian nesting doll, Maithra explains. AI would be the largest doll, then within that, there’s machine learning (ML), and within that, neural networks (… and within that, deep neural networks, but we’ll get there soon!).

If you think of AI as the science of making things smart, ML is the subfield of AI focused on making computers smarter by teaching them to learn, instead of hard-coding them. Within that, neural networks are an advanced technique for ML, where you teach computers to learn with algorithms that take inspiration from the human brain.

Your brain fires off groups of neurons that communicate with each other. In an artificial neural network, (the computer type), a “neuron” (which you can think of as a computational unit) is grouped with a bunch of other “neurons” into a layer, and those layers  stack on top of each other. Between each of those layers are connections. The more layers  a neural network has, the “deeper” it is. That’s where the idea of “deep learning” comes from. “Neural networks depart from neuroscience because you have a mathematical element to it,” Maithra explains, “Connections between neurons are numerical values represented by matrices, and training the neural network uses gradient-based algorithms.” 

This might seem complex, but you probably interact with neural networks fairly often — like when you’re scrolling through personalized movie recommendations or chatting with a customer service bot.

So once you’ve set up a neural network, is it ready to go?

Not quite. The next step is training. That’s where the model becomes much more sophisticated. Similar to people, neural networks learn from feedback. If you go back to the cat and dog example, your neural network would look at pictures and start by randomly guessing. You’d label the training data (for example, telling the computer if each picture features a cat or dog), and those labels would provide feedback, telling the neural network when it’s right or wrong. Throughout this process, the neural network’s parameters adjust, and the neural network transitions from not knowing to learning how to identify between cats and dogs.

Why don’t we use neural networks all the time?

“Though neural networks are based on our brains, the way they learn is actually very different from humans,” Maithra says. “Neural networks are usually quite specialized and narrow. This can be useful because, for example, it means a neural network might be able to process medical scans much quicker than a doctor, or spot patterns  a trained expert might not even notice.” 

But because neural networks learn differently from people, there’s still a lot that computer scientists don’t know about how they work. Let’s go back to cats versus dogs: If your neural network gives you all the right answers, you might think it’s behaving as intended. But Maithra cautions that neural networks can work in mysterious ways.

“Perhaps your neural network isn’t able to identify between cats and dogs at all – maybe it’s only able to identify between sofas and grass, and all of your pictures of cats happen to be on couches, and all your pictures of dogs are in parks,” she says. “Then, it might seem like it knows the difference when it actually doesn’t.”

That’s why Maithra and other researchers are diving into the internals of neural networks, going deep into their layers and connections, to better understand them – and come up with ways to make them more helpful.

“Neural networks have been transformative for so many industries,” Maithra says, “and I’m excited that we’re going to realize even more profound applications for them moving forward.”

Read More

An update on our progress in responsible AI innovation

Over the past year, responsibly developed AI has transformed health screenings, supportedfact-checking to battle misinformation and save lives, predicted Covid-19 cases to support public health, and protected wildlife after bushfires. Developing AI in a way that gets it right for everyone requires openness, transparency, and a clear focus on understanding the societal implications. That is why we were among the first companies to develop and publish AI Principles and why, each year, we share updates on our progress.

Building on previous AI Principles updates in 20182019, and 2020, today we’re providing our latest overview of what we’ve learned, and how we’re applying these learnings.

Internal Education

In the last year, to ensure our teams have clarity from day one, we’ve added an introduction to our AI Principles for engineers and incoming hires in technical roles. The course presents each of the Principles as well as the applications we will not pursue.

Integrating our Principles into the work we do with enterprise customers is key, so we’ve continued to make our AI Principles in Practice training mandatory for customer-facing Cloud employees. A version of this training is available to all Googlers.

There is no single way to apply the AI Principles to specific features and product development. Training must consider not only the technology and data, but also where and how AI is used. To offer a more comprehensive approach to implementing the AI Principles, we’ve been developing opportunities for Googlers to share their points of view on the responsible development of future technologies, such as the AI Principles Ethics Fellowship for Google’s Employee Resource Groups. Fellows receive AI Principles training and craft hypothetical case studies to inform how Google prioritizes socially beneficial applications. This inaugural year, 27 fellows selected from 191 applicants from around the world wrote and presented case studies on topics such as genome datasets and a Covid-19 content moderation workflow.

Other programs include a bi-weekly Responsible AI Tech Talk Series featuring external experts, such as the Brookings Institution’s Dr. Nicol Turner Lee presenting on detecting and mitigating algorithmic bias.

Tools and Research

To bring together multiple teams working on technical tools and research, this year we formed the Responsible AI and Human-Centered Technology organization. The basic and applied researchers in the organization are devoted to developing technology and best practices for the technical realization of the AI Principles guidance.

As discussed in our December 2020 End-of-Year report, we regularly release these tools to the public. Currently, researchers are developing Know Your Data (in beta) to help developers understand datasets with the goal of improving data quality, helping to mitigate fairness and bias issues.

Image of Know Your Data, a Responsible AI tool in beta

Know Your Data, a Responsible AI tool in beta

Product teams use these tools to evaluate their work’s alignment with the AI Principles. For example, the Portrait Light feature available in both Pixel’s Camera and Google Photos uses multiple machine learning components to instantly add realistic synthetic lighting to portraits. Using computational methods to achieve this effect, however, raised several responsible innovation challenges, including potentially reinforcing unfair bias (AI Principle #2) despite the goal of building a feature that works for all users. So the Portrait Light team generated a training dataset containing millions of photos based on an original set of photos of different people in a diversity of lighting environments, with their explicit consent. The engineering team used various Google Responsible AI tools to test proactively whether the ML models used in Portrait Light performed equitably across different audiences.

Our ongoing technical research related to responsible innovation in AI in the last 12 months has led to more than 200 research papers and articles that address AI Principles-related topics. These include exploring and mitigating data cascades; creating the first model-agnostic framework for partially local federated learning suitable for training and inference at scale; and analyzing the energy- and carbon-costs of training six recent ML models to reduce the carbon footprint of training an ML system by up to 99.9%.

Operationalizing the Principles

To help our teams put the AI Principles into practice, we deploy our decision-making process for reviewing new custom AI projects in development — from chatbots to newer fields such as affective technologies. These reviews are led by a multidisciplinary Responsible Innovation team, which draws on expertise in disciplines including trust and safety, human rights, public policy, law, sustainability, and product management. The team also advises product areas, on specific issues related to serving enterprise customers. Any Googler, at any level, is encouraged to apply for an AI Principles review of a project or planned product or service.

Teams can also request other Responsible Innovation services, such as informal consultations or product fairness testing with the Product Fairness (ProFair) team. ProFair tests products from the user perspective to investigate known issues and find new ones, similar to how an academic researcher would go about identifying fairness issues.

Our Google Cloud, Image Search, Next Billion Users and YouTube product teams have engaged with ProFair to test potential new projects for fairness. Consultations include collaborative testing scenarios, focus groups, and adversarial testing of ML models to address improving equity in data labels, fairness in language models, and bias in predictive text, among other issues. Recently, the ProFair team spent nine months consulting with Google researchers on creating an object recognition dataset for physical landmarks (such as buildings), developing criteria for how to choose which classes to recognize and how to determine the amount of training data per class in ways that would assign a fairer relevance score to each class.

Reviewers weigh the nature of the social benefit involved and whether it substantially exceeds potential challenges. For example, in the past year, reviewers decided not to publicly release ML models that can create photo-realistic synthetic faces made with generative adversarial networks (GANs), because of the risk of potential misuse by malicious actors to create “deepfakes” for misinformation purposes.

As another example, a research team requested a review of a new ML training dataset for computer vision fairness techniques that offered more specific attributes about people, such as perceived gender and age-range. The team worked with Open Images, an open data project containing ~9 million images spanning thousands of object categories and bounding box annotations for 600 classes. Reviewers weighed the risk of labeling the data with the sensitive labels of perceived gender presentation and age-range, and the societal benefit of these labels for fairness analysis and bias mitigation. Given these risks, reviewers required creation of a data card explaining the details and limitations of the project. We released the MIAP (More Inclusive Annotations for People) dataset in the Open Images Extended collection. The collection contains more complete bounding box annotations for the person class hierarchy in 100K images containing people. Each annotation is also labeled with fairness-related attributes. MIAP was accepted and presented at the 2021 Artificial Intelligence, Ethics and Society conference.

External Engagement

We remain committed to contributing to emerging international principles for responsible AI innovation. For example, we submitted a response to the European Commission consultation on the inception impact assessment on ethical and legal requirements for AI and feedback on NITI Aayog’s working document on Responsible Use of AI to guide national principles for India. We also supported the Singaporean government’s Guide to Job Redesign in the Age of AI, which outlines opportunities to optimize AI benefits by redesigning jobs and reskilling employees.

Our efforts to engage global audiences, from students to policymakers, center on educational programs and discussions to offer helpful and practical ML education, including:

  • A workshop on Federated Learning and Analytics, making all research talks and a TensorFlow Federated tutorial publicly available.
  • Machine Learning for Policy Leaders (ML4PL), a 2-hour virtual workshop on the basics of ML. To date, we’ve expanded this globally, reaching more than 350 policymakers across the EU, LatAm, APAC, and US.
  • A workshop co-hosted with the Indian Ministry of Electronics and Information Technology on the Responsible Use of AI for Social Empowerment, exploring the potential of AI adoption in the government to address COVID-19, agricultural and environmental crises.

To support these workshops and events with actionable and equitable programming designed for long-term collaboration, over the past year we’ve helped launch:

  • AI for Social Good workshops, bringing together NGOs applying AI to tough challenges in their communities with academic experts and Google researchers to encourage collaborative AI solutions. So far we’ve supported more than 30 projects in Asia Pacific and Sub-Saharan Africa with expertise, funding and Cloud Credits.
  • Two collaborations with the U.S. National Science Foundation: one to support the National AI Research Institute for Human-AI Interaction and Collaboration with $5 million in funding, along with AI expertise, research collaborations and Cloud support; another to join other industry partners and federal agencies as part of a combined $40 million investment in academic research for Resilient and Intelligent Next-Generation (NextG) Systems, in which Google will offer expertise, research collaborations, infrastructure and in-kind support to researchers.
  • Quarterly Equitable AI Research Roundtables (EARR), focused on the potential downstream harms of AI with experts from the Othering and Belonging Institute at UC Berkeley, PolicyLink, and Emory University School of Law.
  • MLCommons, a non-profit, which will administer MLPerf, a suite of benchmarks for Google and industry peers.

Committed to sharing tools and discoveries

In the three years since Google released our AI Principles, we’ve worked to integrate this ethical charter across our work — from the development of advanced technologies to business processes. As we learn and engage with people and organizations across society we’re committed to sharing tools, processes and discoveries with the global community. You’ll find many of these in our recently updated People + AI Research Guidebook and on the Google AI responsibilities site, which we update quarterly with case studies and other resources. 

Read More

Quickly Training Game-Playing Agents with Machine Learning

Posted by Leopold Haller and Hernan Moraldo, Software Engineers, Google Research

In the last two decades, dramatic advances in compute and connectivity have allowed game developers to create works of ever-increasing scope and complexity. Simple linear levels have evolved into photorealistic open worlds, procedural algorithms have enabled games with unprecedented variety, and expanding internet access has transformed games into dynamic online services. Unfortunately, scope and complexity have grown more rapidly than the size of quality assurance teams or the capabilities of traditional automated testing. This poses a challenge to both product quality (such as delayed releases and post-launch patches) and developer quality of life.

Machine learning (ML) techniques offer a possible solution, as they have demonstrated the potential to profoundly impact game development flows — they can help designers balance their game and empower artists to produce high-quality assets in a fraction of the time traditionally required. Furthermore, they can be used to train challenging opponents that can compete at the highest levels of play. Yet some ML techniques can pose requirements that currently make them impractical for production game teams, including the design of game-specific network architectures, the development of expertise in implementing ML algorithms, or the generation of billions of frames of training data. Conversely, game developers operate in a setting that offers unique advantages to leverage ML techniques, such as direct access to the game source, an abundance of expert demonstrations, and the uniquely interactive nature of video games.

Today, we present an ML-based system that game developers can use to quickly and efficiently train game-testing agents, helping developers find serious bugs quickly, while allowing human testers to focus on more complex and intricate problems. The resulting solution requires no ML expertise, works on many of the most popular game genres, and can train an ML policy, which generates game actions from the game state, in less than an hour on a single game instance. We have also released an open source library that demonstrates a functional application of these techniques.

Supported genres include arcade, action/adventure, and racing games.

The Right Tool for the Right Job
The most elemental form of video game testing is to simply play the game. A lot. Many of the most serious bugs (such as crashes or falling out of the world) are easy to detect and fix; the challenge is finding them within the vast state space of a modern game. As such, we decided to focus on training a system that could “just play the game” at scale.

We found that the most effective way to do this was not to try to train a single, super-effective agent that could play the entire game from end-to-end, but to provide developers with the ability to train an ensemble of game-testing agents, each of which could effectively accomplish tasks of a few minutes each, which game developers refer to as “gameplay loops”.

These core gameplay behaviors are often expensive to program through traditional means, but are much more efficient to train than a single end-to-end ML model. In practice, commercial games create longer loops by repeating and remixing core gameplay loops, which means that developers can test large stretches of gameplay by combining ML policies with a small amount of simple scripting.

Simulation-centric, Semantic API
One of the most fundamental challenges in applying ML to game development is bridging the chasm between the simulation-centric world of video games and the data-centric world of ML. Rather than ask developers to directly convert the game state into custom, low-level ML features (which would be too labor intensive) or attempting to learn from raw pixels (which would require too much data to train), our system provides developers with an idiomatic, game-developer friendly API that allows them to describe their game in terms of the essential state that a player observes and the semantic actions they can perform. All of this information is expressed via concepts that are familiar to game developers, such as entities, raycasts, 3D positions and rotations, buttons and joysticks.

As you can see in the example below, the API allows the specification of observations and actions in just a few lines of code.

Example actions and observations for a racing game.

From API to Neural Network
This high level, semantic API is not just easy to use, but also allows the system to flexibly adapt to the specific game being developed — the specific combination of API building blocks employed by the game developer informs our choice of network architecture, since it provides information about the type of gaming scenario in which the system is deployed. Some examples of this include: handling action outputs differently depending on whether they represent a digital button or analog joystick, or using techniques from image processing to handle observations that result from an agent probing its environment with raycasts (similar to how autonomous vehicles probe their environment with LIDAR).

Our API is sufficiently general to allow modeling of many common control-schemes (the configuration of action outputs that control movement) in games, such as first-person games, third-person games with camera-relative controls, racing games, twin stick shooters, etc. Since 3D movement and aiming are often an integral aspect of gameplay in general, we create networks that automatically tend towards simple behaviors such as aiming, approach or avoidance in these games. The system accomplishes this by analyzing the game’s control scheme to create neural network layers that perform custom processing of observations and actions in that game. For example, positions and rotations of objects in the world are automatically translated into directions and distances from the point of view of the AI-controlled game entity. This transformation typically increases the speed of learning and helps the learned network generalize better.

An example neural network generated for a game with joystick controls and raycast inputs. Depending on the inputs (red) and the control scheme, the system generates custom pre- and post-processing layers (orange).

Learning From The Experts in Real Time
After generating a neural network architecture, the network needs to be trained to play the game using an appropriate choice of learning algorithm.

Reinforcement learning (RL), in which an ML policy is trained directly to maximize a reward, may seem like the obvious choice since they have been successfully used to train highly competent ML policies for games. However, RL algorithms tend to require more data than a single game instance can produce in a reasonable amount of time, and achieving good results in a new domain often requires hyperparameter tuning and strong ML domain knowledge.

Instead, we found that imitation learning (IL), which trains ML policies based by observing experts play the game, works well for our use case. Unlike RL, where the agent needs to discover a good policy on its own, IL only needs to recreate the behavior of a human expert. Since game developers and testers are experts in their own games, they can easily provide demonstrations of how to play the game.

We use an IL approach inspired by the DAgger algorithm, which allows us to take advantage of video games’ most compelling quality — interactivity. Thanks to the reductions in training time and data requirements enabled by our semantic API, training is effectively realtime, giving a developer the ability to fluidly switch between providing gameplay demonstrations and watching the system play. This results in a natural feedback loop, in which a developer iteratively provides corrections to a continuous stream of ML policies.

From the developer’s perspective, providing a demonstration or a correction to faulty behavior is as simple as picking up the controller and starting to play the game. Once they are done, they can put the controller down and watch the ML policy play. The result is a training experience that is real-time, interactive, highly experiential, and, very often, more than a little fun.

ML policy for an FPS game, trained with our system.

Conclusion
We present a system that combines a high-level semantic API with a DAgger-inspired interactive training flow that enables training of useful ML policies for video game testing in a wide variety of genres. We have released an open source library as a functional illustration of our system. No ML expertise is required and training of agents for test applications often takes less than an hour on a single developer machine. We hope that this work will help inspire the development of ML techniques that can be deployed in real-world game-development flows in ways that are accessible, effective, and fun to use.

Acknowledgements
We’d like to thank the core members of the project: Dexter Allen, Leopold Haller, Nathan Martz, Hernan Moraldo, Stewart Miles and Hina Sakazaki. Training algorithms are provided by TF Agents, and on-device inference by TF Lite. Special thanks to our research advisors, Olivier Bachem, Erik Frey, and Toby Pohlen, and to Eugene Brevdo, Jared Duke, Oscar Ramirez and Neal Wu who provided helpful guidance and support.

Read More

Douglas Coupland fuses AI and art to inspire students

Have you ever noticed that the word art is embedded in the phrase artificial intelligence? Neither did we, but when the opportunity presented itself to explore how artificial intelligence (AI) inspires artistic expression — with the help of internationally renowned Canadian artist Douglas Coupland — the Google Research team jumped on it. This collaboration, with the support of Google Arts & Culture, culminated in a project called Slogans for the Class of 2030, which spotlights the experiences of the first generation of young people whose lives are fully intertwined with the existence of AI. 

This collaboration was brought to life by first introducing Coupland’s written work to a machine learning language model. Machine learning is a form of AI that provides computer systems the ability to automatically learn from data. In this case, Google research scientists tuned a machine learning algorithm with Coupland’s 30-year body of written work — more than a million words — so it would familiarize itself with the author’s unique style of writing. From there, curated general-public social media posts on selected topics were added to teach the algorithm how to craft short-form, topical statements.

Once the algorithm was trained, the next step was to process and reassemble suggestions of text for Coupland to use as inspiration to create twenty-five Slogans for the Class of 2030

“I would comb through ‘data dumps’ where characters from one novel were speaking with those in other novels in ways that they might actually do. It felt like I was encountering a parallel universe Doug,” Coupland says. “And from these outputs, the statements you see here in this project appeared like gems. Did I write them? Yes. No. Could they have existed without me? No.”

A common theme in Coupland’s work is the investigation of the human condition through the lens of pop culture. The focus on the class of 2030 was intentional. Coupland wanted to create works that would serve as inspiration for students in their early teens who will be graduating from universities in 2030. For those teens considering their future career paths, he hoped that this collaboration would trigger a broader conversation on the vast possibilities in the field and would acquaint them with the fact that AI does not have to be strictly scientific, it can be artful.

Unveiled today, all 25 thought-provoking and visually rich digital slogans are yours to experience on Google Arts & Culture alongside Coupland’s artistic statement and other behind-the-scenes material. This isn’t Douglas’ first collaboration with Google Arts & Culture; in 2019 Coupland’s Vancouver art exhibition was captured virtually. In 2015 and 2016, he joined theGoogle Arts & Culture Lab residency in Paris where he collaborated with engineers to develop many works including theSearch book andthe Living Internet

In an effort to celebrate local talent and culture, multiple venues across Canada have signed on to project the slogans, for a limited time, on larger-than-life digital screens allowing curious minds to experience them in an immersive way. The screens include the Terry Fox Memorial Plaza at BC Place Stadium in Vancouver B. C., the TELUS Len Werry building in downtown Calgary, TELUS Harbour in downtown Toronto and select Pattison digital screens across Canada.     

Technology has always played a role in creating new types of possibilities that inspire artists — from the sounds of distortion to the electronic sounds of synths for musicians for example. Today, advances in AI are opening up new possibilities for other forms of art and we look forward to seeing what the crossroads of art and technology bring to life. 

Read More

How we’re supporting 30 new AI for Social Good projects

Over recent years, we have seen remarkable progress in AI’s ability to confront new problems and help solve old ones. Advancing these efforts was one reason we set up the Google Research India lab in 2019, with a particular emphasis on AI research that could make a positive social impact. It’s also why we’ve supported nonprofit organizations through the Google AI Impact Challenge.

Working in partnership with Google.org and Google’s University Relations program, our goal is to help academics and nonprofits develop AI techniques that can improve people’s lives — especially in underserved communities that haven’t yet benefited from advances in AI. We reported on the impact of six such projects in 2020. And today, we’re sharing 30 new projects that will receive funding and support as part of our AI for Social Good program

During the application process, Googlers arranged workshops involving more than 150 teams to discuss potential projects. Following the workshop meetings, project teams made up of NGOs and academics submitted proposals which Google experts reviewed. The result is a promising range of projects spanning seventeen countries across Asia-Pacific and Sub-Saharan Africa — including India, Uganda, Nigeria, Japan and Australia— focused on agriculture, conservation and public health. 

In agriculture, this includes research to help farmer collectives with market intelligence and use data to improve crop and irrigation planning for smallholder farmers. In public health, we are backing projects that will enable targeted public health interventions, and will help community health workers to forecast health risks in countries such as Kenya, India and Uganda. We’re also supporting research to better forecast the need for critical resources like vaccines and care, including in Nigeria. And in conservation, we’re supporting research to help understand animal population changes, such as the effect of poaching on elephants, and gorillas. Other projects will help reduce conservation conflict and poaching, including human-elephant conflict in Kenya.

Each project team will receive funding, technical contributions from Google and access to computational resources. Academics in this program will be recognized as “Impact Scholars” for their contributions towards advancing research for social good.  

We’ve seen the impact these kinds of projects can make. One of the nonprofit leaders supported by the program last year, ARMMAN founder Dr. Aparna Hegde, has received AI research support from IIT Madras and Google Research to improve maternal and child health outcomes in India. The team is building a predictive model to prevent expectant mothers dropping out of supportive telehealth outreach programs. Results so far show AI could enable ARMMAN to increase the number of women engaged through the program by 50%, and they have received a second Google.org grant to enable them to build on this progress. Dr. Hegde says the program is “already showing encouraging results — and I am confident that this partnership will bring immense benefits in the future.”

Congratulations to all the recipients of this round’s support. We’re looking forward to continuing to nurture the AI for Social Good community, bringing together experts from diverse backgrounds with the common goal of advancing AI to improve lives around the world.

Read More

Take All Your Pictures to the Cleaners, with Google Photos Noise and Blur Reduction

Posted by Mauricio Delbracio, Research Scientist and Sungjoon Choi, Software Engineer, Google Research

Despite recent leaps in imaging technology, especially on mobile devices, image noise and limited sharpness remain two of the most important levers for improving the visual quality of a photograph. These are particularly relevant when taking pictures in poor light conditions, where cameras may compensate by increasing the ISO or slowing the shutter speed, thereby exacerbating the presence of noise and, at times, increasing image blur. Noise can be associated with the particle nature of light (shot noise) or be introduced by electronic components during the readout process (read noise). The captured noisy signal is then processed by the camera image processor (ISP) and later may be further enhanced, amplified, or distorted by a photographic editing process. Image blur can be caused by a wide variety of phenomena, from inadvertent camera shake during capture, an incorrect setting of the camera’s focus (automatic or not), or due to the finite lens aperture, sensor resolution or the camera’s image processing.

It is far easier to minimize the effects of noise and blur within a camera pipeline, where details of the sensor, optical hardware and software blocks are understood. However, when presented with an image produced from an arbitrary (possibly unknown) camera, improving noise and sharpness becomes much more challenging due to the lack of detailed knowledge and access to the internal parameters of the camera. In most situations, these two problems are intrinsically related: noise reduction tends to eliminate fine structures along with unwanted details, while blur reduction seeks to boost structures and fine details. This interconnectedness increases the difficulty of developing image enhancement techniques that are computationally efficient to run on mobile devices.

Today, we present a new approach for camera-agnostic estimation and elimination of noise and blur that can improve the quality of most images. We developed a pull-push denoising algorithm that is paired with a deblurring method, called polyblur. Both of these components are designed to maximize computational efficiency, so users can successfully enhance the quality of a multi-megapixel image in milliseconds on a mobile device. These noise and blur reduction strategies are critical components of the recent Google Photos editor updates, which includes “Denoise” and “Sharpen” tools that enable users to enhance images that may have been captured under less than ideal conditions, or with older devices that may have had more noisy sensors or less sharp optics.

A demonstration of the “Denoise” and “Sharpen” tools now available in the Google Photos editor.

How Noisy is An Image?
In order to accurately process a photographic image and successfully reduce the unwanted effects of noise and blur, it is vitally important to first characterize the types and levels of noise and blur found in the image. So, a camera-agnostic approach for noise reduction begins by formulating a method to gauge the strength of noise at the pixel level from any given image, regardless of the device that created it. The noise level is modeled as a function of the brightness of the underlying pixel. That is, for each possible brightness level, the model estimates a corresponding noise level in a manner agnostic to either the actual source of the noise or the processing pipeline.

To estimate this brightness-based noise level, we sample a number of small patches across the image and measure the noise level within each patch, after roughly removing any underlying structure in the image. This process is repeated at multiple scales, making it robust to artifacts that may arise from compression, image resizing, or other non-linear camera processing operations.

The two segments on the left illustrate signal-dependent noise present in the input image (center). The noise is more prominent in the bottom, darker crop and is unrelated to the underlying structure, but rather to the light level. Such image segments are sampled and processed to generate the spatially-varying noise map (right) where red indicates more noise is present.

Reducing Noise Selectively with a Pull-Push Method
We take advantage of self-similarity of patches across the image to denoise with high fidelity. The general principle behind such so-called “non-local” denoising is that noisy pixels can be denoised by averaging pixels with similar local structure. However, these approaches typically incur high computational costs because they require a brute force search for pixels with similar local structure, making them impractical for on-device use. In our “pull-push” approach1, the algorithmic complexity is decoupled from the size of filter footprints thanks to effective information propagation across spatial scales.

The first step in pull-push is to build an image pyramid (i.e., multiscale representation) in which each successive level is generated recursively by a “pull” filter (analogous to downsampling). This filter uses a per-pixel weighting scheme to selectively combine existing noisy pixels together based on their patch similarities and estimated noise, thus reducing the noise at each successive, “coarser” level. Pixels at coarser levels (i.e., with lower resolution) pull and aggregate only compatible pixels from higher resolution, “finer” levels. In addition to this, each merged pixel in the coarser layers also includes an estimated reliability measure computed from the similarity weights used to generate it. Thus, merged pixels provide a simple per-pixel, per-level characterization of the image and its local statistics. By efficiently propagating this information through each level (i.e., each spatial scale), we are able to track a model of the neighborhood statistics for increasingly larger regions in a multiscale manner.

After the pull stage is evaluated to the coarsest level, the “push” stage fuses the results, starting from the coarsest level and generating finer levels iteratively. At a given scale, the push stage generates “filtered” pixels following a process similar to that of the pull stage, but going from coarse to finer levels. The pixels at each level are fused with those of coarser levels by doing a weighted average of same-level pixels along with coarser-level filtered pixels using the respective reliability weights. This enables us to reduce pixel noise while preserving local structure, because only average reliable information is included. This selective filtering and reliability (i.e. information) multiscale propagation is what makes push-pull different from existing frameworks.

This series of images shows how filtering progresses through the pull-push process. Coarser level pixels pull and aggregate only compatible pixels from finer levels, as opposed to the traditional multiscale approaches using a fixed (non-data dependent) kernel. Notice how the noise is reduced throughout the stages.

The pull-push approach has a low computational cost, because the algorithm to selectively filter similar pixels over a very large neighborhood has a complexity that is only linear with the number of image pixels. In practice, the quality of this denoising approach is comparable to traditional non-local methods with much larger kernel footprints, but operates at a fraction of the computational cost.

Image enhanced using the pull-push denoising method.

How Blurry Is an Image?
An image with poor sharpness can be thought of as being a more pristine latent image that was operated on by a blur kernel. So, if one can identify the blur kernel, it can be used to reduce the effect. This is referred to as “deblurring”, i.e., the removal or reduction of an undesired blur effect induced by a particular kernel on a particular image. In contrast, “sharpening” refers to applying a sharpening filter, built from scratch and without reference to any particular image or blur kernel. Typical sharpening filters are also, in general, local operations that do not take account of any other information from other parts of the image, whereas deblurring algorithms estimate the blur from the whole image. Unlike arbitrary sharpening, which can result in worse image quality when applied to an image that is already sharp, deblurring a sharp image with a blur kernel accurately estimated from the image itself will have very little effect.

We specifically target relatively mild blur, as this scenario is more technically tractable, more computationally efficient, and produces consistent results. We model the blur kernel as an anisotropic (elliptical) Gaussian kernel, specified by three parameters that control the strength, direction and aspect ratio of the blur.

Gaussian blur model and example blur kernels. Each row of the plot on the right represents possible combinations of σ0, ρ and θ. We show three different σ0 values with three different ρ values for each.

Computing and removing blur without noticeable delay for the user requires an algorithm that is much more computationally efficient than existing approaches, which typically cannot be executed on a mobile device. We rely on an intriguing empirical observation: the maximal value of the image gradient across all directions at any point in a sharp image follows a particular distribution. Finding the maximum gradient value is efficient, and can yield a reliable estimate of the strength of the blur in the given direction. With this information in hand, we can directly recover the parameters that characterize the blur.

Polyblur: Removing Blur by Re-blurring
To recover the sharp image given the estimated blur, we would (in theory) need to solve a numerically unstable inverse problem (i.e., deblurring). The inversion problem grows exponentially more unstable with the strength of the blur. As such, we target the case of mild blur removal. That is, we assume that the image at hand is not so blurry as to be beyond practical repair. This enables a more practical approach — by carefully combining different re-applications of an operator we can approximate its inverse.

Mild blur, as shown in these examples, can be effectively removed by combining multiple applications of the estimated blur.

This means, rather counterintuitively, that we can deblur an image by re-blurring it several times with the estimated blur kernel. Each application of the (estimated) blur corresponds to a first order polynomial, and the repeated applications (adding or subtracting) correspond to higher order terms in a polynomial. A key aspect of this approach, which we call polyblur, is that it is very fast, because it only requires a few applications of the blur itself. This allows it to operate on megapixel images in a fraction of a second on a typical mobile device. The degree of the polynomial and its coefficients are set to invert the blur without boosting noise and other unwanted artifacts.

The deblurred image is generated by adding and subtracting multiple re-applications of the estimated blur (polyblur).

Integration with Google Photos
The innovations described here have been integrated and made available to users in the Google Photos image editor in two new adjustment sliders called “Denoise” and “Sharpen”. These features allow users to improve the quality of everyday images, from any capture device. The features often complement each other, allowing both denoising to reduce unwanted artifacts, and sharpening to bring clarity to the image subjects. Try using this pair of tools in tandem in your images for best results. To learn more about the details of the work described here, check out our papers on polyblur and pull-push denoising. To see some examples of the effect of our denoising and sharpening up close, have a look at the images in this album.

Acknowledgements
The authors gratefully acknowledge the contributions of Ignacio Garcia-Dorado, Ryan Campbell, Damien Kelly, Peyman Milanfar, and John Isidoro. We are also thankful for support and feedback from Navin Sarma, Zachary Senzer, Brandon Ruffin, and Michael Milne.


1 The original pull-push algorithm was developed as an efficient scattered data interpolation method to estimate and fill in the missing pixels in an image where only a subset of the pixels are specified. Here, we extend its methodology and present a data-dependent multiscale algorithm for denoising images efficiently. 

Read More

Achieving Precision in Quantum Material Simulations

Posted by Charles Neill and Zhang Jiang, Senior Research Scientists, Google Quantum AI

In fall of 2019, we demonstrated that the Sycamore quantum processor could outperform the most powerful classical computers when applied to a tailor-made problem. The next challenge is to extend this result to solve practical problems in materials science, chemistry and physics. But going beyond the capabilities of classical computers for these problems is challenging and will require new insights to achieve state-of-the-art accuracy. Generally, the difficulty in performing quantum simulations of such physical problems is rooted in the wave nature of quantum particles, where deviations in the initial setup, interference from the environment, or small errors in the calculations can lead to large deviations in the computational result.

In two upcoming publications, we outline a blueprint for achieving record levels of precision for the task of simulating quantum materials. In the first work, we consider one-dimensional systems, like thin wires, and demonstrate how to accurately compute electronic properties, such as current and conductance. In the second work, we show how to map the Fermi-Hubbard model, which describes interacting electrons, to a quantum processor in order to simulate important physical properties. These works take a significant step towards realizing our long-term goal of simulating more complex systems with practical applications, like batteries and pharmaceuticals.

A bottom view of one of the quantum dilution refrigerators during maintenance. During the operation, the microwave wires that are floating in this image are connected to the quantum processor, e.g., the Sycamore chip, bringing the temperature of the lowest stage to a few tens of milli-degrees above absolute zero temperature.

Computing Electronic Properties of Quantum Materials
In “Accurately computing electronic properties of a quantum ring”, to be published in Nature, we show how to reconstruct key electronic properties of quantum materials. The focus of this work is on one-dimensional conductors, which we simulate by forming a loop out of 18 qubits on the Sycamore processor in order to mimic a very narrow wire. We illustrate the underlying physics through a series of simple text-book experiments, starting with a computation of the “band-structure” of this wire, which describes the relationship between the energy and momentum of electrons in the metal. Understanding such structure is a key step in computing electronic properties such as current and conductance. Despite being an 18-qubit algorithm consisting of over 1,400 logical operations, a significant computational task for near-term devices, we are able to achieve a total error as low as 1%.

The key insight enabling this level of accuracy stems from robust properties of the Fourier transform. The quantum signal that we measure oscillates in time with a small number of frequencies. Taking a Fourier transform of this signal reveals peaks at the oscillation frequencies (in this case, the energy of electrons in the wire). While experimental imperfections affect the height of the observed peaks (corresponding to the strength of the oscillation), the center frequencies are robust to these errors. On the other hand, the center frequencies are especially sensitive to the physical properties of the wire that we hope to study (e.g., revealing small disorders in the local electric field felt by the electrons). The essence of our work is that studying quantum signals in the Fourier domain enables robust protection against experimental errors while providing a sensitive probe of the underlying quantum system.

(Left) Schematic of the 54-qubit quantum processor, Sycamore. Qubits are shown as gray crosses and tunable couplers as blue squares. Eighteen of the qubits are isolated to form a ring. (Middle) Fourier transform of the measured quantum signal. Peaks in the Fourier spectrum correspond to the energy of electrons in the ring. Each peak can be associated with a traveling wave that has fixed momentum. (Right) The center frequency of each peak (corresponding to the energy of electrons in the wire) is plotted versus the peak index (corresponding to the momentum). The measured relationship between energy and momentum is referred to as the ‘band structure’ of the quantum wire and provides valuable information about electronic properties of the material, such as current and conductance.

Quantum Simulation of the Fermi-Hubbard Model
In “Observation of separated dynamics of charge and spin in the Fermi-Hubbard model”, we focus on the dynamics of interacting electrons. Interactions between particles give rise to novel phenomena such as high temperature superconductivity and spin-charge separation. The simplest model that captures this behavior is known as the Fermi-Hubbard model. In materials such as metals, the atomic nuclei form a crystalline lattice and electrons hop from lattice site to lattice site carrying electrical current. In order to accurately model these systems, it is necessary to include the repulsion that electrons feel when getting close to one another. The Fermi-Hubbard model captures this physics with two simple parameters that describe the hopping rate (J) and the repulsion strength (U).

We realize the dynamics of this model by mapping the two physical parameters to logical operations on the qubits of the processor. Using these operations, we simulate a state of the electrons where both the electron charge and spin densities are peaked near the center of the qubit array. As the system evolves, the charge and spin densities spread at different rates due to the strong correlations between electrons. Our results provide an intuitive picture of interacting electrons and serve as a benchmark for simulating quantum materials with superconducting qubits.

(Left top) Illustration of the one-dimensional Fermi-Hubbard model in a periodic potential. Electrons are shown in blue, with their spin indicated by the connected arrow. J, the distance between troughs in the electric potential field, reflects the “hopping” rate, i.e., the rate at which electrons transition from one trough in the potential to another, and U, the amplitude, represents the strength of repulsion between electrons. (Left bottom) The simulation of the model on a qubit ladder, where each qubit (square) represents a fermionic state with spin-up or spin-down (arrows). (Right) Time evolution of the model reveals separated spreading rates of charge and spin. Points and solid lines represent experimental and numerical exact results, respectively. At t = 0, the charge and spin densities are peaked at the middle sites. At later times, the charge density spreads and reaches the boundaries faster than the spin density.

Conclusion
Quantum processors hold the promise to solve computationally hard tasks beyond the capability of classical approaches. However, in order for these engineered platforms to be considered as serious contenders, they must offer computational accuracy beyond the current state-of-the-art classical methods. In our first experiment, we demonstrate an unprecedented level of accuracy in simulating simple materials, and in our second experiment, we show how to embed realistic models of interacting electrons into a quantum processor. It is our hope that these experimental results help progress the goal of moving beyond the classical computing horizon.

Read More

A Dataset for Studying Gender Bias in Translation

Posted by Romina Stella, Product Manager, Google Translate

Advances on neural machine translation (NMT) have enabled more natural and fluid translations, but they still can reflect the societal biases and stereotypes of the data on which they’re trained. As such, it is an ongoing goal at Google to develop innovative techniques to reduce gender bias in machine translation, in alignment with our AI Principles.

One research area has been using context from surrounding sentences or passages to improve gender accuracy. This is a challenge because traditional NMT methods translate sentences individually, but gendered information is not always explicitly stated in each individual sentence. For example, in the following passage in Spanish (a language where subjects aren’t always explicitly mentioned), the first sentence refers explicitly to Marie Curie as the subject, but the second one doesn’t explicitly mention the subject. In isolation, this second sentence could refer to a person of any gender. When translating to English, however, a pronoun needs to be picked, and the information needed for an accurate translation is in the first sentence.

Spanish Text Translation to English
Marie Curie nació en Varsovia. Fue la primera persona en recibir dos premios Nobel en distintas especialidades. Marie Curie was born in Warsaw. She was the first person to receive two Nobel Prizes in different specialties.

Advancing translation techniques beyond single sentences requires new metrics for measuring progress and new datasets with the most common context-related errors. Adding to this challenge is the fact that translation errors related to gender (such as picking the correct pronoun or having gender agreement) are particularly sensitive, because they may directly refer to people and how they self identify.

To help facilitate progress against the common challenges on contextual translation (e.g., pronoun drop, gender agreement and accurate possessives), we are releasing the Translated Wikipedia Biographies dataset, which can be used to evaluate the gender bias of translation models. Our intent with this release is to support long-term improvements on ML systems focused on pronouns and gender in translation by providing a benchmark in which translations’ accuracy can be measured pre- and post-model changes.

A Source of Common Translation Errors
Because they are well-written, geographically diverse, contain multiple sentences, and refer to subjects in the third person (and so contain plenty of pronouns), Wikipedia biographies offer a high potential for common translation errors associated with gender. These often occur when articles refer to a person explicitly in early sentences of a paragraph, but there is no explicit mention of the person in later sentences. Some examples:

Translation Error     Text     Translation
Pro-drop in Spanish → English     Marie Curie nació en Varsovia. Recibió el Premio Nobel en 1903 y en 1911.     Marie Curie was born in Warsaw. He received the Nobel Prize in 1903 and in 1911.
Neutral possessives in Spanish → English     Marie Curie nació en Varsovia. Su carrera profesional fue desarrollada en Francia.     Marie Curie was born in Warsaw. His professional career was developed in France.
Gender agreement in English → German     Marie Curie was born in Warsaw. The distinguished scientist received the Nobel Prize in 1903 and in 1911.     Marie Curie wurde in Varsovia geboren. Der angesehene Wissenschaftler erhielt 1903 und 1911 den Nobelpreis.
Gender agreement in English → Spanish     Marie Curie was born in Warsaw. The distinguished scientist received the Nobel Prize in 1903 and in 1911.     Marie Curie nació en Varsovia. El distinguido científico recibió el Premio Nobel en 1903 y en 1911.

Building the Dataset
The Translated Wikipedia Biographies dataset has been designed to analyze common gender errors in machine translation, such as those illustrated above. Each instance of the dataset represents a person (identified in the biographies as feminine or masculine), a rock band or a sports team (considered genderless). Each instance is represented by a long text translation of 8 to 15 connected sentences referring to that central subject (the person, rock band, or sports team). Articles are written in native English and have been professionally translated to Spanish and German. For Spanish, translations were optimized for pronoun-drop, so the same set could be used to analyze pro-drop (Spanish → English) and gender agreement (English → Spanish).

The dataset was built by selecting a group of instances that has equal representation across geographies and genders. To do this, we extracted biographies from Wikipedia according to occupation, profession, job and/or activity. To ensure an unbiased selection of occupations, we chose nine occupations that represented a range of stereotypical gender associations (either feminine, masculine, or neither) based on Wikipedia statistics. Then, to mitigate any geography-based bias, we divided all these instances based on geographical diversity. For each occupation category, we looked to have one candidate per region (using regions from census.gov as a proxy of geographical diversity). When an instance was associated with a region, we checked that the selected person had a relevant relationship with a country that belongs to a designated region (nationality, place of birth, lived for a big portion of their life, etc.). By using this criteria, the dataset contains entries about individuals from more than 90 countries and all regions of the world.

Although gender is non-binary, we focused on having equal representation of “feminine” and “masculine” entities. It’s worth mentioning that because the entities are represented as such on Wikipedia, the set doesn’t include individuals that identify as non-binary, as, unfortunately, there are not enough instances currently represented in Wikipedia to accurately reflect the non-binary community. To label each instance as “feminine” or “masculine” we relied on the biographical information from Wikipedia, which contained gender-specific references to the person (she, he, woman, son, father, etc.).

After applying all these filters, we randomly selected an instance for each occupation-region-gender triplet. For each occupation, there are two biographies (one masculine and one feminine), for each of the seven geographic regions.

Finally, we added 12 instances with no gender. We picked rock bands and sports teams because they are usually referred to by non-gendered third person pronouns (such as “it” or singular “they”). The purpose of including these instances is to study over triggering, i.e., when models learn that they are rewarded for producing gender-specific pronouns, leading them to produce these pronouns in cases where they shouldn’t.

Results and Applications
This dataset enables a new method of evaluation for gender bias reduction in machine translations (introduced in a previous post). Because each instance refers to a subject with a known gender, we can compute the accuracy of the gender-specific translations that refer to this subject. This computation is easier when translating into English (cases of languages with pro-drop or neutral pronouns) since computation is mainly based on gender-specific pronouns in English. In these cases, the gender datasets have resulted in a 67% reduction in errors on context-aware models vs. previous models. As mentioned before, the neutral entities have allowed us to discover cases of over triggering like the usage of feminine or masculine pronouns to refer to genderless entities. This new dataset also enables new research directions into the performance of different models across types of occupations or geographic regions.

As an example, the dataset allowed us to discover the following improvements in an excerpt of the translated biography of Marie Curie from Spanish.

Translation result with the previous NMT model.
Translation result with the new contextual model.

Conclusion
This Translated Wikipedia Biographies dataset is the result of our own studies and work on identifying biases associated with gender and machine translation. This set focuses on a specific problem related to gender bias and doesn’t aim to cover the whole problem. It’s worth mentioning that by releasing this dataset, we don’t aim to be prescriptive in determining what’s the optimal approach to address gender bias. This contribution aims to foster progress on this challenge across the global research community.

Acknowledgements
The datasets were built with help from Anja Austermann, Melvin Johnson, Michelle Linch, Mengmeng Niu, Mahima Pushkarna, Apu Shah, Romina Stella, and Kellie Webster.

Read More

Improving Genomic Discovery with Machine Learning

Posted by Andrew Carroll, Product Manager and Cory McLean, Software Engineer, Google Health

Each person’s genome, which collectively encodes the biochemical machinery they are born with, is composed of over 3 billion letters of DNA. However, only a small subset of the genome (~4-5 million positions) varies between two people. Nonetheless, each person’s unique genome interacts with the environment they experience to determine the majority of their health outcomes. A key method of understanding the relationship between genetic variants and traits is a genome-wide association study (GWAS), in which each genetic variant present in a cohort is individually examined for correlation with the trait of interest. GWAS results can be used to identify and prioritize potential therapeutic targets by identifying genes that are strongly associated with a disease of interest, and can also be used to build a polygenic risk score (PRS) to predict disease predisposition based on the combined influence of variants present in an individual. However, while accurate measurement of traits in an individual (called phenotyping) is essential to GWAS, it often requires painstaking expert curation and/or subjective judgment calls.

In “Large-scale machine learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology”, we demonstrate how using machine learning (ML) models to classify medical imaging data can be used to improve GWAS. We describe how models can be trained for phenotypes to generate trait predictions and how these predictions are used to identify novel genetic associations. We then show that the novel associations discovered improve PRS accuracy and, using glaucoma as an example, that the improvements for anatomical eye traits relate to human disease. We have released the model training code and detailed documentation for its use on our Genomics Research GitHub repository.

Identifying genetic variants associated with eye anatomical traits
Previous work has demonstrated that ML models can identify eye diseases, skin diseases, and abnormal mammogram results with accuracy approaching or exceeding state-of-the-art methods by domain experts. Because identifying disease is a subset of phenotyping, we reasoned that ML models could be broadly used to improve the speed and quality of phenotyping for GWAS.

To test this, we chose a model that uses a fundus image of the eye to accurately predict whether a patient should be referred for assessment for glaucoma. This model uses the fundus images to predict the diameters of the optic disc (the region where the optic nerve connects to the retina) and the optic cup (a whitish region in the center of the optic disc). The ratio of the diameters of these two anatomical features (called the vertical cup-to-disc ratio, or VCDR) correlates strongly with glaucoma risk.

A representative retinal fundus image showing the vertical cup-to-disc ratio, which is an important diagnostic measurement for glaucoma.

We applied this model to predict VCDR in all fundus images from individuals in the UK Biobank, which is the world’s largest dataset available to researchers worldwide for health-related research in the public interest, containing extensive phenotyping and genetic data for ~500,000 pseudonymized (the UK Biobank’s standard for de-identification) individuals. We then performed GWAS in this dataset to identify genetic variants that are associated with the model-based predictions of VCDR.

Applying a VCDR prediction model trained on clinical data to generate predicted values for VCDR to enable discovery of genetic associations for the VCDR trait.

The ML-based GWAS identified 156 distinct genomic regions associated with VCDR. We compared these results to a VCDR GWAS conducted by another group on the same UK Biobank data, Craig et al. 2020, where experts had painstakingly labeled all images for VCDR. The ML-based GWAS replicates 62 of the 65 associations found in Craig et al., which indicates that the model accurately predicts VCDR in the UK Biobank images. Additionally, the ML-based GWAS discovered 93 novel associations.

Number of statistically significant GWAS associations discovered by exhaustive expert labeling approach (Craig et al., left), and by our ML-based approach (right), with shared associations in the middle.

The ML-based GWAS improves polygenic model predictions
To validate that the novel associations discovered in the ML-based GWAS are biologically relevant, we developed independent PRSes using the Craig et al. and ML-based GWAS results, and tested their ability to predict human-expert-labeled VCDR in a subset of UK Biobank as well as a fully independent cohort (EPIC-Norfolk). The PRS developed from the ML-based GWAS showed greater predictive ability than the PRS built from the expert labeling approach in both datasets, providing strong evidence that the novel associations discovered by the ML-based method influence VCDR biology, and suggesting that the improved phenotyping accuracy (i.e., more accurate VCDR measurement) of the model translates into a more powerful GWAS.

The correlation between a polygenic risk score (PRS) for VCDR generated from the ML-based approach and the exhaustive expert labeling approach (Craig et al.). In these plots, higher values on the y-axis indicate a greater correlation and therefore greater prediction from only the genetic data. [* — p ≤ 0.05; *** — p ≤ 0.001]

As a second validation, because we know that VCDR is strongly correlated with glaucoma, we also investigated whether the ML-based PRS was correlated with individuals who had either self-reported that they had glaucoma or had medical procedure codes suggestive of glaucoma or glaucoma treatment. We found that the PRS for VCDR determined using our model predictions were also predictive of the probability that an individual had indications of glaucoma. Individuals with a PRS 2.5 or more standard deviations higher than the mean were more than 3 times as likely to have glaucoma in this cohort. We also observed that the VCDR PRS from ML-based phenotypes was more predictive of glaucoma than the VCDR PRS produced from the extensive manual phenotyping.

The odds ratio of glaucoma (self-report or ICD code) stratified by the PRS for VCDR determined using the ML-based phenotypes (in standard deviations from the mean). In this plot, the y-axis shows the probability that the individual has glaucoma relative to the baseline rate (represented by the dashed line). The x-axis shows standard deviations from the mean for the PRS. Data are visualized as a standard box plot, which illustrates values for the mean (the orange line), first and third quartiles, and minimum and maximum.

Conclusion
We have shown that ML models can be used to quickly phenotype large cohorts for GWAS, and that these models can increase statistical power in such studies. Although these examples were shown for eye traits predicted from retinal imaging, we look forward to exploring how this concept could generally apply to other diseases and data types.

Acknowledgments
We would like to especially thank co-author Dr. Anthony Khawaja of Moorfields Eye Hospital for contributing his extensive medical expertise. We also recognize the efforts of Professor Jamie Craig and colleagues for their exhaustive labeling of UK Biobank images, which allowed us to make comparisons with our method. Several authors of that work, as well as Professor Stuart MacGregor and collaborators in Australia and at Max Kelsen have independently replicated these findings, and we value these scientific contributions as well. Last, this work summarizes the work of the following Google contributors, who we would like to thank: Babak Alipanahi, Farhad Hormozdiari, Babak Behsaz, Justin Cosentino, Zachary R. McCaw, Emanuel Schorsch, D. Sculley, Elizabeth H. Dorfman, Sonia Phene, Naama Hammel, Andrew Carroll, and Cory Y. McLean

Read More

Quantum Machine Learning and the Power of Data

Posted by Jarrod McClean, Staff Research Scientist and Hsin-Yuan (Robert) Huang1, Intern, Google Quantum AI

Quantum computing has rapidly advanced in both theory and practice in recent years, and with it the hope for the potential impact in real applications. One key area of interest is how quantum computers might affect machine learning. We recently demonstrated experimentally that quantum computers are able to naturally solve certain problems with complex correlations between inputs that can be incredibly hard for traditional, or “classical”, computers. This suggests that learning models made on quantum computers may be dramatically more powerful for select applications, potentially boasting faster computation, better generalization on less data, or both. Hence it is of great interest to understand in what situations such a “quantum advantage” might be achieved.

The idea of quantum advantage is typically phrased in terms of computational advantages. That is, given some task with well defined inputs and outputs, can a quantum computer achieve a more accurate result than a classical machine in a comparable runtime? There are a number of algorithms for which quantum computers are suspected to have overwhelming advantages, such as Shor’s factoring algorithm for factoring products of large primes (relevant to RSA encryption) or the quantum simulation of quantum systems. However, the difficulty of solving a problem, and hence the potential advantage for a quantum computer, can be greatly impacted by the availability of data. As such, understanding when a quantum computer can help in a machine learning task depends not only on the task, but also the data available, and a complete understanding of this must include both.

In “Power of data in quantum machine learning”, published in Nature Communications, we dissect the problem of quantum advantage in machine learning to better understand when it will apply. We show how the complexity of a problem formally changes with the availability of data, and how this sometimes has the power to elevate classical learning models to be competitive with quantum algorithms. We then develop a practical method for screening when there may be a quantum advantage for a chosen set of data embeddings in the context of kernel methods. We use the insights from the screening method and learning bounds to introduce a novel method that projects select aspects of feature maps from a quantum computer back into classical space. This enables us to imbue the quantum approach with additional insights from classical machine learning that shows the best empirical separation in quantum learning advantages to date.

Computational Power of Data
The idea of quantum advantage over a classical computer is often framed in terms of computational complexity classes. Examples such as factoring large numbers and simulating quantum systems are classified as bounded quantum polynomial time (BQP) problems, which are those thought to be handled more easily by quantum computers than by classical systems. Problems easily solved on classical computers are called bounded probabilistic polynomial (BPP) problems.

We show that learning algorithms equipped with data from a quantum process, such as a natural process like fusion or chemical reactions, form a new class of problems (which we call BPP/Samp) that can efficiently perform some tasks that traditional algorithms without data cannot, and is a subclass of the problems efficiently solvable with polynomial sized advice (P/poly). This demonstrates that for some machine learning tasks, understanding the quantum advantage requires examination of available data as well.


Geometric Test for Quantum Learning Advantage

Informed by the results that the potential for advantage changes depending on the availability of data, one may ask how a practitioner can quickly evaluate if their problem may be well suited for a quantum computer. To help with this, we developed a workflow for assessing the potential for advantage within a kernel learning framework. We examined a number of tests, the most powerful and informative of which was a novel geometric test we developed.

In quantum machine learning methods, such as quantum neural networks or quantum kernel methods, a quantum program is often divided into two parts, a quantum embedding of the data (an embedding map for the feature space using a quantum computer), and the evaluation of a function applied to the data embedding. In the context of quantum computing, quantum kernel methods make use of traditional kernel methods, but use the quantum computer to evaluate part or all of the kernel on the quantum embedding, which has a different geometry than a classical embedding. It was conjectured that a quantum advantage might arise from the quantum embedding, which might be much better suited to a particular problem than any accessible classical geometry.

We developed a quick and rigorous test that can be used to quickly compare a particular quantum embedding, kernel, and data set to a range of classical kernels and assess if there is any opportunity for quantum advantage across, e.g., possible label functions such as those used for image recognition tasks. We define a geometric constant g, which quantifies the amount of data that could theoretically close that gap, based on the geometric test. This is an extremely useful technique for deciding, based on data constraints, if a quantum solution is right for the given problem.

Projected Quantum Kernel Approach
One insight revealed by the geometric test, was that existing quantum kernels often suffered from a geometry that was easy to best classically because they encouraged memorization, instead of understanding. This inspired us to develop a projected quantum kernel, in which the quantum embedding is projected back to a classical representation. While this representation is still hard to compute with a classical computer directly, it comes with a number of practical advantages in comparison to staying in the quantum space entirely.

Geometric quantity g, which quantifies the potential for quantum advantage, depicted for several embeddings, including the projected quantum kernel introduced here.

By selectly projecting back to classical space, we can retain aspects of the quantum geometry that are still hard to simulate classically, but now it is much easier to develop distance functions, and hence kernels, that are better behaved with respect to modest changes in the input than was the original quantum kernel. In addition the projected quantum kernel facilitates better integration with powerful non-linear kernels (like a squared exponential) that have been developed classically, which is much more challenging to do in the native quantum space.

This projected quantum kernel has a number of benefits over previous approaches, including an improved ability to describe non-linear functions of the existing embedding, a reduction in the resources needed to process the kernel from quadratic to linear with the number of data points, and the ability to generalize better at larger sizes. The kernel also helps to expand the geometric g, which helps to ensure the greatest potential for quantum advantage.

Data Sets Exhibit Learning Advantages
The geometric test quantifies potential advantage for all possible label functions, however in practice we are most often interested in specific label functions. Using learning theoretic approaches, we also bound the generalization error for specific tasks, including those which are definitively quantum in origin. As the advantage of a quantum computer relies on its ability to use many qubits simultaneously but previous approaches scale poorly in number of qubits, it is important to verify the tasks at reasonably large qubit sizes ( > 20 ) to ensure a method has the potential to scale to real problems. For our studies we verified up to 30 qubits, which was enabled by the open source tool, TensorFlow-Quantum, enabling scaling to petaflops of compute.

Interestingly, we showed that many naturally quantum problems, even up to 30 qubits, were readily handled by classical learning methods when sufficient data were provided. Hence one conclusion is that even for some problems that look quantum, classical machine learning methods empowered by data can match the power of quantum computers. However, using the geometric construction in combination with the projected quantum kernel, we were able to construct a data set that exhibited an empirical learning advantage for a quantum model over a classical one. Thus, while it remains an open question to find such data sets in natural problems, we were able to show the existence of label functions where this can be the case. Although this problem was engineered and a quantum computational advantage would require the embeddings to be larger and more challenging, this work represents an important step in understanding the role data plays in quantum machine learning.

Prediction accuracy as a function of the number of qubits (n) for a problem engineered to maximize the potential for learning advantage in a quantum model. The data is shown for two different sizes of training data (N).

For this problem, we scaled up the number of qubits (n) and compared the prediction accuracy of the projected quantum kernel to existing kernel approaches and the best classical machine learning model in our dataset. Moreover, a key takeaway from these results is that although we showed the existence of datasets where a quantum computer has an advantage, for many quantum problems, classical learning methods were still the best approach. Understanding how data can affect a given problem is a key factor to consider when discussing quantum advantage in learning problems, unlike traditional computation problems for which that is not a consideration.

Conclusions
When considering the ability of quantum computers to aid in machine learning, we have shown that the availability of data fundamentally changes the question. In our work, we develop a practical set of tools for examining these questions, and use them to develop a new projected quantum kernel method that has a number of advantages over existing approaches. We build towards the largest numerical demonstration to date, 30 qubits, of potential learning advantages for quantum embeddings. While a complete computational advantage on a real world application remains to be seen, this work helps set the foundation for the path forward. We encourage any interested readers to check out both the paper and related TensorFlow-Quantum tutorials that make it easy to build on this work.

Acknowledgements
We would like to acknowledge our co-authors on this paper — Michael Broughton, Masoud Mohseni, Ryan Babbush, Sergio Boixo, and Hartmut Neven, as well as the entirety of the Google Quantum AI team. In addition, we acknowledge valuable help and feedback from Richard Kueng, John Platt, John Preskill, Thomas Vidick, Nathan Wiebe, Chun-Ju Wu, and Balint Pato.


1Current affiliation — Institute for Quantum Information and Matter and Department of Computing and Mathematical Sciences, Caltech, Pasadena, CA, USA

Read More