Reducing the Computational Cost of Deep Reinforcement Learning Research

Posted by Pablo Samuel Castro, Staff Software Engineer, Google Research

It is widely accepted that the enormous growth of deep reinforcement learning research, which combines traditional reinforcement learning with deep neural networks, began with the publication of the seminal DQN algorithm. This paper demonstrated the potential of this combination, showing that it could produce agents capable of playing a number of Atari 2600 games very effectively. Since then, there have been several approaches that have built on and improved the original DQN. The popular Rainbow algorithm combined a number of these recent advances to achieve state-of-the-art performance on the ALE benchmark. This advance, however, came at a very high computational cost, which has the unfortunate side effect of widening the gap between those with ample access to computational resources and those without.

In “Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research”, to be presented at ICML 2021, we revisit this algorithm on a set of small- and medium-sized tasks. We first discuss the computational cost associated with the Rainbow algorithm. We explore how the same conclusions regarding the benefits of combining the various algorithmic components can be reached with smaller-scale experiments, and further generalize that idea to how research done on a smaller computational budget can provide valuable scientific insights.

The Cost of Rainbow
A major reason for the computational cost of Rainbow is that the standards in academic publishing often require evaluating new algorithms on large benchmarks like ALE, which consists of 57 Atari 2600 games that reinforcement learning agents may learn to play. For a typical game, it takes roughly five days to train a model using a Tesla P100 GPU. Furthermore, if one wants to establish meaningful confidence bounds, it is common to perform at least five independent runs. Thus, training Rainbow on the full suite of 57 games required around 34,200 GPU hours (or 1,425 GPU-days) in order to provide convincing empirical performance statistics. In other words, such experiments are only feasible if one is able to train on multiple GPUs in parallel, which can be prohibitive for smaller research groups.

Revisiting Rainbow
As in the original Rainbow paper, we evaluate the effect of adding the following components to the original DQN algorithm: double Q-learning, prioritized experience replay, dueling networks, multi-step learning, distributional RL, and noisy nets.

We evaluate on a set of four classic control environments, which can be fully trained in 10-20 minutes (compared to five days for ALE games):

Upper left: In CartPole, the task is to balance a pole on a cart that the agent can move left and right. Upper right: In Acrobot, there are two arms and two joints, where the agent applies force to the joint between the two arms in order to raise the lower arm above a threshold. Lower left: In LunarLander, the agent is meant to land the spaceship between the two flags. Lower right: In MountainCar, the agent must build up momentum between two hills to drive to the top of the rightmost hill.

We investigated the effect of both independently adding each of the components to DQN, as well as removing each from the full Rainbow algorithm. As in the original Rainbow paper, we find that, in aggregate, the addition of each of these algorithms does improve learning over the base DQN. However, we also found some important differences, such as the fact that distributional RL — commonly thought to be a positive addition — does not always yield improvements on its own. Indeed, in contrast to the ALE results in the Rainbow paper, in the classic control environments, distributional RL only yields an improvement when combined with another component.

Each plot shows the training progress when adding the various components to DQN. The x-axis is training steps, the y-axis is performance (higher is better).
Each plot shows the training progress when removing the various components from Rainbow. The x-axis is training steps, the y-axis is performance (higher is better).

We also re-ran the Rainbow experiments on the MinAtar environment, which consists of a set of five miniaturized Atari games, and found qualitatively similar results. The MinAtar games are roughly 10 times faster to train than the regular Atari 2600 games on which the original Rainbow algorithm was evaluated, but still share some interesting aspects, such as game dynamics and having pixel-based inputs to the agent. As such, they provide a challenging mid-level environment, in between the classic control and the full Atari 2600 games.

When viewed in aggregate, we find our results to be consistent with those of the original Rainbow paper — the impact resulting from each algorithmic component can vary from environment to environment. If we were to suggest a single agent that balances the tradeoffs of the different algorithmic components, our version of Rainbow would likely be consistent with the original, in that combining all components produces a better overall agent. However, there are important details in the variations of the different algorithmic components that merit a more thorough investigation.

Beyond the Rainbow
When DQN was introduced, it made use of the Huber loss and the RMSProp optimizer. It has been common practice for researchers to keep these same choices when building on DQN, as most of their effort is spent on other algorithmic design decisions. In the spirit of reassessing these assumptions, we revisited the loss function and optimizer used by DQN on the lower-cost, small-scale classic control and MinAtar environments. We ran some initial experiments using the Adam optimizer, which has lately been the most popular optimizer choice, combined with a simpler loss function, the mean-squared error (MSE) loss. Since the selection of optimizer and loss function is often overlooked when developing a new algorithm, we were surprised to observe a dramatic improvement on all the classic control and MinAtar environments.
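To make concrete how small this change is in code, here is a minimal, self-contained sketch of the quantities involved: one-step TD errors for a batch of transitions and the two candidate losses computed on them. It is an illustration with placeholder shapes and random values, not the code used for the experiments; the optimizer half of the comparison is noted in the final comment.

```python
import numpy as np

def huber_loss(td_errors, delta=1.0):
    """Quadratic near zero, linear in the tails (DQN's original choice)."""
    abs_err = np.abs(td_errors)
    quadratic = np.minimum(abs_err, delta)
    linear = abs_err - quadratic
    return 0.5 * quadratic ** 2 + delta * linear

def mse_loss(td_errors):
    """Plain mean-squared error on the TD errors."""
    return 0.5 * td_errors ** 2

def td_errors(q_values, next_q_values, actions, rewards, terminals, gamma=0.99):
    """One-step TD errors for a batch of transitions."""
    targets = rewards + gamma * (1.0 - terminals) * next_q_values.max(axis=1)
    predictions = q_values[np.arange(len(actions)), actions]
    return predictions - targets

# Placeholder batch of transitions (shapes only; values are random).
rng = np.random.default_rng(0)
batch, num_actions = 32, 4
q = rng.normal(size=(batch, num_actions))
next_q = rng.normal(size=(batch, num_actions))
actions = rng.integers(num_actions, size=batch)
rewards = rng.normal(size=batch)
terminals = rng.integers(2, size=batch).astype(float)

errors = td_errors(q, next_q, actions, rewards, terminals)
print("Huber loss:", huber_loss(errors).mean())
print("MSE loss:  ", mse_loss(errors).mean())
# The optimizer is the other half of the comparison: the original DQN used
# RMSProp, whereas the variant discussed above pairs Adam with the MSE loss.
```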

We thus decided to evaluate the different ways of combining the two optimizers (RMSProp and Adam) with the two losses (Huber and MSE) on the full ALE suite (60 Atari 2600 games). We found that the Adam+MSE combination is superior to RMSProp+Huber.

Measuring the improvement Adam+MSE gives over the default DQN settings (RMSProp + Huber); higher is better.

Additionally, when comparing the various optimizer-loss combinations, we find that when using RMSProp, the Huber loss tends to perform better than MSE (illustrated by the gap between the solid and dotted orange lines).

Normalized scores aggregated over all 60 Atari 2600 games, comparing the different optimizer-loss combinations.

Conclusion
On a limited computational budget we were able to reproduce, at a high level, the findings of the Rainbow paper and uncover new and interesting phenomena. Evidently it is much easier to revisit something than to discover it in the first place. Our intent with this work, however, was to argue for the relevance and significance of empirical research on small- and medium-scale environments. We believe that these less computationally intensive environments lend themselves well to a more critical and thorough analysis of the performance, behaviors, and intricacies of new algorithms.

We are by no means calling for less emphasis to be placed on large-scale benchmarks. We are simply urging researchers to consider smaller-scale environments as a valuable tool in their investigations, and reviewers to avoid dismissing empirical work that focuses on smaller-scale environments. By doing so, in addition to reducing the environmental impact of our experiments, we will both get a clearer picture of the research landscape and reduce the barriers for researchers from diverse and often under-resourced communities, which can only help make our community and scientific advances stronger.

Acknowledgments
Thank you to Johan, the first author of this paper, for his hard work and persistence in seeing this through! We would also like to thank Marlos C. Machado, Sara Hooker, Matthieu Geist, Nino Vieillard, Hado van Hasselt, Eleni Triantafillou, and Brian Tanner for their insightful comments on this work.

Launching the AI Academy for small newsrooms

As people searched for the latest information on COVID-19 last year, including school reopenings and travel restrictions, the BBC recognized they needed to find new ways of bringing their journalism to their audiences. They released a new online tool, the BBC Corona Bot, which uses artificial intelligence to draw on BBC News’ explanatory journalism. It responds with an answer to a reader’s specific question where possible, or points to health authorities’ websites when appropriate. AI technology allowed BBC News to reach new audiences and drive more traffic to their stories and explainers. 

This is one example of how AI can help newsrooms. AI can help build new audiences and automate tasks, freeing up time for journalists to work on the more creative aspects of news production and leaving tedious and repetitive tasks to machines. However, newsrooms around the world have told researchers they worry that access to AI technology is unequal. They fear big publishers likely will benefit most from artificial intelligence, while smaller news organizations could get left behind. 

To help bridge this gap, the Google News Initiative is partnering with Polis, the London School of Economics and Political Science’s journalism think tank, to launch a training academy for 20 media professionals to learn how AI can be used to support their journalism. 

The AI Academy for Small Newsrooms is a six-week long, free online program taught by industry-leading journalists and researchers who work at the intersection of journalism and AI. It will start in September 2021 and will welcome journalists and developers from small news organizations in the Europe, Middle East, and Africa (EMEA) region.

By the end of the course, participants will have a practical understanding of the challenges and opportunities of AI technologies. They will learn examples of how to use AI to automate repetitive tasks, such as interview transcription or image search, as well as how to optimize newsroom processes by getting insights on what content is most engaging.

Other newsrooms in the region are already using AI technology. For example, Schibsted, a Nordic news outlet, developed an innovative model to reduce gender bias in news coverage, while in Spain, El Pais uses an AI-based tool to moderate toxic comments.

Most importantly, participants will create action plans to guide the development of AI projects within their news organizations. JournalismAI will share these plans openly to help other publishers around the world.

This pilot program — which we plan to launch in other regions in 2022 — is part of a broader training effort over the last three years by JournalismAI, a partnership between the GNI and Polis to foster AI literacy in newsrooms globally. More than 110,000 participants have already taken the online training modules available on the Google News Initiative Training Center.

This year, JournalismAI will also create an AI Journalism Starter Pack to make the information about AI in journalism more accessible to small and local publishers. It will include examples of AI tools that can solve small and local publishers’ basic needs such as tagging or transcribing.

Find more detailed information on the AI Academy for Small Newsrooms and how to apply on the JournalismAI website. The deadline for applications is 11:59 PM GMT on August 1, 2021.

Kick like a pro with Footy Skills Lab

When I was growing up in Brisbane, Aussie Rules football wasn’t offered as a school sport – and there weren’t any professional female role models to look up to and learn from. Despite these limitations, we got resourceful. We organized football games in our lunch breaks with friends, using soccer or rugby goal posts and adding sticks or cones to serve as point posts. We practised accuracy using rubbish bins as targets.

A decade later, women have truly made their mark in the AFL. There are, however,  many barriers still facing aspiring footy players — including access, cost, mobility and, more recently, lockdown restrictions. We still have to be resourceful to keep active and hone our skills. 

Three years ago, the AFL and Google first teamed up to help footy fans better connect with the games and players they love. Since then, we’ve been thinking about ways we could improve access to Aussie Rules coaching and community participation – regardless of ability, gender, location, culture or socio-economic background. 

A graphic showing a phone with the Footy Skills Lab app open, in front of a Sherrin football.

Today, we’re thrilled to launch Footy Skills Lab to help budding footy players in Australia and all around the world sharpen their AFL skills – straight from their smartphone. 

Footy Skills Lab is a free platform, powered by Google AI, which helps you improve your skills through activities in ball-handling, decision-making and kicking across three levels of difficulty.

A screenshot showing the AFL activities available on the app, including ball handling, decision making and kicking.

To give Footy Skills Lab a whirl, all you need is a smartphone with an internet connection, a football, something to prop your phone up (like a water bottle) and space to move. 

You’ll get tips on kicking from me, and tips on ball-handling and decision-making from athletes across the AFLW and AFL Wheelchair competitions – including Carlton’s Madison Prespakis and Richmond’s Akec Makur Chuot. Through audio prompts and closed captioning, these tips and activities are also accessible for people with visual and hearing needs. And when you finish the activity, you’ll get a scorecard that you can share with your friends, family, teammates and coaches. 

Screenshots showing still images of AFL athletes Madison Prespakis and Akec Makur Chuot providing football training tips.

Whether you’re in Manchester, Miami or in lockdown in Melbourne, Footy Skills Lab is such an easy, convenient way to get motivated and access coaching from pros. So go on, join in the fun and give us your best kick!

Ask a Techspert: What’s a neural network?

Back in the day, there was a surefire way to tell humans and computers apart: You’d present a picture of a four-legged friend and ask if it was a cat or dog. A computer couldn’t identify felines from canines, but we humans could answer with doggone confidence. 

That all changed about a decade ago thanks to leaps in computer vision and machine learning – specifically,  major advancements in neural networks, which can train computers to learn in a way similar to humans. Today, if you give a computer enough images of cats and dogs and label which is which, it can learn to tell them apart purr-fectly. 

But how exactly do neural networks help computers do this? And what else can — or can’t — they do? To answer these questions and more, I sat down with Google Research’s Maithra Raghu, a research scientist who spends her days helping computer scientists better understand neural networks. Her research helped the Google Health team discover new ways to apply deep learning to assist doctors and their patients.

So, the big question: What’s a neural network?

To understand neural networks, we need to first go back to the basics and understand how they fit into the bigger picture of artificial intelligence (AI). Imagine a Russian nesting doll, Maithra explains. AI would be the largest doll, then within that, there’s machine learning (ML), and within that, neural networks (… and within that, deep neural networks, but we’ll get there soon!).

If you think of AI as the science of making things smart, ML is the subfield of AI focused on making computers smarter by teaching them to learn, instead of hard-coding them. Within that, neural networks are an advanced technique for ML, where you teach computers to learn with algorithms that take inspiration from the human brain.

Your brain fires off groups of neurons that communicate with each other. In an artificial neural network (the computer type), a “neuron” (which you can think of as a computational unit) is grouped with a bunch of other “neurons” into a layer, and those layers stack on top of each other. Between each of those layers are connections. The more layers a neural network has, the “deeper” it is. That’s where the idea of “deep learning” comes from. “Neural networks depart from neuroscience because you have a mathematical element to it,” Maithra explains. “Connections between neurons are numerical values represented by matrices, and training the neural network uses gradient-based algorithms.”

This might seem complex, but you probably interact with neural networks fairly often — like when you’re scrolling through personalized movie recommendations or chatting with a customer service bot.

So once you’ve set up a neural network, is it ready to go?

Not quite. The next step is training. That’s where the model becomes much more sophisticated. Similar to people, neural networks learn from feedback. If you go back to the cat and dog example, your neural network would look at pictures and start by randomly guessing. You’d label the training data (for example, telling the computer if each picture features a cat or dog), and those labels would provide feedback, telling the neural network when it’s right or wrong. Throughout this process, the neural network’s parameters adjust, and the neural network goes from knowing nothing to learning how to tell cats and dogs apart.
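For readers who want to see the “layers plus feedback” idea spelled out, here is a deliberately tiny, hypothetical sketch: a two-layer network trained by gradient descent on a toy two-blob dataset standing in for the cat and dog pictures. Real networks are far larger and are built with dedicated libraries; this only illustrates the mechanics described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for "cat vs. dog": 200 points in 2D, labeled by which blob they came from.
x = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(+1, 0.5, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])  # 0 = "cat", 1 = "dog"

# Two layers of "neurons": the connections are just matrices of numbers.
w1, b1 = rng.normal(0, 0.5, (2, 8)), np.zeros(8)
w2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(500):
    # Forward pass: each layer transforms the previous layer's output.
    h = np.tanh(x @ w1 + b1)
    p = sigmoid(h @ w2 + b2).ravel()

    # Feedback: compare the guesses to the labels (cross-entropy loss) ...
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

    # ... and nudge every connection a little in the direction that reduces the error.
    dp = (p - y).reshape(-1, 1) / len(y)
    dw2, db2 = h.T @ dp, dp.sum(axis=0)
    dh = dp @ w2.T * (1 - h ** 2)
    dw1, db1 = x.T @ dh, dh.sum(axis=0)
    w1, b1, w2, b2 = w1 - lr * dw1, b1 - lr * db1, w2 - lr * dw2, b2 - lr * db2

accuracy = np.mean((p > 0.5) == y)
print(f"final loss {loss:.3f}, accuracy {accuracy:.2%}")
```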

Why don’t we use neural networks all the time?

“Though neural networks are based on our brains, the way they learn is actually very different from humans,” Maithra says. “Neural networks are usually quite specialized and narrow. This can be useful because, for example, it means a neural network might be able to process medical scans much quicker than a doctor, or spot patterns  a trained expert might not even notice.” 

But because neural networks learn differently from people, there’s still a lot that computer scientists don’t know about how they work. Let’s go back to cats versus dogs: If your neural network gives you all the right answers, you might think it’s behaving as intended. But Maithra cautions that neural networks can work in mysterious ways.

“Perhaps your neural network isn’t able to identify between cats and dogs at all – maybe it’s only able to identify between sofas and grass, and all of your pictures of cats happen to be on couches, and all your pictures of dogs are in parks,” she says. “Then, it might seem like it knows the difference when it actually doesn’t.”

That’s why Maithra and other researchers are diving into the internals of neural networks, going deep into their layers and connections, to better understand them – and come up with ways to make them more helpful.

“Neural networks have been transformative for so many industries,” Maithra says, “and I’m excited that we’re going to realize even more profound applications for them moving forward.”

An update on our progress in responsible AI innovation

Over the past year, responsibly developed AI has transformed health screenings, supported fact-checking to battle misinformation and save lives, predicted Covid-19 cases to support public health, and protected wildlife after bushfires. Developing AI in a way that gets it right for everyone requires openness, transparency, and a clear focus on understanding the societal implications. That is why we were among the first companies to develop and publish AI Principles and why, each year, we share updates on our progress.

Building on previous AI Principles updates in 2018, 2019, and 2020, today we’re providing our latest overview of what we’ve learned, and how we’re applying these learnings.

Internal Education

In the last year, to ensure our teams have clarity from day one, we’ve added an introduction to our AI Principles for engineers and incoming hires in technical roles. The course presents each of the Principles as well as the applications we will not pursue.

Integrating our Principles into the work we do with enterprise customers is key, so we’ve continued to make our AI Principles in Practice training mandatory for customer-facing Cloud employees. A version of this training is available to all Googlers.

There is no single way to apply the AI Principles to specific features and product development. Training must consider not only the technology and data, but also where and how AI is used. To offer a more comprehensive approach to implementing the AI Principles, we’ve been developing opportunities for Googlers to share their points of view on the responsible development of future technologies, such as the AI Principles Ethics Fellowship for Google’s Employee Resource Groups. Fellows receive AI Principles training and craft hypothetical case studies to inform how Google prioritizes socially beneficial applications. In this inaugural year, 27 fellows, selected from 191 applicants around the world, wrote and presented case studies on topics such as genome datasets and a Covid-19 content moderation workflow.

Other programs include a bi-weekly Responsible AI Tech Talk Series featuring external experts, such as the Brookings Institution’s Dr. Nicol Turner Lee presenting on detecting and mitigating algorithmic bias.

Tools and Research

To bring together multiple teams working on technical tools and research, this year we formed the Responsible AI and Human-Centered Technology organization. The basic and applied researchers in the organization are devoted to developing technology and best practices for the technical realization of the AI Principles guidance.

As discussed in our December 2020 End-of-Year report, we regularly release these tools to the public. Currently, researchers are developing Know Your Data (in beta) to help developers understand datasets with the goal of improving data quality, helping to mitigate fairness and bias issues.

Know Your Data, a Responsible AI tool in beta

Product teams use these tools to evaluate their work’s alignment with the AI Principles. For example, the Portrait Light feature available in both Pixel’s Camera and Google Photos uses multiple machine learning components to instantly add realistic synthetic lighting to portraits. Using computational methods to achieve this effect, however, raised several responsible innovation challenges, including potentially reinforcing unfair bias (AI Principle #2) despite the goal of building a feature that works for all users. So the Portrait Light team generated a training dataset containing millions of photos based on an original set of photos of different people in a diversity of lighting environments, with their explicit consent. The engineering team used various Google Responsible AI tools to test proactively whether the ML models used in Portrait Light performed equitably across different audiences.

Our ongoing technical research related to responsible innovation in AI in the last 12 months has led to more than 200 research papers and articles that address AI Principles-related topics. These include exploring and mitigating data cascades; creating the first model-agnostic framework for partially local federated learning suitable for training and inference at scale; and analyzing the energy- and carbon-costs of training six recent ML models to reduce the carbon footprint of training an ML system by up to 99.9%.

Operationalizing the Principles

To help our teams put the AI Principles into practice, we deploy our decision-making process for reviewing new custom AI projects in development — from chatbots to newer fields such as affective technologies. These reviews are led by a multidisciplinary Responsible Innovation team, which draws on expertise in disciplines including trust and safety, human rights, public policy, law, sustainability, and product management. The team also advises product areas on specific issues related to serving enterprise customers. Any Googler, at any level, is encouraged to apply for an AI Principles review of a project or planned product or service.

Teams can also request other Responsible Innovation services, such as informal consultations or product fairness testing with the Product Fairness (ProFair) team. ProFair tests products from the user perspective to investigate known issues and find new ones, similar to how an academic researcher would go about identifying fairness issues.

Our Google Cloud, Image Search, Next Billion Users and YouTube product teams have engaged with ProFair to test potential new projects for fairness. Consultations include collaborative testing scenarios, focus groups, and adversarial testing of ML models to address improving equity in data labels, fairness in language models, and bias in predictive text, among other issues. Recently, the ProFair team spent nine months consulting with Google researchers on creating an object recognition dataset for physical landmarks (such as buildings), developing criteria for how to choose which classes to recognize and how to determine the amount of training data per class in ways that would assign a fairer relevance score to each class.

Reviewers weigh the nature of the social benefit involved and whether it substantially exceeds potential challenges. For example, in the past year, reviewers decided not to publicly release ML models that can create photo-realistic synthetic faces made with generative adversarial networks (GANs), because of the risk of potential misuse by malicious actors to create “deepfakes” for misinformation purposes.

As another example, a research team requested a review of a new ML training dataset for computer vision fairness techniques that offered more specific attributes about people, such as perceived gender and age-range. The team worked with Open Images, an open data project containing ~9 million images spanning thousands of object categories and bounding box annotations for 600 classes. Reviewers weighed the risk of labeling the data with the sensitive labels of perceived gender presentation and age-range, and the societal benefit of these labels for fairness analysis and bias mitigation. Given these risks, reviewers required creation of a data card explaining the details and limitations of the project. We released the MIAP (More Inclusive Annotations for People) dataset in the Open Images Extended collection. The collection contains more complete bounding box annotations for the person class hierarchy in 100K images containing people. Each annotation is also labeled with fairness-related attributes. MIAP was accepted and presented at the 2021 Artificial Intelligence, Ethics and Society conference.

External Engagement

We remain committed to contributing to emerging international principles for responsible AI innovation. For example, we submitted a response to the European Commission consultation on the inception impact assessment on ethical and legal requirements for AI and feedback on NITI Aayog’s working document on Responsible Use of AI to guide national principles for India. We also supported the Singaporean government’s Guide to Job Redesign in the Age of AI, which outlines opportunities to optimize AI benefits by redesigning jobs and reskilling employees.

Our efforts to engage global audiences, from students to policymakers, center on educational programs and discussions to offer helpful and practical ML education, including:

  • A workshop on Federated Learning and Analytics, making all research talks and a TensorFlow Federated tutorial publicly available.
  • Machine Learning for Policy Leaders (ML4PL), a 2-hour virtual workshop on the basics of ML. To date, we’ve expanded this globally, reaching more than 350 policymakers across the EU, LatAm, APAC, and US.
  • A workshop co-hosted with the Indian Ministry of Electronics and Information Technology on the Responsible Use of AI for Social Empowerment, exploring the potential of AI adoption in the government to address COVID-19, agricultural and environmental crises.

To support these workshops and events with actionable and equitable programming designed for long-term collaboration, over the past year we’ve helped launch:

  • AI for Social Good workshops, bringing together NGOs applying AI to tough challenges in their communities with academic experts and Google researchers to encourage collaborative AI solutions. So far we’ve supported more than 30 projects in Asia Pacific and Sub-Saharan Africa with expertise, funding and Cloud Credits.
  • Two collaborations with the U.S. National Science Foundation: one to support the National AI Research Institute for Human-AI Interaction and Collaboration with $5 million in funding, along with AI expertise, research collaborations and Cloud support; another to join other industry partners and federal agencies as part of a combined $40 million investment in academic research for Resilient and Intelligent Next-Generation (NextG) Systems, in which Google will offer expertise, research collaborations, infrastructure and in-kind support to researchers.
  • Quarterly Equitable AI Research Roundtables (EARR), focused on the potential downstream harms of AI with experts from the Othering and Belonging Institute at UC Berkeley, PolicyLink, and Emory University School of Law.
  • MLCommons, a non-profit, which will administer MLPerf, a suite of benchmarks for Google and industry peers.

Committed to sharing tools and discoveries

In the three years since Google released our AI Principles, we’ve worked to integrate this ethical charter across our work — from the development of advanced technologies to business processes. As we learn and engage with people and organizations across society we’re committed to sharing tools, processes and discoveries with the global community. You’ll find many of these in our recently updated People + AI Research Guidebook and on the Google AI responsibilities site, which we update quarterly with case studies and other resources. 

Quickly Training Game-Playing Agents with Machine Learning

Posted by Leopold Haller and Hernan Moraldo, Software Engineers, Google Research

In the last two decades, dramatic advances in compute and connectivity have allowed game developers to create works of ever-increasing scope and complexity. Simple linear levels have evolved into photorealistic open worlds, procedural algorithms have enabled games with unprecedented variety, and expanding internet access has transformed games into dynamic online services. Unfortunately, scope and complexity have grown more rapidly than the size of quality assurance teams or the capabilities of traditional automated testing. This poses a challenge to both product quality (such as delayed releases and post-launch patches) and developer quality of life.

Machine learning (ML) techniques offer a possible solution, as they have demonstrated the potential to profoundly impact game development flows — they can help designers balance their game and empower artists to produce high-quality assets in a fraction of the time traditionally required. Furthermore, they can be used to train challenging opponents that can compete at the highest levels of play. Yet some ML techniques can pose requirements that currently make them impractical for production game teams, including the design of game-specific network architectures, the development of expertise in implementing ML algorithms, or the generation of billions of frames of training data. Conversely, game developers operate in a setting that offers unique advantages to leverage ML techniques, such as direct access to the game source, an abundance of expert demonstrations, and the uniquely interactive nature of video games.

Today, we present an ML-based system that game developers can use to quickly and efficiently train game-testing agents, helping developers find serious bugs quickly, while allowing human testers to focus on more complex and intricate problems. The resulting solution requires no ML expertise, works on many of the most popular game genres, and can train an ML policy, which generates game actions from the game state, in less than an hour on a single game instance. We have also released an open source library that demonstrates a functional application of these techniques.

Supported genres include arcade, action/adventure, and racing games.

The Right Tool for the Right Job
The most elemental form of video game testing is to simply play the game. A lot. Many of the most serious bugs (such as crashes or falling out of the world) are easy to detect and fix; the challenge is finding them within the vast state space of a modern game. As such, we decided to focus on training a system that could “just play the game” at scale.

We found that the most effective way to do this was not to try to train a single, super-effective agent that could play the entire game from end-to-end, but to provide developers with the ability to train an ensemble of game-testing agents, each of which could effectively accomplish tasks that take only a few minutes, which game developers refer to as “gameplay loops”.

These core gameplay behaviors are often expensive to program through traditional means, but are much more efficient to train than a single end-to-end ML model. In practice, commercial games create longer loops by repeating and remixing core gameplay loops, which means that developers can test large stretches of gameplay by combining ML policies with a small amount of simple scripting.

Simulation-centric, Semantic API
One of the most fundamental challenges in applying ML to game development is bridging the chasm between the simulation-centric world of video games and the data-centric world of ML. Rather than ask developers to directly convert the game state into custom, low-level ML features (which would be too labor intensive) or attempt to learn from raw pixels (which would require too much data to train), our system provides developers with an idiomatic, game-developer friendly API that allows them to describe their game in terms of the essential state that a player observes and the semantic actions they can perform. All of this information is expressed via concepts that are familiar to game developers, such as entities, raycasts, 3D positions and rotations, buttons and joysticks.

As you can see in the example below, the API allows the specification of observations and actions in just a few lines of code.

Example actions and observations for a racing game.
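The example the caption refers to is shown as an image in the original post, so only its caption survives here. The snippet below is a purely hypothetical reconstruction of what a specification in this spirit might look like; the class and field names are invented for illustration and are not the released library's actual API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical building blocks in the spirit of the semantic API described above.
# None of these names come from the released library; they only illustrate the idea
# of describing a game in terms of entities, raycasts, and controller inputs.

@dataclass
class EntityObservation:
    name: str            # e.g. the player's kart or a checkpoint
    position: bool = True
    rotation: bool = True

@dataclass
class RaycastObservation:
    num_rays: int        # how the agent probes nearby geometry
    max_distance: float

@dataclass
class JoystickAction:
    name: str            # analog input, e.g. steering
    axes: int = 2

@dataclass
class ButtonAction:
    name: str            # digital input, e.g. brake

@dataclass
class GameSpec:
    observations: List[object] = field(default_factory=list)
    actions: List[object] = field(default_factory=list)

# A racing-game spec in just a few lines, mirroring the example the caption refers to.
racing_spec = GameSpec(
    observations=[
        EntityObservation("player_kart"),
        EntityObservation("next_checkpoint", rotation=False),
        RaycastObservation(num_rays=16, max_distance=50.0),
    ],
    actions=[
        JoystickAction("steering", axes=1),
        ButtonAction("throttle"),
        ButtonAction("brake"),
    ],
)
print(racing_spec)
```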

From API to Neural Network
This high level, semantic API is not just easy to use, but also allows the system to flexibly adapt to the specific game being developed — the specific combination of API building blocks employed by the game developer informs our choice of network architecture, since it provides information about the type of gaming scenario in which the system is deployed. Some examples of this include: handling action outputs differently depending on whether they represent a digital button or analog joystick, or using techniques from image processing to handle observations that result from an agent probing its environment with raycasts (similar to how autonomous vehicles probe their environment with LIDAR).

Our API is sufficiently general to allow modeling of many common control-schemes (the configuration of action outputs that control movement) in games, such as first-person games, third-person games with camera-relative controls, racing games, twin stick shooters, etc. Since 3D movement and aiming are often an integral aspect of gameplay in general, we create networks that automatically tend towards simple behaviors such as aiming, approach or avoidance in these games. The system accomplishes this by analyzing the game’s control scheme to create neural network layers that perform custom processing of observations and actions in that game. For example, positions and rotations of objects in the world are automatically translated into directions and distances from the point of view of the AI-controlled game entity. This transformation typically increases the speed of learning and helps the learned network generalize better.
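As a concrete example of the kind of generated pre-processing described above, the sketch below converts a world-space position into distance-and-direction features relative to the agent. It is an assumed, simplified 2D version written for illustration, not the system's actual layer.

```python
import numpy as np

def egocentric_features(agent_pos, agent_heading, target_pos):
    """Express a world-space target as (distance, direction) relative to the agent.

    agent_heading is the agent's facing angle in radians; the returned direction
    is a unit vector in the agent's own frame, so "straight ahead" is always (1, 0).
    """
    offset = np.asarray(target_pos, dtype=float) - np.asarray(agent_pos, dtype=float)
    distance = np.linalg.norm(offset)
    # Rotate the world-space offset into the agent's frame.
    c, s = np.cos(-agent_heading), np.sin(-agent_heading)
    local = np.array([c * offset[0] - s * offset[1],
                      s * offset[0] + c * offset[1]])
    direction = local / distance if distance > 0 else np.zeros(2)
    return distance, direction

# A target 10 units due "north" of an agent that is facing north reads as straight ahead.
dist, direction = egocentric_features(agent_pos=(0, 0), agent_heading=np.pi / 2,
                                      target_pos=(0, 10))
print(dist, direction)  # 10.0, approximately [1, 0]
```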

An example neural network generated for a game with joystick controls and raycast inputs. Depending on the inputs (red) and the control scheme, the system generates custom pre- and post-processing layers (orange).

Learning From The Experts in Real Time
After generating a neural network architecture, the network needs to be trained to play the game using an appropriate choice of learning algorithm.

Reinforcement learning (RL), in which an ML policy is trained directly to maximize a reward, may seem like the obvious choice, since RL has been used successfully to train highly competent ML policies for games. However, RL algorithms tend to require more data than a single game instance can produce in a reasonable amount of time, and achieving good results in a new domain often requires hyperparameter tuning and strong ML domain knowledge.

Instead, we found that imitation learning (IL), which trains ML policies by observing experts play the game, works well for our use case. Unlike RL, where the agent needs to discover a good policy on its own, IL only needs to recreate the behavior of a human expert. Since game developers and testers are experts in their own games, they can easily provide demonstrations of how to play the game.

We use an IL approach inspired by the DAgger algorithm, which allows us to take advantage of video games’ most compelling quality — interactivity. Thanks to the reductions in training time and data requirements enabled by our semantic API, training effectively happens in real time, giving a developer the ability to fluidly switch between providing gameplay demonstrations and watching the system play. This results in a natural feedback loop, in which a developer iteratively provides corrections to a continuous stream of ML policies.
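Schematically, the interactive training loop looks something like the sketch below. The game and policy helpers are assumed placeholders rather than the released system's code; the point is the DAgger-style alternation between developer demonstrations, supervised fitting, and corrections collected while the current policy plays.

```python
# A schematic, DAgger-inspired interactive imitation loop.
# `game` and `policy` are assumed placeholders with the methods used below;
# the structure, not the API, is what this sketch illustrates.

def train_interactively(game, policy, num_rounds=10):
    dataset = []  # (observation, expert_action) pairs

    for round_idx in range(num_rounds):
        # 1. The developer picks up the controller and plays for a bit.
        #    Their (observation, action) pairs are recorded as expert data.
        dataset.extend(game.record_human_demonstration())

        # 2. Fit the policy to everything collected so far (plain supervised learning).
        policy.fit(dataset)

        # 3. Let the current policy drive, logging what the expert would have done
        #    wherever the developer chooses to intervene. Corrections gathered on the
        #    policy's own state distribution are what distinguish DAgger-style
        #    training from naive behavior cloning.
        corrections = game.run_policy_and_collect_corrections(policy)
        dataset.extend(corrections)

    return policy
```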

From the developer’s perspective, providing a demonstration or a correction to faulty behavior is as simple as picking up the controller and starting to play the game. Once they are done, they can put the controller down and watch the ML policy play. The result is a training experience that is real-time, interactive, highly experiential, and, very often, more than a little fun.

ML policy for an FPS game, trained with our system.

Conclusion
We present a system that combines a high-level semantic API with a DAgger-inspired interactive training flow that enables training of useful ML policies for video game testing in a wide variety of genres. We have released an open source library as a functional illustration of our system. No ML expertise is required and training of agents for test applications often takes less than an hour on a single developer machine. We hope that this work will help inspire the development of ML techniques that can be deployed in real-world game-development flows in ways that are accessible, effective, and fun to use.

Acknowledgements
We’d like to thank the core members of the project: Dexter Allen, Leopold Haller, Nathan Martz, Hernan Moraldo, Stewart Miles and Hina Sakazaki. Training algorithms are provided by TF Agents, and on-device inference by TF Lite. Special thanks to our research advisors, Olivier Bachem, Erik Frey, and Toby Pohlen, and to Eugene Brevdo, Jared Duke, Oscar Ramirez and Neal Wu who provided helpful guidance and support.

Douglas Coupland fuses AI and art to inspire students

Have you ever noticed that the word art is embedded in the phrase artificial intelligence? Neither did we, but when the opportunity presented itself to explore how artificial intelligence (AI) inspires artistic expression — with the help of internationally renowned Canadian artist Douglas Coupland — the Google Research team jumped on it. This collaboration, with the support of Google Arts & Culture, culminated in a project called Slogans for the Class of 2030, which spotlights the experiences of the first generation of young people whose lives are fully intertwined with the existence of AI. 

This collaboration was brought to life by first introducing Coupland’s written work to a machine learning language model. Machine learning is a form of AI that provides computer systems the ability to automatically learn from data. In this case, Google research scientists tuned a machine learning algorithm with Coupland’s 30-year body of written work — more than a million words — so it would familiarize itself with the author’s unique style of writing. From there, curated general-public social media posts on selected topics were added to teach the algorithm how to craft short-form, topical statements.

Once the algorithm was trained, the next step was to process and reassemble suggestions of text for Coupland to use as inspiration to create twenty-five Slogans for the Class of 2030.

“I would comb through ‘data dumps’ where characters from one novel were speaking with those in other novels in ways that they might actually do. It felt like I was encountering a parallel universe Doug,” Coupland says. “And from these outputs, the statements you see here in this project appeared like gems. Did I write them? Yes. No. Could they have existed without me? No.”

A common theme in Coupland’s work is the investigation of the human condition through the lens of pop culture. The focus on the class of 2030 was intentional. Coupland wanted to create works that would serve as inspiration for students in their early teens who will be graduating from universities in 2030. For those teens considering their future career paths, he hoped that this collaboration would trigger a broader conversation on the vast possibilities in the field and would acquaint them with the fact that AI does not have to be strictly scientific, it can be artful.

Unveiled today, all 25 thought-provoking and visually rich digital slogans are yours to experience on Google Arts & Culture alongside Coupland’s artistic statement and other behind-the-scenes material. This isn’t Douglas’ first collaboration with Google Arts & Culture; in 2019 Coupland’s Vancouver art exhibition was captured virtually. In 2015 and 2016, he joined the Google Arts & Culture Lab residency in Paris where he collaborated with engineers to develop many works including the Search book and the Living Internet.

In an effort to celebrate local talent and culture, multiple venues across Canada have signed on to project the slogans, for a limited time, on larger-than-life digital screens allowing curious minds to experience them in an immersive way. The screens include the Terry Fox Memorial Plaza at BC Place Stadium in Vancouver, B.C., the TELUS Len Werry building in downtown Calgary, TELUS Harbour in downtown Toronto and select Pattison digital screens across Canada.

Technology has always played a role in creating new types of possibilities that inspire artists — from the sounds of distortion to the electronic sounds of synths for musicians for example. Today, advances in AI are opening up new possibilities for other forms of art and we look forward to seeing what the crossroads of art and technology bring to life. 

How we’re supporting 30 new AI for Social Good projects

Over recent years, we have seen remarkable progress in AI’s ability to confront new problems and help solve old ones. Advancing these efforts was one reason we set up the Google Research India lab in 2019, with a particular emphasis on AI research that could make a positive social impact. It’s also why we’ve supported nonprofit organizations through the Google AI Impact Challenge.

Working in partnership with Google.org and Google’s University Relations program, our goal is to help academics and nonprofits develop AI techniques that can improve people’s lives — especially in underserved communities that haven’t yet benefited from advances in AI. We reported on the impact of six such projects in 2020. And today, we’re sharing 30 new projects that will receive funding and support as part of our AI for Social Good program

During the application process, Googlers arranged workshops involving more than 150 teams to discuss potential projects. Following the workshop meetings, project teams made up of NGOs and academics submitted proposals which Google experts reviewed. The result is a promising range of projects spanning seventeen countries across Asia-Pacific and Sub-Saharan Africa — including India, Uganda, Nigeria, Japan and Australia — focused on agriculture, conservation and public health.

In agriculture, this includes research to help farmer collectives with market intelligence and use data to improve crop and irrigation planning for smallholder farmers. In public health, we are backing projects that will enable targeted public health interventions, and will help community health workers to forecast health risks in countries such as Kenya, India and Uganda. We’re also supporting research to better forecast the need for critical resources like vaccines and care, including in Nigeria. And in conservation, we’re supporting research to help understand animal population changes, such as the effect of poaching on elephants and gorillas. Other projects will help reduce conservation conflict and poaching, including human-elephant conflict in Kenya.

Each project team will receive funding, technical contributions from Google and access to computational resources. Academics in this program will be recognized as “Impact Scholars” for their contributions towards advancing research for social good.  

We’ve seen the impact these kinds of projects can make. One of the nonprofit leaders supported by the program last year, ARMMAN founder Dr. Aparna Hegde, has received AI research support from IIT Madras and Google Research to improve maternal and child health outcomes in India. The team is building a predictive model to prevent expectant mothers dropping out of supportive telehealth outreach programs. Results so far show AI could enable ARMMAN to increase the number of women engaged through the program by 50%, and they have received a second Google.org grant to enable them to build on this progress. Dr. Hegde says the program is “already showing encouraging results — and I am confident that this partnership will bring immense benefits in the future.”

Congratulations to all the recipients of this round’s support. We’re looking forward to continuing to nurture the AI for Social Good community, bringing together experts from diverse backgrounds with the common goal of advancing AI to improve lives around the world.

Take All Your Pictures to the Cleaners, with Google Photos Noise and Blur Reduction

Posted by Mauricio Delbracio, Research Scientist and Sungjoon Choi, Software Engineer, Google Research

Despite recent leaps in imaging technology, especially on mobile devices, image noise and limited sharpness remain two of the most important levers for improving the visual quality of a photograph. These are particularly relevant when taking pictures in poor light conditions, where cameras may compensate by increasing the ISO or slowing the shutter speed, thereby exacerbating the presence of noise and, at times, increasing image blur. Noise can be associated with the particle nature of light (shot noise) or be introduced by electronic components during the readout process (read noise). The captured noisy signal is then processed by the camera’s image signal processor (ISP) and later may be further enhanced, amplified, or distorted by a photographic editing process. Image blur can be caused by a wide variety of phenomena, including inadvertent camera shake during capture, an incorrect setting of the camera’s focus (automatic or not), the finite lens aperture, limited sensor resolution, or the camera’s image processing.

It is far easier to minimize the effects of noise and blur within a camera pipeline, where details of the sensor, optical hardware and software blocks are understood. However, when presented with an image produced from an arbitrary (possibly unknown) camera, improving noise and sharpness becomes much more challenging due to the lack of detailed knowledge and access to the internal parameters of the camera. In most situations, these two problems are intrinsically related: noise reduction tends to eliminate fine structures along with unwanted details, while blur reduction seeks to boost structures and fine details. This interconnectedness increases the difficulty of developing image enhancement techniques that are computationally efficient to run on mobile devices.

Today, we present a new approach for camera-agnostic estimation and elimination of noise and blur that can improve the quality of most images. We developed a pull-push denoising algorithm that is paired with a deblurring method, called polyblur. Both of these components are designed to maximize computational efficiency, so users can successfully enhance the quality of a multi-megapixel image in milliseconds on a mobile device. These noise and blur reduction strategies are critical components of the recent Google Photos editor updates, which includes “Denoise” and “Sharpen” tools that enable users to enhance images that may have been captured under less than ideal conditions, or with older devices that may have had more noisy sensors or less sharp optics.

A demonstration of the “Denoise” and “Sharpen” tools now available in the Google Photos editor.

How Noisy is An Image?
In order to accurately process a photographic image and successfully reduce the unwanted effects of noise and blur, it is vitally important to first characterize the types and levels of noise and blur found in the image. So, a camera-agnostic approach for noise reduction begins by formulating a method to gauge the strength of noise at the pixel level from any given image, regardless of the device that created it. The noise level is modeled as a function of the brightness of the underlying pixel. That is, for each possible brightness level, the model estimates a corresponding noise level in a manner agnostic to either the actual source of the noise or the processing pipeline.

To estimate this brightness-based noise level, we sample a number of small patches across the image and measure the noise level within each patch, after roughly removing any underlying structure in the image. This process is repeated at multiple scales, making it robust to artifacts that may arise from compression, image resizing, or other non-linear camera processing operations.
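In greatly simplified form, a brightness-dependent noise estimate can be computed along the lines of the sketch below. The patch size, the crude structure removal via pixel differences, and the binning are assumptions for illustration; the production estimator is more careful and operates at multiple scales.

```python
import numpy as np

def estimate_noise_vs_brightness(image, patch=8, num_bins=16):
    """Rough sketch: estimate the noise level as a function of local brightness.

    For each small patch, roughly remove the underlying structure by taking
    horizontal pixel differences (smooth structure mostly cancels, noise does
    not), and treat the spread of the residual as the local noise level. The
    per-patch estimates are then bucketed by the patch's mean brightness.
    """
    h, w = image.shape
    brightness, noise = [], []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            p = image[y:y + patch, x:x + patch].astype(float)
            diffs = p[:, 1:] - p[:, :-1]          # crude structure removal
            brightness.append(p.mean())
            noise.append(diffs.std() / np.sqrt(2.0))
    brightness, noise = np.array(brightness), np.array(noise)

    # Median noise per brightness bin gives a simple brightness-to-noise curve.
    bins = np.linspace(brightness.min(), brightness.max(), num_bins + 1)
    curve = [np.median(noise[(brightness >= lo) & (brightness < hi)])
             for lo, hi in zip(bins[:-1], bins[1:])
             if np.any((brightness >= lo) & (brightness < hi))]
    return np.array(curve)

# Synthetic test: brighter regions carry more signal-dependent (shot-like) noise.
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.1, 1.0, 256), (256, 1))
noisy = clean + rng.normal(scale=0.02 + 0.05 * clean)
print(estimate_noise_vs_brightness(noisy))
```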

The two segments on the left illustrate signal-dependent noise present in the input image (center). The noise is more prominent in the bottom, darker crop and is unrelated to the underlying structure, but rather to the light level. Such image segments are sampled and processed to generate the spatially-varying noise map (right) where red indicates more noise is present.

Reducing Noise Selectively with a Pull-Push Method
We take advantage of self-similarity of patches across the image to denoise with high fidelity. The general principle behind such so-called “non-local” denoising is that noisy pixels can be denoised by averaging pixels with similar local structure. However, these approaches typically incur high computational costs because they require a brute force search for pixels with similar local structure, making them impractical for on-device use. In our “pull-push” approach1, the algorithmic complexity is decoupled from the size of filter footprints thanks to effective information propagation across spatial scales.

The first step in pull-push is to build an image pyramid (i.e., multiscale representation) in which each successive level is generated recursively by a “pull” filter (analogous to downsampling). This filter uses a per-pixel weighting scheme to selectively combine existing noisy pixels together based on their patch similarities and estimated noise, thus reducing the noise at each successive, “coarser” level. Pixels at coarser levels (i.e., with lower resolution) pull and aggregate only compatible pixels from higher resolution, “finer” levels. In addition to this, each merged pixel in the coarser layers also includes an estimated reliability measure computed from the similarity weights used to generate it. Thus, merged pixels provide a simple per-pixel, per-level characterization of the image and its local statistics. By efficiently propagating this information through each level (i.e., each spatial scale), we are able to track a model of the neighborhood statistics for increasingly larger regions in a multiscale manner.

After the pull stage is evaluated to the coarsest level, the “push” stage fuses the results, starting from the coarsest level and generating finer levels iteratively. At a given scale, the push stage generates “filtered” pixels following a process similar to that of the pull stage, but going from coarse to finer levels. The pixels at each level are fused with those of coarser levels by doing a weighted average of same-level pixels along with coarser-level filtered pixels using the respective reliability weights. This enables us to reduce pixel noise while preserving local structure, because only reliable information is averaged in. This selective filtering and multiscale propagation of reliability (i.e., information) is what makes pull-push different from existing frameworks.

This series of images shows how filtering progresses through the pull-push process. Coarser level pixels pull and aggregate only compatible pixels from finer levels, as opposed to the traditional multiscale approaches using a fixed (non-data dependent) kernel. Notice how the noise is reduced throughout the stages.
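The sketch below shows a heavily simplified, single-channel version of this idea: a pull phase that builds a pyramid of similarity-weighted averages together with per-pixel reliabilities, and a push phase that fuses the levels back together using those reliabilities. The weighting scheme and the plain 2x2 downsampling are illustrative assumptions, not the production filters.

```python
import numpy as np

def pull_push_denoise(img, sigma, levels=4):
    """Greatly simplified pull-push-style denoiser (illustration only).

    Pull: each coarser pixel is a similarity-weighted average of its 2x2
    children, and carries a per-pixel "reliability" equal to the accumulated
    weight. Push: walk back down, blending each level with the upsampled
    coarser result according to those reliabilities. The real algorithm uses
    patch similarities, multiscale noise estimates, and careful filters; this
    sketch keeps only the overall structure.
    For simplicity, assumes image dimensions divisible by 2**levels.
    """
    pyramid = [(img.astype(float), np.ones_like(img, dtype=float))]
    for _ in range(levels):
        fine, rel = pyramid[-1]
        h, w = (fine.shape[0] // 2) * 2, (fine.shape[1] // 2) * 2
        blocks = fine[:h, :w].reshape(h // 2, 2, w // 2, 2)
        rblocks = rel[:h, :w].reshape(h // 2, 2, w // 2, 2)
        mean = blocks.mean(axis=(1, 3), keepdims=True)
        # Only "compatible" pixels (close to the local mean relative to the
        # noise level) contribute strongly to the coarser level.
        weights = rblocks * np.exp(-((blocks - mean) / (sigma + 1e-9)) ** 2)
        coarse_rel = weights.sum(axis=(1, 3))
        coarse = (weights * blocks).sum(axis=(1, 3)) / (coarse_rel + 1e-9)
        pyramid.append((coarse, coarse_rel))

    # Push: fuse the heavily averaged coarse estimates back into finer levels.
    out, out_rel = pyramid[-1]
    for fine, rel in reversed(pyramid[:-1]):
        up = np.kron(out, np.ones((2, 2)))[:fine.shape[0], :fine.shape[1]]
        up_rel = np.kron(out_rel, np.ones((2, 2)))[:fine.shape[0], :fine.shape[1]]
        out = (rel * fine + up_rel * up) / (rel + up_rel + 1e-9)
        out_rel = rel + up_rel
    return out

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 1, 128), (128, 1))
noisy = clean + rng.normal(scale=0.1, size=clean.shape)
denoised = pull_push_denoise(noisy, sigma=0.1)
print(np.abs(noisy - clean).mean(), np.abs(denoised - clean).mean())
```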

The pull-push approach has a low computational cost, because the algorithm to selectively filter similar pixels over a very large neighborhood has a complexity that is only linear with the number of image pixels. In practice, the quality of this denoising approach is comparable to traditional non-local methods with much larger kernel footprints, but operates at a fraction of the computational cost.

Image enhanced using the pull-push denoising method.

How Blurry Is an Image?
An image with poor sharpness can be thought of as a more pristine latent image that has been operated on by a blur kernel. So, if one can identify the blur kernel, it can be used to reduce the effect. This is referred to as “deblurring”, i.e., the removal or reduction of an undesired blur effect induced by a particular kernel on a particular image. In contrast, “sharpening” refers to applying a sharpening filter, built from scratch and without reference to any particular image or blur kernel. Typical sharpening filters are also, in general, local operations that do not take into account any other information from other parts of the image, whereas deblurring algorithms estimate the blur from the whole image. Unlike arbitrary sharpening, which can result in worse image quality when applied to an image that is already sharp, deblurring a sharp image with a blur kernel accurately estimated from the image itself will have very little effect.

We specifically target relatively mild blur, as this scenario is more technically tractable, more computationally efficient, and produces consistent results. We model the blur kernel as an anisotropic (elliptical) Gaussian kernel, specified by three parameters that control the strength, direction and aspect ratio of the blur.

Gaussian blur model and example blur kernels. Each row of the plot on the right represents possible combinations of σ0, ρ and θ. We show three different σ0 values with three different ρ values for each.

Computing and removing blur without noticeable delay for the user requires an algorithm that is much more computationally efficient than existing approaches, which typically cannot be executed on a mobile device. We rely on an intriguing empirical observation: the maximal value of the image gradient across all directions at any point in a sharp image follows a particular distribution. Finding the maximum gradient value is efficient, and can yield a reliable estimate of the strength of the blur in the given direction. With this information in hand, we can directly recover the parameters that characterize the blur.
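Once the three parameters are recovered, the corresponding elliptical Gaussian kernel can be constructed as in the sketch below. The parameter names mirror the figure above (σ0, ρ, θ), but the exact parameterization here is an assumption made for illustration.

```python
import numpy as np

def anisotropic_gaussian_kernel(sigma0, rho, theta, size=21):
    """Elliptical (anisotropic) Gaussian blur kernel.

    sigma0: blur strength along the main axis,
    rho:    ratio of the cross-axis standard deviation to sigma0 (aspect ratio),
    theta:  orientation of the main axis in radians.
    The paper's exact parameterization may differ; this is an illustration.
    """
    sigma_x, sigma_y = sigma0, rho * sigma0
    c, s = np.cos(theta), np.sin(theta)
    # Covariance of the elliptical Gaussian, rotated by theta.
    d = np.diag([sigma_x ** 2, sigma_y ** 2])
    r = np.array([[c, -s], [s, c]])
    cov_inv = np.linalg.inv(r @ d @ r.T)

    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    coords = np.stack([xs, ys], axis=-1)
    quad = np.einsum('...i,ij,...j->...', coords, cov_inv, coords)
    kernel = np.exp(-0.5 * quad)
    return kernel / kernel.sum()

# A fairly strong, horizontal-ish, motion-like blur.
k = anisotropic_gaussian_kernel(sigma0=3.0, rho=0.3, theta=np.deg2rad(20))
print(k.shape, k.sum())  # (21, 21) and ~1.0
```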

Polyblur: Removing Blur by Re-blurring
To recover the sharp image given the estimated blur, we would (in theory) need to solve a numerically unstable inverse problem (i.e., deblurring). The inversion problem grows exponentially more unstable with the strength of the blur. As such, we target the case of mild blur removal, i.e., we assume that the image at hand is not so blurry as to be beyond practical repair. This enables a more practical approach: by carefully combining different re-applications of an operator, we can approximate its inverse.

Mild blur, as shown in these examples, can be effectively removed by combining multiple applications of the estimated blur.

This means, rather counterintuitively, that we can deblur an image by re-blurring it several times with the estimated blur kernel. Each application of the (estimated) blur corresponds to a first-order term, and repeated applications (added or subtracted) correspond to higher-order terms of a polynomial in the blur operator. A key aspect of this approach, which we call polyblur, is that it is very fast, because it only requires a few applications of the blur itself. This allows it to operate on megapixel images in a fraction of a second on a typical mobile device. The degree of the polynomial and its coefficients are set to invert the blur without boosting noise and other unwanted artifacts.

The deblurred image is generated by adding and subtracting multiple re-applications of the estimated blur (polyblur).
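A minimal sketch of this idea is shown below, assuming the blur has already been estimated. Writing B for one application of the blur, the truncated polynomial 3·y − 3·B(y) + B(B(y)) approximates the inverse of a mild blur; the coefficients and the isotropic Gaussian stand-in for the estimated kernel are illustrative choices, not the tuned values used in the product.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def deblur_by_reblurring(img, sigma, alpha=3.0, beta=3.0):
    """Deblur via a degree-3 polynomial of the blur operator. With B denoting
    one application of the (estimated) blur, return
    alpha*y - beta*B(y) + B(B(y)), which approximates the inverse of a mild
    blur. Coefficients and the isotropic Gaussian stand-in are illustrative."""
    blur = lambda x: gaussian_filter(x, sigma)  # one application of the blur
    y = img.astype(np.float64)
    return alpha * y - beta * blur(y) + blur(blur(y))
```

Because each call to the blur is just another filtering pass, the whole operation costs only a few passes over the image, which is what makes the approach fast enough for a mobile device.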

Integration with Google Photos
The innovations described here have been integrated and made available to users in the Google Photos image editor in two new adjustment sliders called “Denoise” and “Sharpen”. These features allow users to improve the quality of everyday images, from any capture device. The features often complement each other, allowing both denoising to reduce unwanted artifacts and sharpening to bring clarity to the image subjects. Try using this pair of tools in tandem on your images for best results. To learn more about the details of the work described here, check out our papers on polyblur and pull-push denoising. To see some examples of the effect of our denoising and sharpening up close, have a look at the images in this album.

Acknowledgements
The authors gratefully acknowledge the contributions of Ignacio Garcia-Dorado, Ryan Campbell, Damien Kelly, Peyman Milanfar, and John Isidoro. We are also thankful for support and feedback from Navin Sarma, Zachary Senzer, Brandon Ruffin, and Michael Milne.


1 The original pull-push algorithm was developed as an efficient scattered data interpolation method to estimate and fill in the missing pixels in an image where only a subset of the pixels are specified. Here, we extend its methodology and present a data-dependent multiscale algorithm for denoising images efficiently. 


Achieving Precision in Quantum Material Simulations

Posted by Charles Neill and Zhang Jiang, Senior Research Scientists, Google Quantum AI

In fall of 2019, we demonstrated that the Sycamore quantum processor could outperform the most powerful classical computers when applied to a tailor-made problem. The next challenge is to extend this result to solve practical problems in materials science, chemistry and physics. But going beyond the capabilities of classical computers for these problems is challenging and will require new insights to achieve state-of-the-art accuracy. Generally, the difficulty in performing quantum simulations of such physical problems is rooted in the wave nature of quantum particles, where deviations in the initial setup, interference from the environment, or small errors in the calculations can lead to large deviations in the computational result.

In two upcoming publications, we outline a blueprint for achieving record levels of precision for the task of simulating quantum materials. In the first work, we consider one-dimensional systems, like thin wires, and demonstrate how to accurately compute electronic properties, such as current and conductance. In the second work, we show how to map the Fermi-Hubbard model, which describes interacting electrons, to a quantum processor in order to simulate important physical properties. These works take a significant step towards realizing our long-term goal of simulating more complex systems with practical applications, like batteries and pharmaceuticals.

A bottom view of one of the quantum dilution refrigerators during maintenance. During operation, the microwave wires that are floating in this image are connected to the quantum processor, e.g., the Sycamore chip, bringing the temperature of the lowest stage to a few tens of milli-degrees above absolute zero.

Computing Electronic Properties of Quantum Materials
In “Accurately computing electronic properties of a quantum ring”, to be published in Nature, we show how to reconstruct key electronic properties of quantum materials. The focus of this work is on one-dimensional conductors, which we simulate by forming a loop out of 18 qubits on the Sycamore processor in order to mimic a very narrow wire. We illustrate the underlying physics through a series of simple textbook experiments, starting with a computation of the “band structure” of this wire, which describes the relationship between the energy and momentum of electrons in the metal. Understanding such structure is a key step in computing electronic properties such as current and conductance. Even though this is an 18-qubit algorithm consisting of over 1,400 logical operations, a significant computational task for near-term devices, we are able to achieve a total error as low as 1%.

The key insight enabling this level of accuracy stems from robust properties of the Fourier transform. The quantum signal that we measure oscillates in time with a small number of frequencies. Taking a Fourier transform of this signal reveals peaks at the oscillation frequencies (in this case, the energy of electrons in the wire). While experimental imperfections affect the height of the observed peaks (corresponding to the strength of the oscillation), the center frequencies are robust to these errors. On the other hand, the center frequencies are especially sensitive to the physical properties of the wire that we hope to study (e.g., revealing small disorders in the local electric field felt by the electrons). The essence of our work is that studying quantum signals in the Fourier domain enables robust protection against experimental errors while providing a sensitive probe of the underlying quantum system.
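As a toy numerical illustration of this robustness (a synthetic signal, not data from the experiment), note how the location of a spectral peak survives damping and added noise even though its height does not:

```python
import numpy as np

# Synthetic example: the location of a spectral peak is robust to damping and
# additive noise, even though its height is not.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1024, endpoint=False)
true_freq = 37.0                                          # arbitrary choice
clean = np.cos(2 * np.pi * true_freq * t)
noisy = 0.6 * clean + 0.3 * rng.standard_normal(t.size)   # reduced amplitude + noise

spectrum = np.abs(np.fft.rfft(noisy))
freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])
peak_freq = freqs[np.argmax(spectrum[1:]) + 1]            # skip the DC bin
print(f"recovered peak at {peak_freq:.1f} (true value {true_freq})")
```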

(Left) Schematic of the 54-qubit quantum processor, Sycamore. Qubits are shown as gray crosses and tunable couplers as blue squares. Eighteen of the qubits are isolated to form a ring. (Middle) Fourier transform of the measured quantum signal. Peaks in the Fourier spectrum correspond to the energy of electrons in the ring. Each peak can be associated with a traveling wave that has fixed momentum. (Right) The center frequency of each peak (corresponding to the energy of electrons in the wire) is plotted versus the peak index (corresponding to the momentum). The measured relationship between energy and momentum is referred to as the ‘band structure’ of the quantum wire and provides valuable information about electronic properties of the material, such as current and conductance.

Quantum Simulation of the Fermi-Hubbard Model
In “Observation of separated dynamics of charge and spin in the Fermi-Hubbard model”, we focus on the dynamics of interacting electrons. Interactions between particles give rise to novel phenomena such as high temperature superconductivity and spin-charge separation. The simplest model that captures this behavior is known as the Fermi-Hubbard model. In materials such as metals, the atomic nuclei form a crystalline lattice and electrons hop from lattice site to lattice site carrying electrical current. In order to accurately model these systems, it is necessary to include the repulsion that electrons feel when getting close to one another. The Fermi-Hubbard model captures this physics with two simple parameters that describe the hopping rate (J) and the repulsion strength (U).
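For reference, the one-dimensional Fermi-Hubbard Hamiltonian is conventionally written as follows (standard textbook notation; the post itself only names the two parameters), where c† and c are fermionic creation and annihilation operators and n↑, n↓ are the spin-resolved number operators:

```latex
H = -J \sum_{i,\sigma} \left( c_{i,\sigma}^{\dagger} c_{i+1,\sigma} + \text{h.c.} \right)
    + U \sum_{i} n_{i\uparrow} n_{i\downarrow}
```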

We realize the dynamics of this model by mapping the two physical parameters to logical operations on the qubits of the processor. Using these operations, we simulate a state of the electrons where both the electron charge and spin densities are peaked near the center of the qubit array. As the system evolves, the charge and spin densities spread at different rates due to the strong correlations between electrons. Our results provide an intuitive picture of interacting electrons and serve as a benchmark for simulating quantum materials with superconducting qubits.

(Left top) Illustration of the one-dimensional Fermi-Hubbard model in a periodic potential. Electrons are shown in blue, with their spin indicated by the connected arrow. J, the distance between troughs in the electric potential field, reflects the “hopping” rate, i.e., the rate at which electrons transition from one trough in the potential to another, and U, the amplitude, represents the strength of repulsion between electrons. (Left bottom) The simulation of the model on a qubit ladder, where each qubit (square) represents a fermionic state with spin-up or spin-down (arrows). (Right) Time evolution of the model reveals separated spreading rates of charge and spin. Points and solid lines represent experimental and numerical exact results, respectively. At t = 0, the charge and spin densities are peaked at the middle sites. At later times, the charge density spreads and reaches the boundaries faster than the spin density.

Conclusion
Quantum processors hold the promise of solving computationally hard tasks beyond the capability of classical approaches. However, in order for these engineered platforms to be considered serious contenders, they must offer computational accuracy beyond that of current state-of-the-art classical methods. In our first experiment, we demonstrate an unprecedented level of accuracy in simulating simple materials, and in our second experiment, we show how to embed realistic models of interacting electrons into a quantum processor. It is our hope that these experimental results help advance the goal of moving beyond the classical computing horizon.
