Introducing FELIX: Flexible Text Editing Through Tagging and Insertion

Posted by Jonathan Mallinson and Aliaksei Severyn, Research Scientists, Google Research

Sequence-to-sequence (seq2seq) models have become a favoured approach for tackling natural language generation tasks, with applications ranging from machine translation to monolingual generation tasks, such as summarization, sentence fusion, text simplification, and machine translation post-editing. However these models appear to be a suboptimal choice for many monolingual tasks, as the desired output text often represents a minor rewrite of the input text. When accomplishing such tasks, seq2seq models are both slower because they generate the output one word at a time (i.e., autoregressively), and wasteful because most of the input tokens are simply copied into the output.

Instead, text-editing models have recently received a surge of interest as they propose to predict edit operations – such as word deletion, insertion, or replacement – that are applied to the input to reconstruct the output. However, previous text-editing approaches have limitations. They are either fast (being non-autoregressive), but not flexible, because they use a limited number of edit operations, or they are flexible, supporting all possible edit operations, but slow (autoregressive). In either case, they have not focused on modeling large structural (syntactic) transformations, for example switching from active voice, “They ate steak for dinner,” to passive, “Steak was eaten for dinner.” Instead, they’ve focused on local transformations, deleting or replacing short phrases. When a large structural transformation needs to occur, they either can’t produce it or insert a large amount of new text, which is slow.

In “FELIX: Flexible Text Editing Through Tagging and Insertion”, we introduce FELIX, a fast and flexible text-editing system that models large structural changes and achieves a 90x speed-up compared to seq2seq approaches whilst achieving impressive results on four monolingual generation tasks. Compared to traditional seq2seq methods, FELIX has the following three key advantages:

  • Sample efficiency: Training a high precision text generation model typically requires large amounts of high-quality supervised data. FELIX uses three techniques to minimize the amount of required data: (1) fine-tuning pre-trained checkpoints, (2) a tagging model that learns a small number of edit operations, and (3) a text insertion task that is very similar to the pre-training task.
  • Fast inference time: FELIX is fully non-autoregressive, avoiding slow inference times caused by an autoregressive decoder.
  • Flexible text editing: FELIX strikes a balance between the complexity of learned edit operations and flexibility in the transformations it models.

In short, FELIX is designed to derive the maximum benefit from self-supervised pre-training, being efficient in low-resource settings, with little training data.

Overview
To achieve the above, FELIX decomposes the text-editing task into two sub-tasks: tagging to decide on the subset of input words and their order in the output text, and insertion, where words that are not present in the input are inserted. The tagging model employs a novel pointer mechanism, which supports structural transformations, while the insertion model is based on a Masked Language Model. Both of these models are non-autoregressive, ensuring the model is fast. A diagram of FELIX can be seen below.

An example of FELIX trained on data for a text simplification task. Input words are first tagged as KEEP (K), DELETE (D) or KEEP and INSERT (I). After tagging, the input is reordered. This reordered input is then fed to a masked language model.

The Tagging Model
The first step in FELIX is the tagging model, which consists of two components. First the tagger determines which words should be kept or deleted and where new words should be inserted. When the tagger predicts an insertion, a special MASK token is added to the output. After tagging, there is a reordering step where the pointer reorders the input to form the output, by which it is able to reuse parts of the input instead of inserting new text. The reordering step supports arbitrary rewrites, which enables modeling large changes. The pointer network is trained such that each word in the input points to the next word as it will appear in the output, as shown below.

Realization of the pointing mechanism to transform “There are 3 layers in the walls of the heart” into “the heart MASK 3 layers”.

The Insertion Model
The output of the tagging model is the reordered input text with deleted words and MASK tokens predicted by the insertion tag. The insertion model must predict the content of MASK tokens. Because FELIX’s insertion model is very similar to the pretraining objective of BERT, it can take direct advantage of the pre-training, which is particularly advantageous when data is limited.

Example of the insertion model, where the tagger predicts two words will be inserted and the insertion model predicts the content of the MASK tokens.

Results
We evaluated FELIX on sentence fusion, text simplification, abstractive summarization, and machine translation post-editing. These tasks vary significantly in the types of edits required and dataset sizes under which they operate. Below are the results on the sentence fusion task (i.e., merging two sentences into one), comparing FELIX against a large pre-trained seq2seq model (BERT2BERT) and a text-editing model (LaserTager), under a range of dataset sizes. We see that FELIX outperforms LaserTagger and can be trained on as little as a few hundred training examples. For the full dataset, the autoregressive BERT2BERT outperforms FELIX. However, during inference, this model takes significantly longer.

A comparison of different training dataset sizes on the DiscoFuse dataset. We compare FELIX (using the best performing model) against BERT2BERT and LaserTagger.
Latency in milliseconds for a batch of 32 on a Nvidia Tesla P100.

Conclusion
We have presented FELIX, which is fully non-autoregressive, providing even faster inference times, while achieving state-of-the-art results. FELIX also minimizes the amount of required training data with three techniques — fine-tuning pre-trained checkpoints, learning a small number of edit operations, and an insertion task that mimics masked language model task from the pre-training. Lastly, FELIX strikes a balance between the complexity of learned edit operations and the percentage of input-output transformations it can handle. We have open-sourced the code for FELIX and hope it will provide researchers with a faster, more efficient, and more flexible text-editing model.

Acknowledgements
This research was conducted by Jonathan Mallinson, Aliaksei Severyn (equal contribution), Eric Malmi, Guillermo Garrido. We would like to thank Aleksandr Chuklin, Daniil Mirylenka, Ryan McDonald, and Sebastian Krause for useful discussions, running early experiments and paper suggestions.

Read More

Do Wide and Deep Networks Learn the Same Things?

Posted by Thao Nguyen, AI Resident, Google Research

A common practice to improve a neural network’s performance and tailor it to available computational resources is to adjust the architecture depth and width. Indeed, popular families of neural networks, including EfficientNet, ResNet and Transformers, consist of a set of architectures of flexible depths and widths. However, beyond the effect on accuracy, there is limited understanding of how these fundamental choices of architecture design affect the model, such as the impact on its internal representations.

In “Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth”, we perform a systematic study of the similarity between wide and deep networks from the same architectural family through the lens of their hidden representations and final outputs. In very wide or very deep models, we find a characteristic block structure in their internal representations, and establish a connection between this phenomenon and model overparameterization. Comparisons across models demonstrate that those without the block structure show significant similarity between representations in corresponding layers, but those containing the block structure exhibit highly dissimilar representations. These properties of the internal representations in turn translate to systematically different errors at the class and example levels for wide and deep models when they are evaluated on the same test set.

Comparing Representation Similarity with CKA
We extended prior work on analyzing representations by leveraging our previously developed Centered Kernel Alignment (CKA) technique, which provides a robust, scalable way to determine the similarity between the representations learned by any pair of neural network layers. CKA takes as input the representations (i.e., the activation matrices) from two layers, and outputs a similarity score between 0 (not at all similar) and 1 (identical representations).

We apply CKA to a family of ResNets of varying depths and widths, trained on common benchmark datasets (CIFAR-10, CIFAR-100 and ImageNet), and use representation heatmaps to illustrate the results. The x and y axes of each heatmap index the layers of the model(s) in consideration, going from input to output, and each entry (i, j) is the CKA similarity score between layer i and layer j.

We use CKA to compute the representation similarity for all pairs of layers within a single model (i.e., when network 1 and network 2 are identical), and across models (i.e., when network 1 and network 2 are trained with different random initializations, or have different architectures altogether).

Below is an example of the resulting heatmap when we compare representations of each layer to every other layer within a single ResNet of depth 26 and width multiplier 1. In the design convention used here, the stated depth only refers to the number of convolutional layers in the network, but we analyze all layers present, and the width multiplier applies to the number of filters in each convolution. Notice the checkerboard pattern in the heatmap, which is caused by skip connections (shortcuts between layers) in the architecture.

The Emergence of the Block Structure
What stands out from the representation heatmaps of deeper or wider networks is the emergence of a large set of consecutive layers with highly similar representations, which appears in the heatmaps as a yellow square (i.e., a region with high CKA scores). This phenomenon, which we call the block structure, suggests that the underlying layers may not be as efficient at progressively refining the network’s representations as we expect. Indeed, we show that the task performance becomes stagnant inside the block structure, and that it is possible to prune some underlying layers without affecting the final performance.

Block structure — a large, contiguous set of layers with highly similar representations — emerges with increasing width or depth. Each heatmap panel shows the CKA similarity between all pairs of layers within a single neural network. While its size and position can vary across different training runs, the block structure is a robust phenomenon that arises consistently in larger models.

With additional experiments, we show that the block structure has less to do with the absolute model size, than with the size of the model relative to the size of the training dataset. As we reduce the training dataset size, the block structure starts to appear in shallower and narrower networks:

With increasing network width (towards the right along each row) and decreasing dataset size (down each column), the relative model capacity (with respect to a given task) is effectively inflated, and the block structure begins to appear in smaller models.

Through further analysis, we are also able to demonstrate that the block structure arises from preserving and propagating the dominant principal components of its underlying representations. Refer to our paper for more details.

Comparing Representations Across Models
Going further, we study the implications of depth and width on representations across models of different random initializations and different architectures, and find that the presence of block structure makes a significant difference in this context as well. Despite having different architectures, wide and deep models without the block structure do exhibit representation similarity with each other, with corresponding layers broadly being of the same proportional depth in the model. However, when the block structure is present, its representations are unique to each model. This suggests that despite having similar overall performance, each wide or deep model with the block structure picks up a unique mapping from the input to the output.

For smaller models (e.g., ResNet-38 1×), CKA across different initializations (off the diagonal) closely resembles CKA within a single model (on the diagonal). In contrast, representations within the block structure of wider and deeper models (e.g., ResNet-38 10×, ResNet-164 1×) are highly dissimilar across training runs.

Error Analysis of Wide and Deep Models
Having explored the properties of the learned representations of wide and deep models, we next turn to understanding how they influence the diversity of the output predictions. We train populations of networks of different architectures and determine on which test set examples each architecture configuration tends to make errors.

On both CIFAR-10 and ImageNet datasets, wide and deep models that have the same average accuracy still demonstrate statistically significant differences in example-level predictions. The same observation holds for class-level errors on ImageNet, with wide models exhibiting a small advantage in identifying classes corresponding to scenes, and deep networks being relatively more accurate on consumer goods.

Per-class differences on ImageNet between models with increased width (y-axis) or depth (x-axis). Orange dots reflect differences between two sets of 50 different random initializations of ResNet-83 (1×).

Conclusions
In studying the effects of depth and width on internal representations, we uncover a block structure phenomenon, and demonstrate its connection to model capacity. We also show that wide and deep models exhibit systematic output differences at class and example levels. Check out the paper for full details on these results and additional insights! We’re excited about the many interesting open questions these findings suggest, such as how the block structure arises during training, whether the phenomenon occurs in domains beyond image classification, and ways these insights on internal representations can inform model efficiency and generalization.

Acknowledgements
This is a joint work with Maithra Raghu and Simon Kornblith. We would like to thank Tom Small for the visualizations of the representation heatmap.

Read More

Google at ICLR 2021

Posted by Jaqui Herman, Research Specialist and Tim Herrmann, Program Manager

The 9th International Conference on Learning Representations (ICLR 2021), a virtual conference focused on deep learning, kicked off this week, offering conference and workshop tracks that present some of the latest research in deep learning and its applications to areas such as computer vision, computational biology, speech recognition, text understanding, and more.

As a Platinum Sponsor of ICLR 2021, Google will have a strong presence with over 100 accepted publications and participation on organizing committees and in workshops. If you have registered for ICLR 2021, we hope you’ll watch our talks and learn about the work at Google that goes into solving interesting problems for billions of people. Learn more about our research being presented in the list below (Googlers in bold).

Officers and Board Members
Includes: Hugo Larochelle, Tara Sainath

Organizing Committee
Includes: Sanmi Koyejo, Chelsea Finn

Area Chairs
Includes: Abhishek Kumar, Aditya Menon, Aleksandra Faust, Alexey Dosovitskiy, Andrew Cotter, Andrew Dai, Augustus Odena, Been Kim, Behnam Neyshabur, Ben Poole, Bo Dai, Bo Li, Branislav Kveton, Ce Liu, Claudio Gentile, Colin Raffel, Danny Tarlow, David Ha, Dengyong Zhou, Dumitru Erhan, Dustin Tran, Felix Hill, George Tucker, Hanie Sedghi, Heinrich Jiang, Hossein Mobahi, Izhak Shafran, Jascha Sohl-Dickstein, Jasper Snoek, Jean-Philippe Vert, Jeffrey Pennington, Justin Gilmer, Kevin Swersky, Marco Cuturi, Mario Lucic, Marlos C. Machado, Mathieu Blondel, Matt Johnson, Matthieu Geist, Mohammad Norouzi, Naman Agarwal, Navdeep Jaitly, Nicolas Le Roux, Niki Parmar, Olivier Bachem, Olivier Pietquin, Philip Long, Quentin Berthet, Razvan Pascanu, Rodolphe Jenatton, Samy Bengio*, Sebastian Nowozin, Silvio Lattanzi, Slav Petrov, Srinadh Bhojanapalli, Suman Ravuri, Tim Salimans, Vitaly Kuznetsov, William Cohen, Yann Dauphin, Yujia Li

Publications
Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes
Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel

An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale (see the blog post)
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
Biao Zhang*, Ankur Bapna, Rico Sennrich, Orhan Firat

Evolving Reinforcement Learning Algorithms (see the blog post)
John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust

Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song*, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Leonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

When Do Curricula Work?
Xiaoxia Wu, Ethan Dyer, Behnam Neyshabur

Sharpness-aware Minimization for Efficiently Improving Generalization
Pierre Foret*, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models Zirui Wang*, Yulia Tsvetkov, Orhan Firat, Yuan Cao

Mathematical Reasoning via Self-supervised Skip-tree Training
Markus Norman Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

Long-Tail Learning via Logit Adjustment
Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Are Neural Rankers Still Outperformed by Gradient Boosted Decision Trees?
Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork

LambdaNetworks: Modeling Long-Range Interactions without Attention
Irwan Bello

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G Bellemare

BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration
Augustus Odena, Kensen Shi, David Bieber, Rishabh Singh, Charles Sutton, Hanjun Dai

Practical Real Time Recurrent Learning with a Sparse Approximation
Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

LEAF: A Learnable Frontend for Audio Classification (see the blog post)
Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Batch Reinforcement Learning Through Continuation Method
Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed Chi, Honglak Lee, Minmin Chen

Scalable Transfer Learning with Expert Models
Joan Puigcerver, Carlos Riquelme Ruiz, Basil Mustafa, Cedric Renggli*, André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado*, Pablo Samuel Castro, Marc G Bellemare

Scaling Symbolic Methods Using Gradients for Neural Model Explanation
Subham Sekhar Sahoo, Subhashini Venugopalan, Li Li, Rishabh Singh, Patrick Riley

Primal Wasserstein Imitation Learning (see the blog post)
Robert Dadashi, Leonard Hussenot, Matthieu Geist, Olivier Pietquin

Reset-Free Lifelong Learning with Skill-Space Planning
Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

Teaching Temporal Logics to Neural Networks
Christopher Hahn, Frederik Schmitt, Jens U. Kreber, Markus Norman Rabe, Bernd Finkbeiner

Shape-Texture Debiased Neural Network Training
Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie

Rethinking Embedding Coupling in Pre-trained Language Models
Hyung Won Chung, Thibault Fevry*, Henry Tsai, Melvin Johnson, Sebastian Ruder

Overparameterisation and Worst-Case Generalisation: Friend or Foe?
Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Single-Photon Image Classification
Thomas Fischbacher, Luciano Sbaiz

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis*, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

Adaptive Federated Optimization
Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, Hugh Brendan McMahan

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
Biao Zhang*, Ankur Bapna, Rico Sennrich, Orhan Firat

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
Benjamin Eysenbach, Shreyas Chaudhari, Swapnil Asawa, Sergey Levine, Ruslan Salakhutdinov

Open Question Answering over Tables and Text
Wenhu Chen*, Ming-Wei Chang, Eva Schlinger, William Yang Wang, William W. Cohen

Practical Real Time Recurrent Learning with a Sparse Approximation
Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression
Rianne van den Berg, Alexey A. Gritsenko, Mostafa Dehghani, Casper Kaae Sønderby, Tim Salimans

A Universal Representation Transformer Layer for Few-Shot Image Classification
Lu Liu, William L. Hamilton, Guodong Long, Jing Jiang, Hugo Larochelle

Tradeoffs in Data Augmentation: An Empirical Study
Raphael Gontijo-Lopes, Sylvia Smullin, Ekin Dogus Cubuk, Ethan Dyer

Coping with Label Shift via Distributionally Robust Optimisation
Jingzhao Zhang, Aditya Krishna Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

Rethinking Attention with Performers (see the blog post)
Krzysztof Marcin Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Quincy Davis, Afroz Mohiuddin, Lukasz Kaiser, David Benjamin Belanger, Lucy J Colwell, Adrian Weller

Teaching with Commentaries
Aniruddh Raghu*, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton

Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics
Vinay Venkatesh Ramasesh, Ethan Dyer, Maithra Raghu

Model-Based Offline Planning
Arthur Argenson, Gabriel Dulac-Arnold

The Geometry of Integration in Text Classification RNNs
Kyle Aitken*, Vinay Venkatesh Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, Niru Maheswaranathan

On the Origin of Implicit Regularization in Stochastic Gradient Descent
Samuel L Smith, Benoit Dherin, David Barrett, Soham De

Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song*, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers (see the blog post)
Preetum Nakkiran*, Behnam Neyshabur, Hanie Sedghi

Learning Energy-Based Models by Diffusion Recovery Likelihood
Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P Kingma

Latent Skill Planning for Exploration and Transfer
Kevin Xie, Homanga Bharadhwaj, Danijar Hafner, Animesh Garg, Florian Shkurti

PseudoSeg: Designing Pseudo Labels for Semantic Segmentation
Yuliang Zou*, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, Tomas Pfister

WaveGrad: Estimating Gradients for Waveform Generation
Nanxin Chen*, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, William Chan

One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks
Atish Agarwala, Abhimanyu Das, Brendan Juba*, Rina Panigrahy, Vatsal Sharan*, Xin Wang, Qiuyi Zhang

Long Range Arena : A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

Explainable Deep One-Class Classification
Philipp Liznerski, Lukas Ruff, Robert A. Vandermeulen, Billy Joe Franks, Marius Kloft, Klaus Robert Muller

Net-DNF: Effective Deep Modeling of Tabular Data
Liran Katzir, Gal Elidan, Ran El-Yaniv

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu

Auxiliary Task Update Decomposition: The Good, the Bad and the Neutral
Lucio M. Dery, Yann Dauphin, David Grangier

Long-Tail Learning via Logit Adjustment
Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Average-Case Acceleration for Bilinear Games and Normal Matrices
Carles Domingo-Enrich, Fabian Pedregosa, Damien Scieur

OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
Anurag Ajay*, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum

Training Independent Subnetworks for Robust Prediction
Marton Havasi*, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew Mingbo Dai, Dustin Tran

Benchmarks for Deep Off-Policy Evaluation
Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Thomas Paine

TropEx: An Algorithm for Extracting Linear Terms in Deep Neural Networks
Martin Trimmel, Henning Petzka, Cristian Sminchisescu

Mastering Atari with Discrete World Models (see the blog post)
Danijar Hafner, Timothy P Lillicrap, Mohammad Norouzi, Jimmy Ba

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit
Danijar Hafner, Timothy P Lillicrap, Mohammad Norouzi, Jimmy Ba

Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning
Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
Paul Pu Liang*, Manzil Zaheer, Yuan Wang, Amr Ahmed

Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret*, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

HyperGrid Transformers: Towards A Single Model for Multiple Tasks
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms
Maruan Al-Shedivat*, Jennifer Gillenwater, Eric Xing, Afshin Rostamizadeh

BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration
Augustus Odena, Kensen Shi, David Bieber, Rishabh Singh, Charles Sutton, Hanjun Dai

Are Neural Rankers Still Outperformed by Gradient Boosted Decision Trees?
Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork

Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
Thao Nguyen, Maithra Raghu, Simon Kornblith

A Unifying View on Implicit Bias in Training Linear Neural Networks
Chulhee Yun*, Shankar Krishnan, Hossein Mobahi

Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine

Mathematical Reasoning via Self-Supervised Skip-Tree Training
Markus Norman Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

Lipschitz Recurrent Neural Networks
N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W. Mahoney

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Michael R Zhang*, Thomas Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, ziyu wang, Mohammad Norouzi

The Importance of Pessimism in Fixed-Dataset Policy Optimization
Jacob Buckman, Carles Gelada, Marc G Bellemare

Monotonic Kronecker-Factored Lattice
William Taylor Bakst, Nobuyuki Morioka, Erez Louidor

What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Leonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

Adversarially Guided Actor-Critic
Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes
Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction
Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas Huang, Hyungsuk Yoon, Honglak Lee*, Seunghoon Hong

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
Zirui Wang, Yulia Tsvetkov, Orhan Firat, Yuan Cao

Dataset Meta-Learning from Kernel Ridge-Regression
Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Dual-Mode ASR: Unify and Improve Streaming ASR with Full-Context Modeling
Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N Sainath, Yonghui Wu, Ruoming Pang

Implicit Gradient Regularization
David Barrett, Benoit Dherin

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G Bellemare

Deconstructing the Regularization of BatchNorm
Yann Dauphin, Ekin Dogus Cubuk

C-Learning: Learning to Achieve Goals via Recursive Classification
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Evolving Reinforcement Learning Algorithms
John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust

Colorization Transformer
Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner

Control-Aware Representations for Model-based Reinforcement Learning
Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh

Evaluations and Methods for Explanation through Robustness Analysis
Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Kumar Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh

Learning and Evaluating Representations for Deep One-Class Classification
Kihyuk Sohn, Chun-Liang Li, Jinsung Yoon, Minho Jin, Tomas Pfister

No MCMC for Me: Amortized Sampling for Fast and Stable Training of Energy-Based Models
Will Sussman Grathwohl, Jacob Jin Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud

Neural Thompson Sampling
Weitong ZHANG, Dongruo Zhou, Lihong Li, Quanquan Gu

A Design Space Study for LISTA and Beyond
Tianjian Meng, Xiaohan Chen, Yifan Jiang, Zhangyang Wang

i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning
Kibok Lee, Yian Zhu, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin, Honglak Lee

Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments
Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Charles Blundell, Sergey Levine, Yoshua Bengio, Michael Curtis Mozer

Calibration of Neural Networks using Splines
Kartik Gupta, Amir Rahimi, Thalaiyasingam Ajanthan, Thomas Mensink, Cristian Sminchisescu, Richard Hartley

Extreme Memorization via Scale of Initialization
Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur

Molecule Optimization by Explainable Evolution
Binghong Chen, Tianzhe Wang, Chengtao Li, Hanjun Dai, Le Song

Combining Ensembles and Data Augmentation Can Harm Your Calibration
Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, Dustin Tran

Workshops
Science and Engineering of Deep Learning
Speakers and Panelists include: Alex Hanna
Moderator and Advisors include: Emily Denton
Organizers include: Negar Rostemzadeh, Samy Bengio*

Synthetic Data Generation: Quality, Privacy, Bias
Speakers include: Jinsung Yoon, Emily Denton
Program Committee includes: Syed Ashrafulla

Enormous Language Models: Perspectives and Benchmarks
Speakers and Panelists include: Noam Shazeer, Natalie Schluter
Organizers include: Colin Raffel, Adam Roberts, Jascha Sohl-Dickstein, Katherine Lee, William Fedus, Aitor Lewkowycz

The Role of Mathematical Reasoning in General Artificial Intelligence
Speakers and Panelists include: Markus Rabe, Christian Szegedy

Weakly Supervised Learning
Invited Speakers include: Lu Jiang

Learning to Learn
Organizers include: Yevgen Chebotar

Embodied Multimodal Learning (EML)
Invited Speakers includes: Sergey Levine

Distributed and Private Machine Learning
Program Committee includes: Peter Kairouz, Ananda Theertha Suresh

S2D-OLAD: From Shallow to Deep, Overcoming Limited and Adverse Data
Invited Speakers include: Alex Hanna, Hugo Larochelle
Organizers include: Vincent Dumoulin

Responsible AI (RAI)
Speakers include: Been Kim

Energy-Based Models: Current Perspectives, Challenges, and Opportunities
Organizers include: Adji Bousso Dieng, Igor Mordatch

A Roadmap to Never-Ending RL
Invited Session Panelists include: Aleksandra Faust
Program Committee includes: Coline Devin, Karol Hausman, Ben Eysenbach, Ofir Nachum, Ryan Julian, Tianhe Yu, Dumitru Erhan, Marc Pickett, Shixiang Gu

2nd Workshop on Practical ML for Developing Countries: Learning Under Limited/low Resource Scenarios
Program Committee includes: Pablo Samuel Castro

Beyond Static Papers: Rethinking How We Share Scientific Understanding in ML
Speakers include: David Ha, Hugo Larochelle
Organizers include: Sara Hooker

* Indicates work done while at Google

Read More

Flexible, Scalable, Differentiable Simulation of Recommender Systems with RecSim NG

Posted by Martin Mladenov, Research Scientist and Chih-wei Hsu, Software Engineer, Google Research

Recommender systems are the primary interface connecting users to a wide variety of online content, and therefore must overcome a number of challenges across the user population in order to serve them equitably. To this end, in 2019 we released RecSim, a configurable platform for authoring simulation environments to facilitate the study of RL algorithms (the de facto standard ML approach for addressing sequential decision problems) in recommender systems. However, as the technology has progressed, it has become increasingly important to address the gap between simulation and real-world applications, ensuring that models are flexible and easily extendible, enabling probabilistic inference of user dynamics, and addressing computational efficiency.

To address these issues, we recently released RecSim NG, the “Next Generation” of simulators for recommender systems research and development. RecSim NG is a response to a set of use cases that have emerged as important challenges in the application of simulation to real-world problems. It addresses the gap between simulation and real-world applications, ensures the models are flexible and easily extendible, enables probabilistic inference of user dynamics, and addresses computational efficiency.

Overview of RecSim NG
RecSim NG is a scalable, modular, differentiable simulator implemented in Edward2 and TensorFlow. It offers a powerful, general probabilistic programming language for agent-behavior specification.

RecSim NG significantly expands the modeling capabilities of RecSim in two ways. First, the story API allows the simulation of scenarios where an arbitrary number of actors (e.g., recommenders, content consumers, content producers, advertisers) interact with one another. This enables the flexible modeling of entire recommender ecosystems, as opposed to the usual isolated user-recommender interaction setting. Second, we introduced a library of behavioral building blocks that, much like Keras layers, implement well-known modeling primitives that can be assembled to build complex models quickly. Following the object-oriented paradigm, RecSim NG uses entity patterns to encapsulate shared parameters that govern various agent behaviors, like user satisfaction, and uses templates to define large populations of agents concisely in a way that abstracts agent “individuality” without duplicating invariant behaviors.

Apart from the typical use of simulators to generate Monte Carlo samples, RecSim NG directly enables various other forms of probabilistic reasoning. While domain knowledge and intuition are key to modeling any recommendation problem, the simulation fidelity needed to bridge the so-called “sim2real” gap can only be achieved by calibrating the simulator’s model to observed data. For data-driven simulation, RecSim NG makes it easy to implement various model-learning algorithms, such as expectation-maximization (EM), generative adversarial training, etc.

Also available within RecSim NG are tools for probabilistic inference and latent-variable model learning, backed by automatic differentiation and tracing. RecSim NG exposes a small set of Edward2 program transformations tailored to simulation-specific tasks. Its log-probability module can evaluate the probabilities of trajectories according to the probabilistic graphical model induced by the simulation. This, together with the automatic differentiation provided by the TensorFlow runtime, enables the implementation of maximum-likelihood estimation and model learning within the simulation itself. RecSim NG can readily use the Markov-chain Monte Carlo (MCMC) machinery provided by TensorFlow Probability to power posterior inference and latent-variable model learning. For example, a simulation model that describes how latent user attributes (e.g., preferences, intents, satisfaction) are translated into observational data (e.g., clicks, ratings, comments) can be “run in reverse,” that is, real observational data generated by a recommender system can be used to identify the most likely configuration of latent user attributes, which in turn can be used to assess the quality of the user experience. This allows for a simulation model to be integrated directly into the full data-science and model-development workflow.

Assessing recommender ecosystem health, i.e., the long-term impact of recommendation strategies on aspects such as overall satisfaction, collective fairness, and safety, requires the simulation of large multi-agent systems in order to plausibly reproduce the interactions between the different participants of the ecosystem. This, along with the computational load of probabilistic inference tasks, requires an efficient simulation runtime. For computational performance, RecSim NG offers a TensorFlow-based runtime for running simulations on accelerated hardware. The simulation takes advantage of all optimizations offered by TensorFlow’s AutoGraph compiler, including accelerated linear algebra (XLA) if available. The simulation will automatically exploit all available cores on the host machine as well as specialized hardware (if run accordingly), such as Tensor Processing Units (TPUs). The core RecSim NG architecture is back-end independent, enabling applications to be developed within other computational frameworks (such as JAX or PyTorch).

Ecosystem Modeling as an Application
To demonstrate the capabilities of RecSim NG, we present a very simplified model of multi-agent interactions among users and content providers in a stylized recommender ecosystem1. The simulation captures the dynamics of a recommender system that mediates the interaction between users and content providers by recommending slates of those providers’ content items to users over time. We adopt a simplified user model whereby each user is characterized by a static, observable “user interest vector.” This vector determines a user’s affinity with a recommended item, which are then used as inputs to a choice model that determines a user’s item selection from a recommended slate. A user’s utility for any selected item is simply their affinity for the item, perturbed with Gaussian noise.

The aim of the recommender is to maximize cumulative user utility, over all users, over a fixed horizon. However, interesting ecosystem effects make this challenging, and emerge because of content provider behavior. Like users, each provider has an “interest vector” around which the content items it makes available are centered, reflecting that provider’s general expertise or tendencies. Providers have their own incentives for making content available: their utility is measured by the number of their items selected by any user over the recent past. Moreover, providers with higher utility generate or make available a greater number of items, increasing the “catalog” from which users (and the recommender) can choose.

We compare two different recommender policies in this setting. The first is a standard “myopic” policy that, for any user, always recommends the items that have the greatest predicted affinity for that user. Under such a policy, the behavior of providers has the potential to give rise to “rich-get-richer” phenomena: providers that initially attract users produce more items at subsequent periods, which increases the odds of attracting even further future engagement. This gradual concentration of available items around “mainstream” content providers has a negative impact on overall user utility over time. The second recommender policy is aware of these provider dynamics, which it counteracts by promoting under-served providers.2 While a simple heuristic, the provider-aware policy increases overall user utility over extended horizons.

The number of agents in the simulation is large and we templatize both users and content providers with reusable modeling blocks offered by RecSim NG. Determining how to execute the simulation in parallel is non-trivial, so it is critical to utilize TF’s AutoGraph and other computational optimizations.

Conclusion
Our hope is that RecSim NG will make it easier for both researchers and practitioners to develop, train and evaluate novel algorithms for recommender systems, especially algorithms intended to optimize system behavior over extended horizons, capture complex multi-agent interactions and incentives, or both. We are also investigating the release of increasingly realistic user models that can serve as benchmarks for the research community, as well as methods that can facilitate “sim2real” transfer using RecSim NG.

Further details regarding the RecSim NG framework can be found in the associated white paper, while code and colabs/tutorials are available here. A video about RecSim NG presented at RecSys-2020 is shown below:

Acknowledgements
We thank our collaborators and early adopters of RᴇᴄSɪᴍ NG, including the other members of the RecSim NG team: Vihan Jain, Eugene Ie, Chris Colby, Nicolas Mayoraz, Hubert Pham, Dustin Tran, Ivan Vendrov and Craig Boutilier.


1 This model is a much simpler version of that presented in this ICML-20 paper

2 This simple heuristic policy is used only to demonstrate RecSim NG’s capabilities. More sophisticated algorithms that compute policies that explicitly maximize long-term user utility are discussed in this ICML-20 paper

Read More

When artists and machine intelligence come together

Throughout history, from photography to video to hypertext, artists have pushed the expressive limits of new technologies, and artificial intelligence is no exception. At I/O 2019, Google Research and Google Arts & Culture launched the Artists + Machine Intelligence Grants, providing a range of support and technical mentorship to six artists from around the globe following an open call for proposals. The inaugural grant program sought to expand the field of artists working with Machine Learning (ML) and, through supporting pioneering artists, creatively push at the boundaries of generative ML and natural language processing. 

Today, we are publishing the outcomes of the grants. The projects draw from many disciplines, including rap and hip hop, screenwriting, early cinema, phonetics, Spanish language poetry, and Indian pre-modern sound. What they all have in common is an ability to challenge our assumptions about AI’s creative potential.

a graffiti-style visualization of the artwork

Learn more about the Hip Hop Poetry Bot

Hip Hop Poetry Bot by Alex Fefegha  

Can AI rap? Alex explores speech generation trained on rap and hip hop lyrics by Black artists. For the moment it exists as a proof of concept, as building the experiment in full requires a large, public dataset of rap and hip hop lyrics on which an algorithm can be trained, and such a public archive doesn’t currently exist.  The project is therefore launching with an invitation from Alex to rap and hip hop artists to become creative collaborators and contribute their lyrics to create a new, public dataset of lyrics by Black artists. 

A woman, partly smiling, in an industrial-style room

Read more about Neural Swamp

Neural Swamp by Martine Syms 

Martine uses video and performance to examine representations of blackness across generations, geographies, mediums, and traditions. For this residency, Martine developed Neural Swamp, a play staged across five screens, starring five entities who talk and sing alongside and over each other. Two of the five voices are trained on Martine’s voice and generated using machine learning speech models. The project will premiere at The Philadelphia Museum of Art and Fondazione Sandretto Re Rebaudengo in Fall 2021.

A dashboard with toggles for changing the letters in a sentence

The Nonsense Laboratory by Allison Parrish  

Allison invites you to adjust, poke at, mangle, curate and compress words with a series of playful tools in her Nonsense Laboratory. Powered by a bespoke code library and machine learning model developed by Allison Parrish you can mix and respell words, sequence mouth movements to create new words, rewrite a text so that the words feel different in your mouth, or go on a journey through a field of nonsense. 

A collage of images, in the style of old cinema film

Let Me Dream Again by Anna Ridler 

Anna uses machine learning to try to recreate lost films from fragments of early Hollywood and European cinema that still exist. The outcome? An endlessly evolving, algorithmically generated film and soundtrack. The film will continually play, never repeating itself, over a period of one month. 

A woman in a desert holding a staff

Read more about Knots of Code

Knots of Code by Paola Torres Núñez del Prado

Paola studies the history of quipus, a pre-Columbian notation system that is based on the tying of knots in ropes, as part of a new research project, Knots of Code. The project’s first work is a Spanish language poetry-album from Paola and AIELSON, an artificial intelligence system that composes and recites poetry inspired by quipus and emulating the voice of the late Peruvian poet J.E. Eielson. 

An empty stage with bells hanging on wires

Read more about Dhvāni

Dhvāni by Budhaditya Chattopadhyay 

Budhaditya brings a lifelong interest in the materiality, phenomenology, political-cultural associations, and the sociability of sound to Dhvāni, a responsive sound installation, comprising 51 temple bells and conducted with the help of machine learning. An early iteration of Dhvāni was installed at EXPERIMENTA Arts & Sciences Biennale 2020 in Grenoble, France.  

Explore the artworks at g.co/artistsmeetai or on the free Google Arts & Culture app for iOS and Android.

Read More

Model-Based RL for Decentralized Multi-agent Navigation

Posted by Rose E. Wang, Student Researcher and Aleksandra Faust, Staff Research Scientist, Google Research

As robots become more ubiquitous in day-to-day life, the complexity of their interactions with each other and with the environment grows. In a controlled environment, such as a lab, multiple robots can coordinate their actions and efforts through a centralized planner that facilitates communication between individual agents. And while much research has been done to address reliable sensor-informed goal navigation, in many real-world applications aligning goals across independent robotic agents must be done without a centralized planner, which poses non-trivial challenges.

An example of such a challenging decentralized task is the rendezvous task, in which multiple agents must agree upon a time and place at which they can meet, without explicitly communicating with one another. This goal alignment task plays an important role in real world multiagent and human-robot settings, e.g., performing object handovers or determining goals on the fly. Solving the decentralized rendezvous task in this situation depends not just on the obstacles in the environment, but also the policies and dynamics of each agent. Addressing potential miscoordination and dealing with noisy sensor data depends on the agents’ ability to model the motions of other agents as well as their own, and to adapt to diverging intentions while using limited information.

An example of two independently controlled robots separated by obstacles that share the objective of meeting each other. How should they move in order to meet? Example trajectories are illustrated in red and blue arrows for each robot. Each robot makes an independent decision of where to go based on their own observations.

In “Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous”, presented at CoRL 2020, we propose an holistic approach to address the challenges of the decentralized rendezvous task, which we call hierarchical predictive planning (HPP). This is a decentralized, model-based reinforcement learning (RL) system that enables agents to align their goals on the fly in the real world. We evaluate HPP in a mixture of real-world and simulated environments and compare it to several learning-based planning and centralized baselines. In those evaluations, we show that HPP is able to more effectively predict and align trajectories, avoid miscoordinations, and directly transfer to the real world without additional fine-tuning.

Putting Together Prediction, Planning and Control
Akin to a standard navigation pipeline, our learning-based system consists of three modules: prediction, planning, and control. Each agent employs the prediction model to learn agent motion and to predict the future positions of itself (the ego-agent) and others based on its own observations (e.g., from LiDAR and team position information) of other agents’ behaviors and navigation patterns. So, each agent learns two prediction models, one for its own motion and one for the other agent. These motion predictors constitute the prediction module, and are used by each agent’s planning module.

The output of the prediction module — the estimate of where each agent, both the ego-agent and the other agents, is most likely to be given the ego-agent’s own sensor observations — is useful information for the planning module, which evaluates different goal locations and maintains a belief distribution over where the team should converge. The belief distribution is periodically updated using evaluations provided by the prediction model. An agent samples from this belief distribution to update the goal to which it should navigate.

The selected goal is passed to the agent’s control module, which is equipped with a pre-trained, imperfect navigation policy that can navigate to a given location in the obstacle-laden environment. The control policy then determines what action the robot should execute.

This process of observing other agents, updating belief distributions and navigating to an updated goal repeats until agents have successfully rendezvoused. While the hierarchical planning and control setup are not unusual, our work closes the loop between the control and planning for decentralized multiagent systems by use of the sensor-informed prediction module.

Training the Prediction Models
HPP trains motion predictors in simulation, assuming that each agent is controlled by a hidden, perhaps suboptimal, control policy capable of avoiding obstacles. The key difficulty lies in training prediction models without access to other agents’ sensor observations and control policies.

The predictors are trained via self-supervision. To collect the training data, we randomly place all the agents and obstacles in an environment, and each agent is given a random goal (unknown to other agents). As the agents move toward their respective goals, each agent records the experience — its sensor observations and the poses of all agents (itself and other agents). Next, from the recorded experience, the agent learns a separate predictor for each agent in the team including itself (target agent). The training dataset consists of ego-agent initial sensor observations, target agent’s pose and goal, labeled with future ego-observations and target agent poses. The goal and labels are inferred from the recorded experience.

As a result, the predictors learn temporal causality of the present and future ego-agent’s observations and target agent’s poses, conditioned on the target agent’s assumed goals — in other words the models predict where each agent will be in the future based on the present. The predictor training is done only with the information available to agents at the runtime, and in environments independent from the deployment environments.

The training environment for the model prediction models. The environment is filled with randomly filled obstacles. All agents (left in blue, upper right in red) are given the same random goal (center in green) and move with their own control modules towards it.

Selecting Goals for Alignment
A model-based RL planner for each agent uses the learned predictors in the deployment environments to guide the agents towards the rendezvous point. The planner takes into account what it believes the other agents would do when also completing the rendezvous task.

HPP illustration. Each robot independently considers several potential rendezvous points, and evaluates each point based how close it believes that the agents can get.

To perform this reasoning, each agent independently samples a series of potential goals and selects the goal that it believes it would be the most likely to succeed. This process effectively simulates a centralized planner for fictitious agents by using the prediction models to predict trajectories of those agents moving to a fixed goal. Conditioned on a proposed goal, the algorithm predicts the poses of the agents in the future, which are generated from sequential roll outs of the prediction models. Each goal is then evaluated by scoring the anticipated system state using the task reward favoring goals that bring agents closer together. We use the cross-entropy method (CEM) to convert these goal evaluations into belief updates over potential rendezvous points. Finally, the agent’s planner selects a goal for itself from this new belief distribution and passes this goal to the agent’s control module.

A simple illustration of the goal evaluation. At the end of a simulated trajectory, the agents (red, left, and blue, right) are either far (top) or close (bottom) to each other. The goal in the bottom image is better than the goal on top because agents end up closer to each other.

Results
We compare HPP against several baselines — MADDPG (learning-based), RRT (planning) with CEM, and centralized baselines that use heuristics for selecting the agent’s rendezvous point — in a mixture of real-world and simulated environments.

Evaluation environments, each of which are independent of the training environment for the agent’s control policy and prediction modules.

There are two main takeaways from our results. One is that HPP enables agents to predict and align trajectories, avoiding miscoordinations. For example:

The second takeaway is that HPP transfers directly into the real world without additional training. For example:

Conclusion
This work presents HPP, a model-based RL approach for decentralized multiagent coordination. Agents first learn to predict where they and their teammates are going to be from their own sensors and decide and navigate to a common goal. Our experiments demonstrate the method generalizes to new environments and handles miscoordination while making no assumptions about the dynamics of other agents. This may be of interest to the larger multiagent research community as a real-world example of a decentralized task using noisy sensors and imperfect controllers, to the motion planning community as an example of a learning-based planning system that closes the loop between the planner and controller, and to the RL community as an example of model-based RL as feedback in a hierarchical, self-supervised prediction setting.

Acknowledgements
This research was done by Rose E. Wang, J. Chase Kew, Dennis Lee, Tsang-Wei Edward Lee, Tingnan Zhang, Brian Ichter, Jie Tan, Aleksandra Faust with special thanks to Michael Everett, Oscar Ramirez and Igor Mordatch for the insightful discussions.

Read More

Holistic Video Scene Understanding with ViP-DeepLab

Posted by Siyuan Qiao, Student Researcher and Liang-Chieh Chen, Research Scientist, Google Research

People are able to retrieve the visual information about 3D environments from a picture quite easily — we can identify objects, determine instance sizes, and reconstruct 3D scene layout, all using the limited signals contained in 2D images. This ability is commonly known as the inverse projection problem, which refers to reconstructing the ambiguous mapping from the retinal images to the sources of retinal stimulation. Real-world computer vision applications, such as autonomous driving, heavily rely on these capabilities to localize and identify 3D objects, which require vision models to infer the spatial location, semantic class, and instance label for each 3D point projected to the 2D images. The ability to reconstruct the 3D world from images can be decomposed into two disjoint computer vision tasks: monocular depth estimation (predicting depth from a single image) and video panoptic segmentation (the unification of instance segmentation and semantic segmentation, in the video domain). However, research has generally considered each task separately. Tackling these tasks jointly with a unified computer vision model could result in easier deployment and greater efficiency by sharing computation among multiple tasks.

Driven by the potential value of a model that predicts depth and video panoptic segmentation at the same time, we present “ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation”, accepted to CVPR 2021. In this work, we propose a new task, depth-aware video panoptic segmentation, that aims to simultaneously tackle monocular depth estimation and video panoptic segmentation. For the new task, we present two derived datasets accompanied by a new evaluation metric called depth-aware video panoptic quality (DVPQ). This new metric includes the metrics for depth estimation and video panoptic segmentation, requiring a vision model to simultaneously tackle the two sub-tasks. To this end, we extend Panoptic-DeepLab by adding network branches for depth and video predictions to create ViP-DeepLab, a unified model that jointly performs video panoptic segmentation and monocular depth estimation for each pixel on the image plane, and achieves state-of-the-art performance on several academic datasets for the sub-tasks. This video demonstrates the new task and shows the results of ViP-DeepLab.

Depth-aware video panoptic segmentation results obtained by ViP-DeepLab. Top-left: Video frames used as input. Top-right: Video panoptic segmentation results. Bottom-left: Estimated depth. Bottom-right: Reconstructed 3D points. Each object instance has a unique and temporally consistent label, e.g., pedestrain_1, pedestrain_2, etc. Input images are from the Cityscapes dataset.

Overview
While Panoptic-DeepLab is able to output semantic segmentation, center prediction, and center regression for a single frame, it lacks the capability of depth estimation and temporally consistent instance ID prediction for multiple frames. However, ViP-DeepLab accomplishes this by performing additional predictions from two consecutive frames as input. The first additional output is depth estimation for the first frame, for which it assigns an estimated depth to each pixel. In addition, ViP-DeepLab also performs center regression for two consecutive frames for only the object centers that appear in the first frame. This process is called center offset prediction, and allows ViP-DeepLab to group all the pixels in the two frames to the same object that appears in the first frame. New instances emerge if they are not grouped to the previously detected instances. This process continues for every two consecutive frames (with one overlapping frame) in a video sequence, stitching panoptic predictions together to form predictions with temporally consistent instance IDs. That is, it stitches together where objects are and how they move in a video scene with time.

Outputs of ViP-DeepLab for video panoptic segmentation. Two consecutive frames are concatenated as input. The semantic segmentation output associates each pixel with its semantic classes, while the instance segmentation outputs identify the pixels from two frames associated with an individual object in the first frame. Input images are from the Cityscapes dataset.
Visualization of stitching video panoptic predictions. ViP-DeepLab propagates IDs based on mask intersection-over-union between region pairs. It is capable of tracking objects with large movements, e.g., the cyclist in the image.

Neural Network Design
Building on top of Panoptic-DeepLab, ViP-DeepLab additionally contains two prediction branches: (1) a depth prediction branch, and (2) a next-frame instance branch. Specifically, the depth prediction head is a simple design that predicts depth regression for every pixel, while the next-frame instance branch predicts the center offsets for the pixels in the second frame with respect to the centers in the first frame.

Results
We have tested ViP-DeepLab on multiple popular benchmarks, including Cityscapes-VPS, KITTI Depth Prediction, and KITTI Multi-Object Tracking and Segmentation (MOTS).

Specifically, ViP-DeepLab achieves state-of-the-art (SOTA) results, significantly outperforming previous methods by 5.1% video panoptic quality (VPQ) on the Cityscapes-VPS test set.

Method VPQAll VPQThings VPQStuff
VPSNet 57.4% 45.8% 64.8%
ViP-DeepLab          62.5% (+5.1%)       50.2% (+4.4%)       70.3% (+5.5%)   
VPQ comparison on Cityscapes-VPS test set.

ViP-DeepLab ranks 1st on the KITTI depth prediction benchmark, improving over previous methods by 0.65 SILog (the smaller the better).

Method    SILog       sqErrorRel       absErrorRel       iRMSE   
PWA 11.45 2.30 9.05 12.32
ViP-DeepLab       10.80 2.19 8.94 11.77
Monocular depth estimation comparison on KITTI Depth Prediction benchmark. Note for the depth estimation metrics, the smaller the values, the better the performance. While differences may appear small, the top-performing method on this benchmark usually has a gap in SILog smaller than 0.1.

Additionally, ViP-DeepLab was also 1st on KITTI MOTS pedestrians and 3rd on KITTI MOTS cars ranked by the metric sMOTSA, and now is 3rd for both pedestrians and cars ranked by a newer metric HOTA.

Class Method HOTA
Car PointTrack 62.0%
ViP-DeepLab 76.4% (+14.4%)
Pedestrian       PointTrack 54.4%
ViP-DeepLab          64.3% (+9.9%)   
Performance comparison on KITTI Multi-Object Tracking and Segmentation.

Finally, we also present two new datasets for the new task, depth-aware video panoptic segmentation, and test ViP-DeepLab on them. We hope our ViP-DeepLab results on these two new datasets will serve as a strong baseline for the community to compare against. The results are shown below.

Dataset    DVPQAll       DVPQThings       DVPQStuff   
Cityscapes-DVPS       55.1% 43.3% 63.6%
SemKITTI-DVPS 45.6% 36.6% 52.2%
ViP-DeepLab performance for the task of depth-aware video panoptic segmentation on two new datasets.

Conclusion
With a simple architecture, ViP-DeepLab achieves state-of-the-art performance on video panoptic segmentation, monocular depth estimation, and multi-object tracking and segmentation. We hope that along with MaX-DeepLab, which proposes an efficient dual-path transformer module that allows for end-to-end image panoptic segmentation, ViP-DeepLab is useful to the community and furthers research into a more holistic understanding of scenes in the real world.

Acknowledgements
We would like to thank the support and valuable discussions with Yukun Zhu, Hartwig Adam, and Alan Yuille (co-authors of ViP-DeepLab), as well as Maxwell Collins, and the Mobile Vision team.

Read More

HDR+ with Bracketing on Pixel Phones

Posted by Manfred Ernst and Bartlomiej Wronski, Software Engineers, Google Research

We’re continuously working to improve the Pixel — making it more helpful, more capable, and more fun — with regular updates, such as the recent V8.2 update to the Camera app. One such improvement (launched on Pixel 5 and Pixel 4a 5G in October) is a feature that operates “under the hood”, HDR+ with Bracketing. This feature works by merging images taken with different exposure times to improve image quality (especially in shadows), resulting in more natural colors, improved details and texture, and reduced noise.

Why Are HDR Scenes Hard to Capture?
The original HDR+ burst photography system is the engine behind high-quality mobile photography, which captures a rapid series of deliberately underexposed images, then combines and renders them in a way that preserves detail across the range of tones. But this system had one limitation: scenes with high dynamic range (HDR) like the one below were noisy in the shadows because all images captured are underexposed.

The same photo using HDR+ (red outline) and HDR+ with Bracketing (green outline). While the characteristic HDR+ look remains the same, bracketing improves image quality, especially in shadows, with more natural colors, improved details and texture, and reduced noise.

Capturing HDR scenes is difficult because of the physical constraints of image sensors combined with limited signal in the shadows. We can correctly expose either the shadows or the highlights, but not both at the same time.

The same scene shot with different exposure settings and tonemapped to similar overall brightness. Left/Top: Exposure set for the highlights. The bright blue sky is preserved, but the shadows are very noisy. Right/Bottom: Exposure set for the shadows. Noise in the shadows is reduced, but the sky is clipped (white).

Photographers sometimes work around these limitations by taking two different exposures and combining them. This approach, known as exposure bracketing, can deliver the best of both worlds, but it is time-consuming to do by hand. It is also challenging in computational photography because it requires:

  1. Capturing additional long exposure frames while maintaining the fast, predictable capture experience of the Pixel camera.
  2. Taking advantage of long exposure frames while avoiding ghosting artifacts caused by motion between frames.

To avoid these challenges, the original HDR+ system used a different approach to handle high dynamic range scenes.

The Limits of HDR+
The capture strategy used by HDR+ is based on underexposure, which avoids loss of detail in the highlights. While this strategy comes at the expense of noise in the shadows, HDR+ offsets the increased noise through the use of burst photography.

Using bursts to improve image quality. HDR+ starts from a burst of full-resolution raw images (left). Depending on conditions, between 2 and 15 images are aligned and merged into a computational raw image (middle). The merged image has reduced noise and increased dynamic range, leading to a higher quality final result (right).

This approach works well for scenes with moderate dynamic range, but breaks down for HDR scenes. To understand why, we need to take a closer look at how two types of noise get into an image.

Noise in Burst Photography
One important type of noise is called shot noise, which depends only on the total amount of light captured — the sum of N frames, each with E seconds of exposure time has the same amount of shot noise as a single frame exposed for N × E seconds. If this were the only type of noise present in captured images, burst photography would be as efficient as taking longer exposures. Unfortunately, a second type of noise, read noise, is introduced by the sensor every time a frame is captured. Read noise doesn’t depend on the amount of light captured but instead depends on the number of frames taken — that is, with each frame taken, an additional fixed amount of read noise is added.

This is why using burst photography to reduce total noise isn’t as efficient as simply taking longer exposures: taking multiple frames can reduce the effect of shot noise, but will also increase read noise. Even though read noise increases with the number of frames, it is still possible to reduce the overall noisiness with burst photography, but it becomes less efficient. If one were to break a long exposure into N shorter exposures, the ratio of signal to noise in the final image would be lower because of the additional read noise. In this case, to get back to the signal-to-noise ratio in the single long exposure, one would need to merge N2 short-exposure frames. In the example below, if a long exposure were divided into 12 short exposures, we’d have to capture 144 (12 × 12) short frames to match the signal-to-noise ratio in the shadows! Capturing and processing this many frames would be much more time consuming — burst capture and processing could take over a minute and result in a poor user experience. Instead, with bracketing one can capture both short and long exposures — combining highlight protection and noise reduction.

Left: The result of merging 12 short-exposure frames in Night Sight mode. Right: A single frame whose exposure time is 12 times longer than an individual short exposure. The longer exposure has significantly less noise in the shadows but sacrifices the highlights.

Solving with Bracketing
While the challenges of bracketing prevented the original HDR+ system from using it, incremental improvements since then, plus a recent concentrated effort, have made it possible in the Camera app. To start, adding bracketing to HDR+ required redesigning the capture strategy. Capturing is complicated by zero shutter lag (ZSL), which underpins the fast capture experience on Pixel. With ZSL, the frames displayed in the viewfinder before the shutter press are the frames we use for HDR+ burst merging. For bracketing, we capture an additional long exposure frame after the shutter press, which is not shown in the viewfinder. Note that holding the camera still for half a second after the shutter press to accommodate the long exposure can help improve image quality, even with a typical amount of handshake.

Capture strategy. Top: The original HDR+ method captures short exposures before the shutter press, six in this example. Bottom: HDR+ with Bracketing captures five short exposures before the shutter press and one long exposure after the shutter press.

For Night Sight, the capture strategy isn’t constrained by the viewfinder — because all frames are captured after the shutter press while the viewfinder is stopped, this mode easily accommodates capturing longer exposure frames. In this case, we capture three long exposures to further reduce noise.

Capture strategy for Night Sight. Top: The original Night Sight captured 15 short exposure frames. Bottom: Night Sight with bracketing captures 12 short and 3 long exposures.

The Merging Algorithm
When merging bracketed shots, we choose one of the short frames as the reference frame to avoid potentially clipped highlights and motion blur. All other frames are aligned to this frame before they are merged. This introduces a challenge — for complex scene motion or occluded regions, it is impossible to find exactly matching regions and a naïve merge algorithm would produce ghosting artifacts in these cases.

Left: Ghosting artifacts are visible around the silhouette of a moving person, when deghosting is disabled.
Right: Robust merging produces a clean image.

To address this, we designed a new spatial merge algorithm, similar to the one used for Super Res Zoom, that decides per pixel whether image content should be merged or not. This deghosting is more complicated for frames with different exposures. Long exposure frames have different noise characteristics, clipped highlights, and different amounts of motion blur, which makes comparisons with the short exposure reference frame more difficult. In addition, ghosting artifacts are more visible in bracketed shots, because noise that would otherwise mask these errors is reduced. Despite those challenges, our algorithm is as robust to these issues as the original HDR+ and Super Res Zoom and doesn’t produce ghosting artifacts. At the same time, it merges images 40% faster than its predecessors. Because it merges RAW images early in the photographic pipeline, we were able to achieve all of those benefits while keeping the rest of processing and the signature HDR+ look unchanged. Furthermore, users who prefer to use computational RAW images can take advantage of those image quality and performance improvements.

Bracketing on Pixel
HDR+ with Bracketing is available to users of Pixel 4a (5G) and 5 in the default camera, as well as in Night Sight and Portrait modes. For users of Pixel 4 and 4a, the Google Camera app supports bracketing in Night Sight mode. No user interaction is needed to activate HDR+ with Bracketing — depending on the dynamic range of the scene, and the presence of motion, HDR+ with bracketing chooses the best exposures to maximize image quality (examples).

Acknowledgements
HDR+ with Bracketing is the result of a collaboration across several teams at Google. The project would not have been possible without the joint efforts of Sam Hasinoff, Dillon Sharlet, Kiran Murthy, Mike Milne, Andy Radin, Nicholas Wilson, Navin Sarma‎, Gabriel Nava, Emily To, Sushil Nath, Alexander Schiffhauer, Isaac Reynolds, Bill Strathearn, Marius Renn, Alex Hong, Jose Ricardo Lima, Bob Hung, Ying Chen Lou, Joy Hsu, Blade Chiu, David Massoud, Jean Hsu, Ellie Yang, and Marc Levoy.

Read More

Evolving Reinforcement Learning Algorithms

Posted by John D. Co-Reyes, Research Intern and Yingjie Miao, Senior Software Engineer, Google Research

A long-term, overarching goal of research into reinforcement learning (RL) is to design a single general purpose learning algorithm that can solve a wide array of problems. However, because the RL algorithm taxonomy is quite large, and designing new RL algorithms requires extensive tuning and validation, this goal is a daunting one. A possible solution would be to devise a meta-learning method that could design new RL algorithms that generalize to a wide variety of tasks automatically.

In recent years, AutoML has shown great success in automating the design of machine learning components, such as neural networks architectures and model update rules. One example is Neural Architecture Search (NAS), which has been used to develop better neural network architectures for image classification and efficient architectures for running on phones and hardware accelerators. In addition to NAS, AutoML-Zero shows that it’s even possible to learn the entire algorithm from scratch using basic mathematical operations. One common theme in these approaches is that the neural network architecture or the entire algorithm is represented by a graph, and a separate algorithm is used to optimize the graph for certain objectives.

These earlier approaches were designed for supervised learning, in which the overall algorithm is more straightforward. But in RL, there are more components of the algorithm that could be potential targets for design automation (e.g., neural network architectures for agent networks, strategies for sampling from the replay buffer, overall formulation of the loss function), and it is not always clear what the best model update procedure would be to integrate these components. Prior efforts for the automation RL algorithm discovery have focused primarily on model update rules. These approaches learn the optimizer or RL update procedure itself and commonly represent the update rule with a neural network such as an RNN or CNN, which can be efficiently optimized with gradient-based methods. However, these learned rules are not interpretable or generalizable, because the learned weights are opaque and domain specific.

In our paper “Evolving Reinforcement Learning Algorithms”, accepted at ICLR 2021, we show that it’s possible to learn new, analytically interpretable and generalizable RL algorithms by using a graph representation and applying optimization techniques from the AutoML community. In particular, we represent the loss function, which is used to optimize an agent’s parameters over its experience, as a computational graph, and use Regularized Evolution to evolve a population of the computational graphs over a set of simple training environments. This results in increasingly better RL algorithms, and the discovered algorithms generalize to more complex environments, even those with visual observations like Atari games.

RL Algorithm as a Computational Graph
Inspired by ideas from NAS, which searches over the space of graphs representing neural network architectures, we meta-learn RL algorithms by representing the loss function of an RL algorithm as a computational graph. In this case, we use a directed acyclic graph for the loss function, with nodes representing inputs, operators, parameters and outputs. For example, in the computational graph for DQN, input nodes include data from the replay buffer, operator nodes include neural network operators and basic math operators, and the output node represents the loss, which will be minimized with gradient descent.

There are a few benefits of such a representation. This representation is expressive enough to define existing algorithms but also new, undiscovered algorithms. It is also interpretable. This graph representation can be analyzed in the same way as human designed RL algorithms, making it more interpretable than approaches that use black box function approximators for the entire RL update procedure. If researchers can understand why a learned algorithm is better, then they can both modify the internal components of the algorithm to improve it and transfer the beneficial components to other problems. Finally, the representation supports general algorithms that can solve a wide variety of problems.

Example computation graph for DQN which computes the squared Bellman error.

We implemented this representation using the PyGlove library, which conveniently turns the graph into a search space that can be optimized with regularized evolution.

Evolving RL Algorithms
We use an evolutionary based approach to optimize the RL algorithms of interest. First, we initialize a population of training agents with randomized graphs. This population of agents is trained in parallel over a set of training environments. The agents first train on a hurdle environment — an easy environment, such as CartPole, intended to quickly weed out poorly performing programs.

If an agent cannot solve the hurdle environment, the training is stopped early with a score of zero. Otherwise the training proceeds to more difficult environments (e.g., Lunar Lander, simple MiniGrid environments, etc.). The algorithm performance is evaluated and used to update the population, where more promising algorithms are further mutated. To reduce the search space, we use a functional equivalence checker which will skip over newly proposed algorithms if they are functionally the same as previously examined algorithms. This loop continues as new mutated candidate algorithms are trained and evaluated. At the end of training, we select the best algorithm and evaluate its performance over a set of unseen test environments.

The population size in the experiments was around 300 agents, and we observed the evolution of good candidate loss functions after 20-50 thousand mutations, requiring about three days of training. We were able to train on CPUs because the training environments were simple, controlling for the computational and energy cost of training. To further control the cost of training, we seeded the initial population with human-designed RL algorithms such as DQN.

Overview of meta-learning method. Newly proposed algorithms must first perform well on a hurdle environment before being trained on a set of harder environments. Algorithm performance is used to update a population where better performing algorithms are further mutated into new algorithms. At the end of training, the best performing algorithm is evaluated on test environments.

Learned Algorithms
We highlight two discovered algorithms that exhibit good generalization performance. The first is DQNReg, which builds on DQN by adding a weighted penalty on the Q-values to the normal squared Bellman error. The second learned loss function, DQNClipped, is more complex, although its dominating term has a simple form — the max of the Q-value and the squared Bellman error (modulo a constant). Both algorithms can be viewed as a way to regularize the Q-values. While DQNReg adds a soft constraint, DQNClipped can be interpreted as a kind of constrained optimization that will minimize the Q-values if they become too large. We show that this learned constraint kicks in during the early stage of training when overestimating the Q-values is a potential issue. Once this constraint is satisfied, the loss will instead minimize the original squared Bellman error.

A closer analysis shows that while baselines like DQN commonly overestimate Q-values, our learned algorithms address this issue in different ways. DQNReg underestimates the Q-values, while DQNClipped has similar behavior to double dqn in that it slowly approaches the ground truth without overestimating it.

It’s worth pointing out that these two algorithms consistently emerge when the evolution is seeded with DQN. Learning from scratch, the method rediscovers the TD algorithm. For completeness, we release a dataset of top 1000 performing algorithms discovered during evolution. Curious readers could further investigate the properties of these learned loss functions.

Overestimated values are generally a problem in value-based RL. Our method learns algorithms that have found a way to regularize the Q-values and thus reduce overestimation.

Learned Algorithms Generalization Performance
Normally in RL, generalization refers to a trained policy generalizing across tasks. However, in this work we’re interested in algorithmic generalization performance, which means how well an algorithm works over a set of environments. On a set of classical control environments, the learned algorithms can match baselines on the dense reward tasks (CartPole, Acrobot, LunarLander) and outperform DQN on the sparser reward task, MountainCar.

Performance of learned algorithms versus baselines on classical control environments.

On a set of sparse reward MiniGrid environments, which test a variety of different tasks, we see that DQNReg greatly outperforms baselines on both the training and test environments, in terms of sample efficiency and final performance. In fact, the effect is even more pronounced on the test environments, which vary in size, configuration, and existence of new obstacles, such as lava.

Training environment performance versus training steps as measured by episode return over 10 training seeds. DQNReg can match or outperform baselines in sample efficiency and final performance.
DQNReg can greatly outperform baselines on unseen test environments.

We visualize the performance of normal DDQN vs. the learned algorithm DQNReg on a few MiniGrid environments. The starting location, wall configuration, and object configuration of these environments are randomized at each reset, which requires the agent to generalize instead of simply memorizing the environment. While DDQN often struggles to learn any meaningful behavior, DQNReg can learn the optimal behavior efficiently.

DDQN
DQNReg (Learned) 

Even on image-based Atari environments we observe improved performance, even though training was on non-image-based environments. This suggests that meta-training on a set of cheap but diverse training environments with a generalizable algorithm representation could enable radical algorithmic generalization.

Env DQN DDQN PPO DQNReg
Asteroid 1364.5 734.7 2097.5 2390.4
Bowling 50.4 68.1 40.1 80.5
Boxing 88.0 91.6 94.6 100.0
RoadRunner   39544.0     44127.0     35466.0     65516.0  
Performance of learned algorithm, DQNReg, against baselines on several Atari games. Performance is evaluated over 200 test episodes every 1 million steps.

Conclusion
In this post, we’ve discussed learning new interpretable RL algorithms by representing their loss functions as computational graphs and evolving a population of agents over this representation. The computational graph formulation allows researchers to both build upon human-designed algorithms and study the learned algorithms using the same mathematical toolset as the existing algorithms. We analyzed a few of the learned algorithms and can interpret them as a form of entropy regularization to prevent value overestimation. These learned algorithms can outperform baselines and generalize to unseen environments. The top performing algorithms are available for further analytical study.

We hope that future work will extend to more varied RL settings such as actor critic algorithms or offline RL. Furthermore we hope that this work can lead to machine assisted algorithm development where computational meta-learning can help researchers find new directions to pursue and incorporate learned algorithms into their own work.

Acknowledgements
We thank our co-authors Daiyi Peng, Esteban Real, Sergey Levine, Quoc V. Le, Honglak Lee, and Aleksandra Faust. We also thank Luke Metz for helpful early discussions and feedback on the paper, Hanjun Dai for early discussions on related research ideas, Xingyou Song, Krzysztof Choromanski, and Kevin Wu for helping with infrastructure, and Jongwook Choi for helping with environment selection. Finally we thank Tom Small for designing animations for this post.

Read More

A whale of a tale about responsibility and AI

A couple of years ago, Google AI for Social Good’s Bioacoustics team created a ML model that helps the scientific community detect the presence of humpback whale sounds using an acoustic recording. This tool, developed in partnership with the National Oceanic and Atmospheric Association, helps biologists study whale behaviors, patterns, population and potential human interactions. 

We realized other researchers could use this model for their work, too — it could help them better understand the oceans and protect key biodiversity areas. We wanted to freely share this model, but  struggled with a big dilemma: On one hand, it could help ocean scientists. On the other, though, we worried about whale poachers or other bad actors. What if they used our shared knowledge in a way we didn’t intend? 

We decided to consult with experts in the field in order to help us responsibly open source this machine learning model. We worked with Google’s Responsible Innovation team to use our AI Principles — aguide to responsibly developing technology — to make a decision.

The team gave us the guidance we needed to open source a machine learning model that could be socially beneficial and was built and tested for safety, while also upholding high standards of scientific excellence for the marine biologists and researchers worldwide. 

On Earth Day — and every day — putting the AI Principles into practice is important to the communities we serve, on land and in the sea. 

Curious about diving deeper? You can use AI to explore thousands of hours of humpback whale songs and make your own discoveries with our Pattern Radio and see our collaboration with the National Oceanic and Atmospheric Association of the United States as well as our work with Fisheries and Oceans Canada (DFO) to apply machine learning to protect killer whales in the Salish Sea.

Read More