Meet the researcher creating more access with language

When you’ve got your hands full, so you use your voice to ask your phone to play your favorite song, it can feel like magic. In reality, it’s a more complicated combination of engineering, design and natural language processing at work, making it easier for many of us to use our smartphones. But what happens when this voice technology isn’t available in our own language? 

This is something Google India researcher Shachi Dave considers as part of her day-to-day work. While English is the most widely spoken language globally, it ranks third as the most widely spoken native language (behind Mandarin and Spanish)—just ahead of Hindi, Bengali and a number of other languages that are official in India. Home to more than one billion people and an impressive number of official languages—22, to be exact—India is at the cutting edge of Google’s language localization or L10n (10 represents the number of letters between ‘l’ and ‘n’) efforts. 

Shachi, who is a founding member of the Google India Research team, works on natural language understanding, a field of artificial intelligence (AI) which builds computer algorithms to understand our everyday speech and language. Working with Google’s AI principles, she aims to ensure teams build our products to be socially beneficial and inclusive. Born and raised in India, Shachi graduated with a master’s degree in computer science from the University of Southern California. After working at a few U.S. startups, she joined Google over 12 years ago and returned to India to take on more research and leadership responsibilities. Since she joined the company, she has worked closely with teams in Mountain View, New York, Zurich and Tel Aviv. She also actively contributes towards improving diversity and inclusion at Google through mentoring fellow female software engineers.

How would you explain your job to someone who isn’t in tech?

My job is to make sure computers can understand and interact with humans naturally, a field of computer science we call natural language processing (NLP). Our research has found that many Indian users tend to use a mix of English and their native language when interacting with our technology, so that’s why understanding natural language is so important—it’s key to localization, our efforts to provide our services in every language and culture—while making sure our technology is fun to use and natural-sounding along the way.

What are some of the biggest challenges you’re tackling in your work now?

The biggest challenge is that India is a multilingual country, with 22 official languages. I have seen friends, family and even strangers struggle with technology that doesn’t work for them in their language, even though it can work so well in other languages. 

Let’s say one of our users is a shop owner and lives in a small village in the southern Indian state of Telangana. She goes online for the first time with her phone. But since she has never used a computer or smartphone before, using her voice is the most natural way for her to interact with her phone. While she knows some English, she is also more comfortable speaking in her native language, Telugu. Our job is to make sure that she has a positive experience and does not have to struggle to get the information she needs. Perhaps she’s able to order more goods for her shop through the web, or maybe she decides to have her services listed online to grow her business. 

So that’s part of my motivation to do my research, and that’s one of Google’s AI Principles, too—to make sure our technology is socially beneficial. 

Speaking of the AI Principles, what other principles help inform your research?

Another one of Google’s AI Principles is avoiding creating or reinforcing unfair bias. AI systems are good at recognizing patterns within data. Given that most data that we feed into training an AI system is generated by humans, it tends to have human biases and prejudices. I look for systematic ways to remove these biases. This requires constant awareness: being aware of how people have different languages, backgrounds and financial statuses. Our society has people from the entire financial spectrum, from super rich to low-income, so what works on the most expensive phones might not work on lower-cost devices. Also, some of our users might not be able to read or write, so we need to provide some audio and visual tools for them to have a better internet experience.

What led you to this career and inspired you to join Google?  

I took an Introduction to Artificial Intelligence course as an undergraduate, and it piqued my interest and curiosity. That ultimately led to research on machine translation at the Indian Institute of Technology Bombay and then an advanced degree at the University of Southern California. After that, I spent some time working at U.S. startups that were using NLP and machine learning. 

But I wanted more. I wanted to be intellectually challenged, solving hard problems. Since Google had the computing power and reputation for solving problems at scale, it became one of my top choices for places to work. 

Now you’ve been at Google for over 12 years. What are some of the most rewarding moments of your career?

Definitely when I saw the quality improvements I worked on go live on Google Search and Assistant, positively impacting millions of people. I remember I was able to help launch local features like getting the Assistant to play the songs people wanted to hear. Playing music upon request makes people happy, and it’s a feature that still works today. 

Over the years, I have gone through difficult situations as someone from an underrepresented group. I was fortunate to have a great support network—women peers as well as allies—who helped me. I try to pay it forward by being a mentor for underrepresented groups both within and outside Google.

How should aspiring AI researchers prepare for a career in this field? 

First, be a lifelong learner: The industry is moving at a fast pace. It’s important to carve out time to keep yourself well-read about the latest research in your field as well as related fields.

Second, know your motivation: When a problem is super challenging and super hard, you need to have that focus and belief that what you’re doing is going to contribute positively to our society.

Read More

Just desserts: Baking with AI-made recipes

It’s winter, it’s the holidays and it’s quarantine-times: It’s the perfect recipe for doing a ton of baking. In fact, U.S. search interest in “baking” spiked in both November and December 2020.

But being in the AI field, we decided to dive a little deeper into the trend and 

try to understand the science behind what makes cookies crunchy, cake spongy and bread fluffy — and we decided to do it with the help of machine learning. Plus, we used our ML model to come up with two completely new baking recipes: a cakie (cake-cookie hybrid) and a breakie (bread-cookie hybrid). (Don’t worry, recipes included below.)

We started off by collecting hundreds of cookie, cake and bread recipes. Then we converted all of their ingredients to ounces and whittled them down to a few essential ingredients (yeast, flour, sugar, eggs, butter and a few other things). Next we did a bit of reorganizing, since according to Paul Hollywood, treats like banana, zucchini and pumpkin bread are really more cake than they are bread.

Then we used a Google Cloud tool called AutoML Tables to build a machine learning model that analyzed a recipe’s ingredient amounts and predicted whether it was a recipe for cookies, cake or bread. If you’ve never tried AutoML Tables, it’s a code-free way to build models from the type of data you’d find in a spreadsheet like numbers and categories – no data science background required. 

Our model was able to accurately tag breads, cookies and cakes, but could also identify recipes it deemed “hybrids” — something that’s, say, 50% cake and 50% bread, or something that’s 50% cake and 50% cookie. We named two such combinations the “breakie” (a bread-cookie — “brookie” was already taken) and the “cakie” (a cake-cookie) respectively. 

Being science-minded bakers, we had to experimentally verify if these hybrid treats could really be made. You know, for science.

Behold the cakie: It has the crispiness of a cookie and the, well, “cakiness” of a cake.

Image showing a cake-like cookie with a slice cut out of it.

We also made breakies, which were more like fluffy cookies, almost the consistency of a muffin.

Image showing a woman with dark brown hair looking into the camera while holding up a tray of puffy-looking cookies, which are actually bread-like cookies.

Sara’s first batch of breakies.

Beyond just generating recipes, we also used our model to understand what made the consistency of cookies, cakes and breads so different. For that, we used a metric called  “feature importance,” which is automatically calculated by AutoML Tables.

In our case, the amount of butter, sugar, yeast and egg in a recipe all seemed to be important indicators of “cookieness” (or cakiness or breadiness). AutoML Tables lets you look at feature importance both for your model as a whole and for individual predictions. Below are the most important features for our model as a whole, meaning these ingredients were the biggest signals for our model across many different cake, cookie and bread recipes:

A chart showing the feature importance of items like butter, sugar, yeast, egg, and so on in each of the recipes.

If you find yourself with extra time and an experimental spirit, try out our recipes and let us know what you think. And you can find all the details of what we learned from our ML model in the technical blog post.

A recipe card for a cakie.
A recipe card for a breakie.

Most importantly, if you come up with an even better cakie or breakie recipe, please let us know.

Read More

End-to-End, Transferable Deep RL for Graph Optimization

End-to-End, Transferable Deep RL for Graph Optimization

Posted by Yanqi Zhou and Sudip Roy, Research Scientists, Google Research

An increasing number of applications are driven by large and complex neural networks trained on diverse sets of accelerators. This process is facilitated by ML compilers that map high-level computational graphs to low-level, device-specific executables. In doing so, ML compilers need to solve many optimization problems, including graph rewriting, assignment of operations on devices, operation fusion, layout and tiling of tensors, and scheduling. For example, in a device placement problem, the compiler needs to determine the mapping between operations in the computational graph to the target physical devices so that an objective function, such as training step time, can be minimized. The placement performance is determined by a mixture of intricate factors, including inter-device network bandwidth, peak device memory, co-location constraints, etc., making it challenging for heuristics or search-based algorithms, which typically settle for fast, but sub-optimal, solutions. Furthermore, heuristics are hard to develop and maintain, especially as newer model architectures emerge.

Recent attempts at using learning-based approaches have demonstrated promising results, but they have a number of limitations that make them infeasible to be deployed in practice. Firstly, these approaches do not easily generalize to unseen graphs, especially those arising from newer model architectures, and second, they have poor sample efficiency, leading to high resource consumption during training. Finally, they are only able to solve a single optimization task, and consequently, do not capture the dependencies across the tightly coupled optimization problems in the compilation stack.

In “Transferable Graph Optimizers for ML Compilers”, recently published as an oral paper at NeurIPS 2020, we propose an end-to-end, transferable deep reinforcement learning method for computational graph optimization (GO) that overcomes all of the above limitations. We demonstrate 33%-60% speedup on three graph optimization tasks compared to TensorFlow default optimization. On a diverse set of representative graphs consisting of up to 80,000 nodes, including Inception-v3, Transformer-XL, and WaveNet, GO achieves an average 21% improvement over expert optimization and an 18% improvement over the prior state of the art with 15x faster convergence.

Graph Optimization Problems in ML Compilers
There are three coupled optimization tasks that frequently arise in ML compilers, which we formulate as decision problems that can be solved using a learned policy. The decision problems for each of the tasks can be reframed as making a decision for each node in the computational graph.

The first optimization task is device placement, where the goal is to determine how best to assign the nodes of the graph to the physical devices on which it runs such that the end-to-end run time is minimized.

The second optimization task is operation scheduling. An operation in a computational graph is ready to run when its incoming tensors are present in the device memory. A frequently used scheduling strategy is to maintain a ready queue of operations for each device and schedule operations in first-in-first-out order. However, this scheduling strategy does not take into account the downstream operations placed on other devices that might be blocked by an operation, and often leads to schedules with underutilized devices. To find schedules that can keep track of such cross-device dependencies, our approach uses a priority-based scheduling algorithm that schedules operations in the ready queue based on the priority of each. Similar to device placement, operation scheduling can then be formulated as the problem of learning a policy that assigns a priority for each node in the graph to maximize a reward based on run time.

The third optimization task is operation fusion. For brevity we omit a detailed discussion of this problem here, and instead just note that similar to priority-based scheduling, operation fusion can also use a priority-based algorithm to decide which nodes to fuse. The goal of the policy network in this case is again to assign a priority for each node in the graph.

Finally, it is important to recognize that the decisions taken in each of the three optimization problems can affect the optimal decision for the other problems. For example, placing two nodes on two different devices effectively disables fusion and introduces a communication delay that can influence scheduling.

RL Policy Network Architecture
Our research presents GO, a deep RL framework that can be adapted to solve each of the aforementioned optimization problems — both individually as well as jointly. There are three key aspects of the proposed architecture:

First, we use graph neural networks (specifically GraphSAGE) to capture the topological information encoded in the computational graph. The inductive network of GraphSAGE leverages node attribute information to generalize to previously unseen graphs, which enables decision making for unseen data without incurring significant cost on training.

Second, computational graphs for many models often contain more than 10k nodes. Solving the optimization problems effectively over such large scales requires that the network is able to capture long-range dependencies between nodes. GO’s architecture includes a scalable attention network that uses segment-level recurrence to capture such long-range node dependencies.

Third, ML compilers need to solve optimization problems over a wide variety of graphs from different application domains. A naive strategy of training a shared policy network with heterogeneous graphs is unlikely to capture the idiosyncrasies of a particular class of graphs. To overcome this, GO uses a feature modulation mechanism that allows the network to specialize for specific graph types without increasing the number of parameters.

Overview of GO: An end-to-end graph policy network that combines graph embedding and sequential attention.

To jointly solve multiple dependent optimization tasks, GO has the ability to add additional recurrent attention layers for each task with parameters shared across different tasks. The recurrent attention layers with residual connections of actions enables tracking inter-task dependencies.

Multi-task policy network that extends GO’s policy network with additional recurrent attention layers for each task and residual connections. GE: Graph Embedding, FC: Fully-Connected Layer, Nxf: fusion action dimension, Fxd: placement action dimension, Nxs: scheduling action dimension.

Results
Next, we present evaluation results on a single-task speedup on a device placement task based on real-hardware measurements, generalization to unseen graphs with different GO variants, and multi-task performance jointly optimizing operations fusion, device placement, and scheduling.

Speedup:
To evaluate the performance of this architecture, we apply GO to a device placement problem based on real-hardware evaluation, where we first train the model separately on each of our workloads. This approach, called GO-one, consistently outperforms expert manual placement (HP), TensorFlow METIS placement, and Hierarchical Device Placement (HDP) — the current state-of-the-art reinforcement learning-based device placement. Importantly, with the efficient end-to-end single-shot placement, GO-one has a 15x speedup in convergence time of the placement network over HDP.

Our empirical results show that GO-one consistently outperforms expert placement, TensorFlow METIS placement, and hierarchical device placement (HDP). Because GO is designed in a way to scale up to extremely large graphs consisting of over 80,000 nodes like an 8-layer Google Neural Machine Translation (GNMT) model, it outperforms previous approaches, including HDP, REGAL, and Placeto. GO achieves optimized graph runtimes for large graphs like GNMT that are 21.7% and 36.5% faster than HP and HDP, respectively. Overall, GO-one achieves on average 20.5% and 18.2% run time reduction across a diverse set of 14 graphs, compared to HP and HDP respectively. Importantly, with the efficient end-to-end single-shot placement, GO-one has a 15x speedup in convergence time of the placement network over HDP.

Generalization:
GO generalizes to unseen graphs using offline pre-training followed by fine-tuning on the unseen graphs. During pre-training, we train GO on heterogeneous subsets of graphs from the training set. We train GO for 1000 steps on each such batch of graphs before switching to the next. This pretrained model is then fine-tuned (GO-generalization+finetune) on hold-out graphs for fewer than 50 steps, which typically takes less than one minute. GO-generalization+finetune for hold-out graphs outperforms both expert placement and HDP consistently on all datasets, and on average matches GO-one.

We also run inference directly on just the pre-trained model without any fine-tuning for the target hold-out graphs, and name this GO-generalization-zeroshot. The performance of this untuned model is only marginally worse than GO-generalization+finetune, while being slightly better than expert placement and HDP. This indicates that both graph embedding and the learned policies transfer efficiently, allowing the model to generalize to the unseen data.

Generalization across heterogeneous workload graphs. The figure shows a comparison of two different generalization strategies for GO when trained with graphs from 5 (except the held-out one) of the 6 workloads (Inception-v3, AmoebaNet, recurrent neural network language model (RNNLM), Google Neural Machine Translation (GNMT), Transformer-XL (TRFXL), WaveNet), and evaluated on the held-out workload (x-axis).

Co-optimizing placement, scheduling, and fusion (pl+sch+fu):
Optimizing simultaneously for placement, scheduling and fusion provides 30%-73% speedup compared to the single-gpu unoptimized case and 33%-60% speedup compared to TensorFlow default placement, scheduling, and fusion. Comparing to optimizing each tasks individually, multi-task GO (pl+sch+fu) outperforms single-task GO (p | sch | fu) — optimizing all tasks, one at a time — by an average of 7.8%. Furthermore, for all workloads, co-optimizing all three tasks offers faster run time than optimizing any two of them and using the default policy for the third.

Run time for various workloads on multi-task optimizations. TF-default: TF GPU default placement, fusion, and scheduling. hp-only: human placement only with default scheduling and fusion. pl-only: GO placement only with default scheduling and fusion. pl | sch: GO optimizes placement and scheduling individually with default fusion. pl+sch: multi-task GO co-optimizes placement and scheduling with default fusion. sch+fu: multi-task GO co-optimizes scheduling and fusion with human placement. pl | sch | fu: GO optimizes placement, scheduling, and fusion separately. pl+sch+fu: multi-task GO co-optimizes placement, scheduling, and fusion.

Conclusion
The increasing complexity and diversity of hardware accelerators has made the development of robust and adaptable ML frameworks onerous and time-consuming, often requiring multiple years of effort from hundreds of engineers. In this article, we demonstrated that many of the optimization problems in such frameworks can be solved efficiently and optimally using a carefully designed learned approach.

Acknowledgements
This is joint work with Daniel Wong, Amirali Abdolrashidi, Peter Ma, Qiumin Xu, Hanxiao Liu, Mangpo Phitchaya Phothilimthana, Shen Wang, Anna Goldie, Azalia Mirhoseini, and James Laudon.

Read More

Privacy Considerations in Large Language Models

Privacy Considerations in Large Language Models

Posted by Nicholas Carlini, Research Scientist, Google Research

Machine learning-based language models trained to predict the next word in a sentence have become increasingly capable, common, and useful, leading to groundbreaking improvements in applications like question-answering, translation, and more. But as language models continue to advance, new and unexpected risks can be exposed, requiring the research community to proactively work to develop new ways to mitigate potential problems.

One such risk is the potential for models to leak details from the data on which they’re trained. While this may be a concern for all large language models, additional issues may arise if a model trained on private data were to be made publicly available. Because these datasets can be large (hundreds of gigabytes) and pull from a range of sources, they can sometimes contain sensitive data, including personally identifiable information (PII) — names, phone numbers, addresses, etc., even if trained on public data. This raises the possibility that a model trained using such data could reflect some of these private details in its output. It is therefore important to identify and minimize the risks of such leaks, and to develop strategies to address the issue for future models.

If one prompts the GPT-2 language model with the prefix “East Stroudsburg Stroudsburg…”, it will autocomplete a long block of text that contains the full name, phone number, email address, and physical address of a particular person whose information was included in GPT-2’s training data.

In “Extracting Training Data from Large Language Models”, a collaboration with OpenAI, Apple, Stanford, Berkeley, and Northeastern University, we demonstrate that, given only the ability to query a pre-trained language model, it is possible to extract specific pieces of training data that the model has memorized. As such, training data extraction attacks are realistic threats on state-of-the-art large language models. This research represents an early, critical step intended to inform researchers about this class of vulnerabilities, so that they may take steps to mitigate these weaknesses.

Ethics of Language Model Attacks
A training data extraction attack has the greatest potential for harm when applied to a model that is available to the public, but for which the dataset used in training is not. However, since conducting this research on such a dataset could have harmful consequences, we instead mount a proof of concept training data extraction attack on GPT-2, a large, publicly available language model developed by OpenAI, that was trained using only public data. While this work focuses on GPT-2 specifically, the results apply to understanding what privacy threats are possible on large language models generally.

As with other privacy- and security-related research, it is important to consider the ethics of such attacks before actually performing them. To minimize the potential risk of this work, the training data extraction attack in this work was developed using publicly available data. Furthermore, the GPT-2 model itself was made public by OpenAI in 2019, and the training data used to train GPT-2 was collected from the public internet, and is available for download by anyone who follows the data collection process documented in the GPT-2 paper.

Additionally, in accordance with responsible computer security disclosure norms, we followed up with individuals whose PII was extracted, and secured their permission before including references to this data in publication. Further, in all publications of this work, we have redacted any personally identifying information that may identify individuals. We have also worked closely with OpenAI in the analysis of GPT-2.

The Training Data Extraction Attack
By design, language models make it very easy to generate a large amount of output data. By seeding the model with random short phrases, the model can generate millions of continuations, i.e., probable phrases that complete the sentence. Most of the time, these continuations will be benign strings of sensible text. For example, when asked to predict the continuation of the string “Mary had a little…”, a language model will have high confidence that the next token is the word “lamb”. However, if one particular training document happened to repeat the string “Mary had a little wombat” many times, the model might predict that phrase instead.

The goal of a training data extraction attack is then to sift through the millions of output sequences from the language model and predict which text is memorized. To accomplish this, our approach leverages the fact that models tend to be more confident on results captured directly from their training data. These membership inference attacks enable us to predict if a result was used in the training data by checking the confidence of the model on a particular sequence.

The main technical contribution of this work is the development of a method for inferring membership with high accuracy along with techniques for sampling from models in a way that encourages the output of memorized content. We tested a number of different sampling strategies, the most successful of which generates text conditioned on a wide variety of input phrases. We then compare the output of two different language models. When one model has high confidence in a sequence, but the other (equally accurate) model has low confidence in a sequence, it’s likely that the first model has memorized the data.

Results
Out of 1800 candidate sequences from the GPT-2 language model, we extracted over 600 that were memorized from the public training data, with the total number limited by the need for manual verification. The memorized examples cover a wide range of content, including news headlines, log messages, JavaScript code, PII, and more. Many of these examples are memorized even though they appear infrequently in the training dataset. For example, for many samples of PII we extract are found in only a single document in the dataset. However, in most of these cases, the originating document contains multiple instances of the PII, and as a result, the model still learns it as high likelihood text.

Finally, we also find that the larger the language model, the more easily it memorizes training data. For example, in one experiment we find that the 1.5 billion parameter GPT-2 XL model memorizes 10 times more information than the 124 million parameter GPT-2 Small model. Given that the research community has already trained models 10 to 100 times larger, this means that as time goes by, more work will be required to monitor and mitigate this problem in increasingly large language models.

Lessons
While we demonstrate these attacks on GPT-2 specifically, they show potential flaws in all large generative language models. The fact that these attacks are possible has important consequences for the future of machine learning research using these types of models.

Fortunately, there are several ways to mitigate this issue. The most straightforward solution is to ensure that models do not train on any potentially problematic data. But this can be difficult to do in practice.

The use of differential privacy, which allows training on a dataset without revealing any details of individual training examples, is one of the most principled techniques to train machine learning models with privacy. In TensorFlow, this can be achieved with the use of the tensorflow/privacy module (or similar for PyTorch or JAX) that is a drop-in replacement for existing optimizers. Even this can have limitations and won’t prevent memorization of content that is repeated often enough. If this is not possible, we recommend at least measuring how much memorization occurs so appropriate action can be taken.

Language models continue to demonstrate great utility and flexibility—yet, like all innovations, they can also pose risks. Developing them responsibly means proactively identifying those risks and developing ways to mitigate them. We hope that this effort to highlight current weaknesses in large language modeling will raise awareness of this challenge in the broader machine learning community and motivate researchers to continue to develop effective techniques to train models with reduced memorization.

Acknowledgements
This work was performed jointly with Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel.

Read More

AI helps protect Australian wildlife in fire-affected areas

AI helps protect Australian wildlife in fire-affected areas

Over the next six months, more than 600 sensor cameras will be deployed in bushfire-affected areas across Australia, monitoring and evaluating the surviving wildlife populations. This nationwide effort is part of An Eye on Recovery, a large-scale collaborative camera sensor project, run by the World Wide Fund for Nature (WWF) and Conservation International, with the support of a $1 million grant from Google.org. Using Wildlife Insights, a platform powered by Google’s Artificial Intelligence technology, researchers across the country will upload and share sensor camera photos to give a clearer picture of how Australian wildlife is coping after the devastating bushfires in the past year.  

Why is this important? 

For many Aussies, the horror of last summer’s fires is still very raw and real. Up to 19 million hectares were burned (more than 73,000 square miles), with 12.6 million hectares primarily forest and bushland. Thirty-three lives were lost and 3,094 homes destroyed. And the wildlife toll? A staggering three billion animals were estimated to have been impacted by the flames. 

Australian bushfire devastation

The scale of the damage is so severe that one year on—as we prepare for the next bushfire season—WWF and scientists are still in the field conducting ecological assessments. Our findings have been sobering. Nearly 61,000 koalas, Australia’s most beloved marsupial, are estimated to have been killed or impacted. Over 300 threatened species were affected, pushing more of our precious wildlife on the fast-track towards extinction.

Hope will prevail

In November, I travelled to Kangaroo Island in South Australia to place the first 100 of the sensor cameras in bushfire-ravaged areas. Though much of the native cover has been decimated by the flames, the island’s wildlife has shown signs of recovery. 

One animal at risk from the flames is the Kangaroo Island dunnart, an adorable, grey-coloured, nocturnal marsupial so elusive that a researcher told WWF that she’d never seen one in the field. We were fortunate to capture this creature of the night on one of our cameras.

Kangaroo Island dunnart

Thou art a dunnart.

But if I hadn’t told you that was a dunnart, you might have thought it was a mouse. And as anyone with thousands of holiday photos will tell you, sorting and organizing heaps of camera pictures and footage can be labor-intensive and time-consuming. Analyzing camera sensor pictures traditionally requires expertise to determine the best pictures (and which ones you can just delete), and you can get hundreds of empty images before you strike gold.


How AI can help 

With the Wildlife Insights platform, we can now identify over 700 species of wildlife in seconds and quickly discard empty images, taking the tedium out of the process and helping scientists and ecologists make better and more informed data assessments.  

The platform will help us identify wildlife in landscapes impacted by last summer’s bushfires, including the Blue Mountains, East Gippsland, South East Queensland, and of course Kangaroo Island. We’re particularly keen to see species like the Hastings River Mouse, a native rodent that was already endangered before fire tore through its habitat in northern New South Wales, and the brush-tailed rock-wallaby, which lost vital habitat and food to blazes in the Blue Mountains.

These images will help us to understand what species have survived in bushfire zones and determine where recovery actions are needed most.

Checking camera traps

WWF-Australia / Slavica Miskovich

Join us to safeguard species

The platform is still growing, and the more images we feed it, the better it will get at recognizing different types of animals. While we’re already rolling out hundreds of sensor cameras across the country, we are calling for more images—and asking Australians to help. If you have any sensor camera footage, please get in touch with us. We’re looking for images specifically from sensor cameras  placed in animal’s habitats, rather than wildlife photography (as beautiful as these pictures may be). 

With your help, we can help safeguard species such as the Kangaroo Island dunnart, marvel at their bright beaming eyes on film, and protect their environment on the ground–so future generations can continue to enjoy the richness of Australia’s wildlife.

Read More

Portrait Light: Enhancing Portrait Lighting with Machine Learning

Portrait Light: Enhancing Portrait Lighting with Machine Learning

Posted by Yun-Ta Tsai1 and Rohit Pandey, Software Engineers, Google Research

Professional portrait photographers are able to create compelling photographs by using specialized equipment, such as off-camera flashes and reflectors, and expert knowledge to capture just the right illumination of their subjects. In order to allow users to better emulate professional-looking portraits, we recently released Portrait Light, a new post-capture feature for the Pixel Camera and Google Photos apps that adds a simulated directional light source to portraits, with the directionality and intensity set to complement the lighting from the original photograph.

Example image with and without Portrait Light applied. Note how Portrait Light contours the face, adding dimensionality, volume, and visual interest.

In the Pixel Camera on Pixel 4, Pixel 4a, Pixel 4a (5G), and Pixel 5, Portrait Light is automatically applied post-capture to images in the default mode and to Night Sight photos that include people — just one person or even a small group. In Portrait Mode photographs, Portrait Light provides more dramatic lighting to accompany the shallow depth-of-field effect already applied, resulting in a studio-quality look. But because lighting can be a personal choice, Pixel users who shoot in Portrait Mode can manually re-position and adjust the brightness of the applied lighting within Google Photos to match their preference. For those running Google Photos on Pixel 2 or newer, this relighting capability is also available for many pre-existing portrait photographs.

Pixel users can adjust a portrait’s lighting as they like in Google Photos, after capture.

Today we present the technology behind Portrait Light. Inspired by the off-camera lights used by portrait photographers, Portrait Light models a repositionable light source that can be added into the scene, with the initial lighting direction and intensity automatically selected to complement the existing lighting in the photo. We accomplish this by leveraging novel machine learning models, each trained using a diverse dataset of photographs captured in the Light Stage computational illumination system. These models enabled two new algorithmic capabilities:

  1. Automatic directional light placement: For a given portrait, the algorithm places a synthetic directional light in the scene consistent with how a photographer would have placed an off-camera light source in the real world.
  2. Synthetic post-capture relighting: For a given lighting direction and portrait, synthetic light is added in a way that looks realistic and natural.

These innovations enable Portrait Light to help create attractive lighting at any moment for every portrait — all on your mobile device.

Automatic Light Placement
Photographers usually rely on perceptual cues when deciding how to augment environmental illumination with off-camera light sources. They assess the intensity and directionality of the light falling on the face, and also adjust their subject’s head pose to complement it. To inform Portrait Light’s automatic light placement, we developed computational equivalents to these two perceptual signals.

First, we trained a novel machine learning model to estimate a high dynamic range, omnidirectional illumination profile for a scene based on an input portrait. This new lighting estimation model infers the direction, relative intensity, and color of all light sources in the scene coming from all directions, considering the face as a light probe. We also estimate the head pose of the portrait’s subject using MediaPipe Face Mesh.

Estimating the high dynamic range, omnidirectional illumination profile from an input portrait. The three spheres at the right of each image, diffuse (top), matte silver (middle), and mirror (bottom), are rendered using the estimated illumination, each reflecting the color, intensity, and directionality of the environmental lighting.

Using these clues, we determine the direction from which the synthetic lighting should originate. In studio portrait photography, the main off-camera light source, or key light, is placed about 30° above the eyeline and between 30° and 60° off the camera axis, when looking overhead at the scene. We follow this guideline for a classic portrait look, enhancing any pre-existing lighting directionality in the scene while targeting a balanced, subtle key-to-fill lighting ratio of about 2:1.

Data-Driven Portrait Relighting
Given a desired lighting direction and portrait, we next trained a new machine learning model to add the illumination from a directional light source to the original photograph. Training the model required millions of pairs of portraits both with and without extra light. Photographing such a dataset in normal settings would have been impossible because it requires near-perfect registration of portraits captured across different lighting conditions.

Instead, we generated training data by photographing seventy different people using the Light Stage computational illumination system. This spherical lighting rig includes 64 cameras with different viewpoints and 331 individually-programmable LED light sources. We photographed each individual illuminated one-light-at-a-time (OLAT) by each light, which generates their reflectance field — or their appearance as illuminated by the discrete sections of the spherical environment. The reflectance field encodes the unique color and light-reflecting properties of the subject’s skin, hair, and clothing — how shiny or dull each material appears. Due to the superposition principle for light, these OLAT images can then be linearly added together to render realistic images of the subject as they would appear in any image-based lighting environment, with complex light transport phenomena like subsurface scattering correctly represented.

Using the Light Stage, we photographed many individuals with different face shapes, genders, skin tones, hairstyles, and clothing/accessories. For each person, we generated synthetic portraits in many different lighting environments, both with and without the added directional light, rendering millions of pairs of images. This dataset encouraged model performance across diverse lighting environments and individuals.

Photographing an individual as illuminated one-light-at-a-time in the Google Light Stage, a 360° computational illumination rig.
Left: Example images from an individual’s photographed reflectance field, their appearance in the Light Stage as illuminated one-light-at-a-time. Right: The images can be added together to form the appearance of the subject in any novel lighting environment.

Learning Detail-Preserving Relighting Using the Quotient Image
Rather than trying to directly predict the output relit image, we trained the relighting model to output a low-resolution quotient image, i.e., a per-pixel multiplier that when upsampled can be applied to the original input image to produce the desired output image with the contribution of the extra light source added. This technique is computationally efficient and encourages only low-frequency lighting changes, without impacting high-frequency image details, which are directly transferred from the input to maintain image quality.

Supervising Relighting with Geometry Estimation
When photographers add an extra light source into a scene, its orientation relative to the subject’s facial geometry determines how much brighter each part of the face appears. To model the optical behavior of light sources reflecting off relatively matte surfaces, we first trained a machine learning model to estimate surface normals given the input photograph, and then applied Lambert’s law to compute a “light visibility map” for the desired lighting direction. We provided this light visibility map as input to the quotient image predictor, ensuring that the model is trained using physics-based insights.

The pipeline of our relighting network. Given an input portrait, we estimate per-pixel surface normals, which we then use to compute a light visibility map. The model is trained to produce a low-resolution quotient image that, when upsampled and applied as a multiplier to the original image, produces the original portrait with an extra light source added synthetically into the scene.

We optimized the full pipeline to run at interactive frame-rates on mobile devices, with total model size under 10 MB. Here are a few examples of Portrait Light in action.

Portrait Light in action.

Getting the Most Out of Portrait Light
You can try Portrait Light in the Pixel Camera and change the light position and brightness to your liking in Google Photos. For those who use Dual Exposure Controls, Portrait Light can be applied post-capture for additional creative flexibility to find just the right balance between light and shadow. On existing images from your Google Photos library, try it where faces are slightly underexposed, where Portrait Light can illuminate and highlight your subject. It will especially benefit images with a single individual posed directly at the camera.

We see Portrait Light as the first step on the journey towards creative post-capture lighting controls for mobile cameras, powered by machine learning.

Acknowledgements
Portrait Light is the result of a collaboration between Google Research, Google Daydream, Pixel, and Google Photos teams. Key contributors include: Yun-Ta Tsai, Rohit Pandey, Sean Fanello, Chloe LeGendre, Michael Milne, Ryan Geiss, Sam Hasinoff, Dillon Sharlet, Christoph Rhemann, Peter Denny, Kaiwen Guo, Philip Davidson, Jonathan Taylor, Mingsong Dou, Pavel Pidlypenskyi, Peter Lincoln, Jay Busch, Matt Whalen, Jason Dourgarian, Geoff Harvey, Cynthia Herrera, Sergio Orts Escolano, Paul Debevec, Jonathan Barron, Sofien Bouaziz, Clement Ng, Rachit Gupta, Jesse Evans, Ryan Campbell, Sonya Mollinger, Emily To, Yichang Shih, Jana Ehmann, Wan-Chun Alex Ma, Christina Tong, Tim Smith, Tim Ruddick, Bill Strathearn, Jose Lima, Chia-Kai Liang, David Salesin, Shahram Izadi, Navin Sarma, Nisha Masharani, Zachary Senzer.


1  Work conducted while at Google. 

Read More

MediaPipe Holistic — Simultaneous Face, Hand and Pose Prediction, on Device

MediaPipe Holistic — Simultaneous Face, Hand and Pose Prediction, on Device

Posted by Ivan Grishchenko and Valentin Bazarevsky, Research Engineers, Google Research

Real-time, simultaneous perception of human pose, face landmarks and hand tracking on mobile devices can enable a variety of impactful applications, such as fitness and sport analysis, gesture control and sign language recognition, augmented reality effects and more. MediaPipe, an open-source framework designed specifically for complex perception pipelines leveraging accelerated inference (e.g., GPU or CPU), already offers fast and accurate, yet separate, solutions for these tasks. Combining them all in real-time into a semantically consistent end-to-end solution is a uniquely difficult problem requiring simultaneous inference of multiple, dependent neural networks.

Today, we are excited to announce MediaPipe Holistic, a solution to this challenge that provides a novel state-of-the-art human pose topology that unlocks novel use cases. MediaPipe Holistic consists of a new pipeline with optimized pose, face and hand components that each run in real-time, with minimum memory transfer between their inference backends, and added support for interchangeability of the three components, depending on the quality/speed tradeoffs. When including all three components, MediaPipe Holistic provides a unified topology for a groundbreaking 540+ keypoints (33 pose, 21 per-hand and 468 facial landmarks) and achieves near real-time performance on mobile devices. MediaPipe Holistic is being released as part of MediaPipe and is available on-device for mobile (Android, iOS) and desktop. We are also introducing MediaPipe’s new ready-to-use APIs for research (Python) and web (JavaScript) to ease access to the technology.

Top: MediaPipe Holistic results on sport and dance use-cases. Bottom: “Silence” and “Hello” gestures. Note, that our solution consistently identifies a hand as either right (blue color) or left (orange color).

Pipeline and Quality
The MediaPipe Holistic pipeline integrates separate models for pose, face and hand components, each of which are optimized for their particular domain. However, because of their different specializations, the input to one component is not well-suited for the others. The pose estimation model, for example, takes a lower, fixed resolution video frame (256×256) as input. But if one were to crop the hand and face regions from that image to pass to their respective models, the image resolution would be too low for accurate articulation. Therefore, we designed MediaPipe Holistic as a multi-stage pipeline, which treats the different regions using a region appropriate image resolution.

First, MediaPipe Holistic estimates the human pose with BlazePose’s pose detector and subsequent keypoint model. Then, using the inferred pose key points, it derives three regions of interest (ROI) crops for each hand (2x) and the face, and employs a re-crop model to improve the ROI (details below). The pipeline then crops the full-resolution input frame to these ROIs and applies task-specific face and hand models to estimate their corresponding keypoints. Finally, all key points are merged with those of the pose model to yield the full 540+ keypoints.

MediaPipe Holistic pipeline overview.

To streamline the identification of ROIs, a tracking approach similar to the one used for the standalone face and hand pipelines is utilized. This approach assumes that the object doesn’t move significantly between frames, using an estimation from the previous frame as a guide to the object region in the current one. However, during fast movements, the tracker can lose the target, which requires the detector to re-localize it in the image. MediaPipe Holistic uses pose prediction (on every frame) as an additional ROI prior to reduce the response time of the pipeline when reacting to fast movements. This also enables the model to retain semantic consistency across the body and its parts by preventing a mixup between left and right hands or body parts of one person in the frame with another.

In addition, the resolution of the input frame to the pose model is low enough that the resulting ROIs for face and hands are still too inaccurate to guide the re-cropping of those regions, which require a precise input crop to remain lightweight. To close this accuracy gap we use lightweight face and hand re-crop models that play the role of spatial transformers and cost only ~10% of the corresponding model’s inference time.

 MEH   FLE 
 Tracking pipeline (baseline)   9.8%   3.1% 
 Pipeline without re-crops   11.8%   3.5% 
 Pipeline with re-crops   9.7%   3.1% 
Hand prediction quality.The mean error per hand (MEH) is normalized by the hand size. The face landmarks error (FLE) is normalized by the inter-pupillary distance.

Performance
MediaPipe Holistic requires coordination between up to 8 models per frame — 1 pose detector, 1 pose landmark model, 3 re-crop models and 3 keypoint models for hands and face. While building this solution, we optimized not only machine learning models, but also pre- and post-processing algorithms (e.g., affine transformations), which take significant time on most devices due to pipeline complexity. In this case, moving all the pre-processing computations to GPU resulted in ~1.5 times overall pipeline speedup depending on the device. As a result, MediaPipe Holistic runs in near real-time performance even on mid-tier devices and in the browser.

 Phone   FPS 
 Google Pixel 2 XL   18 
 Samsung S9+   20 
 15-inch MacBook Pro 2017   15 
Performance on various mid-tier devices, measured in frames per second (FPS) using TFLite GPU.

The multi-stage nature of the pipeline provides two more performance benefits. As models are mostly independent, they can be replaced with lighter or heavier versions (or turned off completely) depending on the performance and accuracy requirements. Also, once pose is inferred, one knows precisely whether hands and face are within the frame bounds, allowing the pipeline to skip inference on those body parts.

Applications
MediaPipe Holistic, with its 540+ key points, aims to enable a holistic, simultaneous perception of body language, gesture and facial expressions. Its blended approach enables remote gesture interfaces, as well as full-body AR, sports analytics, and sign language recognition. To demonstrate the quality and performance of the MediaPipe Holistic, we built a simple remote control interface that runs locally in the browser and enables a compelling user interaction, no mouse or keyboard required. The user can manipulate objects on the screen, type on a virtual keyboard while sitting on the sofa, and point to or touch specific face regions (e.g., mute or turn off the camera). Underneath it relies on accurate hand detection with subsequent gesture recognition mapped to a “trackpad” space anchored to the user’s shoulder, enabling remote control from up to 4 meters.

This technique for gesture control can unlock various novel use-cases when other human-computer interaction modalities are not convenient. Try it out in our web demo and prototype your own ideas with it.

In-browser touchless control demos. Left: Palm picker, touch interface, keyboard. Right: Distant touchless keyboard. Try it out!

MediaPipe for Research and Web
To accelerate ML research as well as its adoption in the web developer community, MediaPipe now offers ready-to-use, yet customizable ML solutions in Python and in JavaScript. We are starting with those in our previous publications: Face Mesh, Hands and Pose, including MediaPipe Holistic, with many more to come. Try them directly in the web browser: for Python using the notebooks in MediaPipe on Google Colab, and for JavaScript with your own webcam input in MediaPipe on CodePen!

Conclusion
We hope the release of MediaPipe Holistic will inspire the research and development community members to build new unique applications. We anticipate that these pipelines will open up avenues for future research into challenging domains, such as sign-language recognition, touchless control interfaces, or other complex use cases. We are looking forward to seeing what you can build with it!

Complex and dynamic hand gestures. Videos by Dr. Bill Vicars, used with permission.

Acknowledgments
Special thanks to all our team members who worked on the tech with us: Fan Zhang, Gregory Karpiak, Kanstantsin Sokal, Juhyun Lee, Hadon Nash, Chuo-Ling Chang, Jiuqiang Tang, Nikolay Chirkov, Camillo Lugaresi, George Sung, Michael Hays, Tyler Mullen, Chris McClanahan, Ekaterina Ignasheva, Marat Dukhan, Artsiom Ablavatski, Yury Kartynnik, Karthik Raveendran, Andrei Vakunov, Andrei Tkachenka, Suril Shah, Buck Bourdon, Ming Guang Yong, Esha Uboweja, Siarhei Kazakou, Andrei Kulik, Matsvei Zhdanovich, and Matthias Grundmann.

Read More

Google at NeurIPS 2020

Google at NeurIPS 2020

Posted by Jaqui Herman and Cat Armato, Program Managers

This week marks the beginning of the 34th annual Conference on Neural Information Processing Systems (NeurIPS 2020), the biggest machine learning conference of the year. Held virtually for the first time, this conference includes invited talks, demonstrations and presentations of some of the latest in machine learning research. As a Platinum Sponsor of NeurIPS 2020, Google will have a strong presence with more than 180 accepted papers, additionally contributing to and learning from the broader academic research community via talks, posters, workshops and tutorials.

If you are registered for NeurIPS 2020, we hope you’ll visit our virtual booth and chat with our researchers about the projects and opportunities at Google that go into solving the world’s most challenging research problems, and to see demonstrations of some of the exciting research we pursue, such as Transformers for image recognition, Tone Transfer, large-scale distributed RL, recreating historical streetscapes and much more. You can also learn more about our work being presented in the list below (Google affiliations highlighted in blue).


Organizing Committees

General Chair: Hugo Larochelle

Workshop Co-Chair: Sanmi Koyejo

Diversity and Inclusion Chairs include: Katherine Heller

Expo Chair: Pablo Samuel Castro

Senior Area Chairs include: Corinna Cortes, Fei Sha, Mohammad Ghavamzadeh, Sanjiv Kumar, Charles Sutton, Dale Schuurmans, David Duvenaud, Elad Hazan, Marco Cuturi, Peter Bartlett, Samy Bengio, Tong Zhang, Claudio Gentile, Kevin Murphy, Cordelia Schmid, Amir Globerson

Area Chairs include: Boqing Gong, Afshin Rostamizadeh, Alex Kulesza, Branislav Kveton, Craig Boutilier, Heinrich Jiang, Manzil Zaheer, Silvio Lattanzi, Slav Petrov, Srinadh Bhojanapalli, Rodolphe Jenatton, Mathieu Blondel, Aleksandra Faust, Alexey Dosovitskiy, Ashish Vaswani, Augustus Odena, Balaji Lakshminarayanan, Ben Poole, Colin Raffel, Danny Tarlow, David Ha, Denny Zhou, Dumitru Erhan, Dustin Tran, George Tucker, Honglak Lee, Ilya Tolstikhin, Jasper Snoek, Jean-Philippe Vert, Jeffrey Pennington, Kevin Swersky, Matthew Johnson, Minmin Chen, Mohammad Norouzi, Moustapha Cisse, Naman Agarwal, Nicholas Carlini, Olivier Bachem, Tim Salimans, Vincent Dumoulin, Yann Dauphin, Andrew Dai, Izhak Shafran, Karthik Sridharan, Abhinav Gupta, Abhishek Kumar, Adam White, Aditya Menon, Kun Zhang, Ce Liu, Cristian Sminchisescu, Hossein Mobahi, Phillip IsolaTomer Koren, Chelsea Finn, Amin Karbasi

NeurIPS 2020 Foundation Board includes: Michael Mozer, Samy Bengio, Corinna Cortes, Hugo Larochelle, John C. Platt, Fernando Pereira


Accepted Papers

Rankmax: An Adaptive Projection Alternative to the Softmax Function
Weiwei Kong*, Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Li Zhang

Unsupervised Sound Separation Using Mixture Invariant Training
Scott Wisdom, Efthymios Tzinis*, Hakan Erdogan, Ron Weiss, Kevin Wilson, John Hershey

Learning to Select Best Forecast Tasks for Clinical Outcome Prediction
Yuan Xue, Nan Du, Anne Mottram, Martin Seneviratne, Andrew M. Dai

Interpretable Sequence Learning for Covid-19 Forecasting
Sercan O. Arık, Chun-Liang Li, Jinsung Yoon, Rajarishi Sinha, Arkady Epshteyn, Long T. Le, Vikas Menon, Shashank Singh, Leyou Zhang, Nate Yoder, Martin Nikoltchev, Yash Sonthalia, Hootan Nakhost, Elli Kanal, Tomas Pfister

Towards Learning Convolutions from Scratch
Behnam Neyshabur

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine

Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics
Minhae Kwon, Saurabh Daptardar, Paul Schrater, Xaq Pitkow

Off-Policy Evaluation via the Regularized Lagrangian
Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans

CoinDICE: Off-Policy Confidence Interval Estimation
Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans

Unsupervised Data Augmentation for Consistency Training
Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le

VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain
Jinsung Yoon, Yao Zhang, James Jordon, Mihaela van der Schaar

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Zihang Dai, Guokun Lai, Yiming Yang, Quoc Le

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed

Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach
Luofeng Liao, You-Lin Chen, Zhuoran Yang, Bo Dai, Zhaoran Wang, Mladen Kolar

Conservative Q-Learning for Offline Reinforcement Learning
Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine

MOReL: Model-Based Offline Reinforcement Learning
Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims

Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness
Long Zhao, Ting Liu, Xi Peng, Dimitris Metaxas

Generative View Synthesis: From Single-view Semantics to Novel-view Images
Tewodros Habtegebrial, Varun Jampani, Orazio Gallo, Didier Stricker

PIE-NET: Parametric Inference of Point Cloud Edges
Xiaogang Wang, Yuelang Xu, Kai Xu, Andrea Tagliasacchi, Bin Zhou, Ali Mahdavi-Amiri, Hao Zhang

Enabling Certification of Verification-Agnostic Networks via Memory-Efficient Semidefinite Programming
Sumanth Dathathri, Krishnamurthy (Dj) Dvijotham, Alex Kurakin, Aditi Raghunathan, Jonathan Uesato, Rudy Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow*, Percy Liang, Pushmeet Kohli

An Analysis of SVD for Deep Rotation Estimation
Jake Levinson, Carlos Esteves, Kefan Chen, Noah Snavely, Angjoo Kanazawa, Afshin Rostamizadeh, Ameesh Makadia

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow

Faster Differentially Private Samplers via Rényi Divergence Analysis of Discretized Langevin MCMC
Arun Ganesh*, Kunal Talwar*

DISK: Learning Local Features with Policy Gradient
Michał J. Tyszkiewicz, Pascal Fua, Eduard Trulls

Robust Large-margin Learning in Hyperbolic Space
Melanie Weber*, Manzil Zaheer, Ankit Singh Rawat, Aditya Menon, Sanjiv Kumar

Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction
Michael Janner, Igor Mordatch, Sergey Levine

Adversarially Robust Streaming Algorithms via Differential Privacy
Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Yossi Matias, Uri Stemmer

Faster DBSCAN via Subsampled Similarity Queries
Heinrich Jiang, Jennifer Jang, Jakub Łacki

Exact Recovery of Mangled Clusters with Same-Cluster Queries
Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs
Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Görür, Chris Harris, Dale Schuurmans

Fairness in Streaming Submodular Maximization: Algorithms and Hardness
Marwa El Halabi, Slobodan Mitrović, Ashkan Norouzi-Fard, Jakab Tardos, Jakub Tarnawski

Efficient Active Learning of Sparse Halfspaces with Arbitrary Bounded Noise
Chicheng Zhang, Jie Shen, Pranjal Awasthi

Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample Complexity
Haim Kaplan, Yishay Mansour, Uri Stemmer, Eliad Tsfadia

Synthetic Data Generators — Sequential and Private
Olivier Bousquet, Roi Livni, Shay Moran

Learning Discrete Distributions: User vs Item-level Privacy
Yuhan Liu, Ananda Theertha Suresh, Felix Xinnan X. Yu, Sanjiv Kumar, Michael Riley

Learning Differential Equations that are Easy to Solve
Jacob Kelly, Jesse Bettencourt, Matthew J. Johnson, David K. Duvenaud

An Optimal Elimination Algorithm for Learning a Best Arm
Avinatan Hassidim, Ron Kupfer, Yaron Singer

The Convex Relaxation Barrier, Revisited: Tightened Single-Neuron Relaxations for Neural Network Verification
Christian Tjandraatmadja, Ross Anderson, Joey Huchette, Will Ma, Krunal Kishor Patel*, Juan Pablo Vielma

Escaping the Gravitational Pull of Softmax
Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li*, Csaba Szepesvari, Dale Schuurmans

The Complexity of Adversarially Robust Proper Learning of Halfspaces with Agnostic Noise
Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi

PAC-Bayes Learning Bounds for Sample-Dependent Priors
Pranjal Awasthi, Satyen Kale, Stefani Karp, Mehryar Mohri

Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications
Sarah Perrin, Julien Perolat, Mathieu Lauriere, Matthieu Geist, Romuald Elie, Olivier Pietquin

What Do Neural Networks Learn When Trained With Random Labels?
Hartmut Maennel, Ibrahim M. Alabdulmohsin, Ilya O. Tolstikhin, Robert Baldock*, Olivier Bousquet, Sylvain Gelly, Daniel Keysers

Online Planning with Lookahead Policies
Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor

Smoothly Bounding User Contributions in Differential Privacy
Alessandro Epasto, Mohammad Mahdian, Jieming Mao, Vahab Mirrokni, Lijie Ren

Differentially Private Clustering: Tight Approximation Ratios
Badih Ghazi, Ravi Kumar, Pasin Manurangsi

Hitting the High Notes: Subset Selection for Maximizing Expected Order Statistics
Aranyak Mehta, Uri Nadav, Alexandros Psomas*, Aviad Rubinstein

Myersonian Regression
Allen Liu, Renato Leme, Jon Schneider

Assisted Learning: A Framework for Multi-Organization Learning
Xun Xian, Xinran Wang, Jie Ding, Reza Ghanadan

Adversarial Robustness via Robust Low Rank Representations
Pranjal Awasthi, Himanshu Jain, Ankit Singh Rawat, Aravindan Vijayaraghavan

Multi-Plane Program Induction with 3D Box Priors
Yikai Li, Jiayuan Mao, Xiuming Zhang, Bill Freeman, Josh Tenenbaum, Noah Snavely, Jiajun Wu

Privacy Amplification via Random Check-Ins
Borja Balle, Peter Kairouz, Brendan McMahan, Om Dipakbhai Thakkar, Abhradeep Thakurta

Rethinking Pre-training and Self-training
Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin Dogus Cubuk, Quoc Le

Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing
Arthur Delarue, Ross Anderson, Christian Tjandraatmadja

Online Agnostic Boosting via Regret Minimization
Nataly Brukhim, Xinyi Chen, Elad Hazan, Shay Moran*

From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering
Ines Chami, Albert Gu, Vaggos Chatziafratis, Christopher Ré

Faithful Embeddings for Knowledge Base Queries
Haitian Sun, Andrew Arnold*, Tania Bedrax Weiss, Fernando Pereira, William W. Cohen

Contextual Reserve Price Optimization in Auctions via Mixed Integer Programming
Joey Huchette, Haihao Lu, Hossein Esfandiari, Vahab Mirrokni

An Operator View of Policy Gradient Methods
Dibya Ghosh, Marlos C. Machado, Nicolas Le Roux

Reinforcement Learning with Feedback Graphs
Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

On Completeness-aware Concept-Based Explanations in Deep Neural Networks
Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, Pradeep Ravikumar

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement
Benjamin Eysenbach, Xinyang Geng, Sergey Levine, Ruslan Salakhutdinov

The Flajolet-Martin Sketch Itself Preserves Differential Privacy: Private Counting with Minimal Space
Adam Smith, Shuang Song, Abhradeep Thakurta

What is Being Transferred in Transfer Learning?
Behnam Neyshabur, Hanie Sedghi, Chiyuan Zhang

Latent Bandits Revisited
Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier

MetaSDF: Meta-Learning Signed Distance Functions
Vincent Sitzmann, Eric Chan, Richard Tucker, Noah Snavely, Gordon Wetzstein

Measuring Robustness to Natural Distribution Shifts in Image Classification
Rohan Taori, Achal Dave, Vaishaal Shankar, Nicholas Carlini, Benjamin Recht, Ludwig Schmidt

Robust Optimization for Fairness with Noisy Protected Groups
Serena Wang, Wenshuo Guo, Harikrishna Narasimhan, Andrew Cotter, Maya Gupta, Michael I. Jordan

Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration
Hanjun Dai, Rishabh Singh, Bo Dai, Charles Sutton, Dale Schuurmans

Breaking the Communication-Privacy-Accuracy Trilemma
Wei-Ning Chen, Peter Kairouz, Ayfer Ozgur

Differentiable Meta-Learning of Bandit Policies
Craig Boutilier, Chih-wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

Multi-Stage Influence Function
Hongge Chen*, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh

Compositional Visual Generation with Energy Based Models
Yilun Du, Shuang Li, Igor Mordatch

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

Curriculum By Smoothing
Samarth Sinha, Animesh Garg, Hugo Larochelle

Online Linear Optimization with Many Hints
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Prediction with Corrupted Expert Advice
Idan Amir, Idan Attias, Tomer Koren, Roi Livni, Yishay Mansour

Agnostic Learning with Multiple Objectives
Corinna Cortes, Mehryar Mohri, Javier Gonzalvo, Dmitry Storcheus

CoSE: Compositional Stroke Embeddings
Emre Aksan, Thomas Deselaers*, Andrea Tagliasacchi, Otmar Hilliges

Reparameterizing Mirror Descent as Gradient Descent
Ehsan Amid, Manfred K. Warmuth

Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition
Ben Adlam, Jeffrey Pennington

DisARM: An Antithetic Gradient Estimator for Binary Latent Variables
Zhe Dong, Andriy Mnih, George Tucker

Big Self-Supervised Models are Strong Semi-Supervised Learners
Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton

JAX MD: A Framework for Differentiable Physics
Samuel S. Schoenholz, Ekin D. Cubuk

Gradient Surgery for Multi-Task Learning
Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn

LoopReg: Self-supervised Learning of Implicit Surface Correspondences, Pose and Shape for 3D Human Mesh Registration
Bharat Lal Bhatnagar, Cristian Sminchisescu, Christian Theobalt, Gerard Pons-Moll

ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA
Ilyes Khemakhem, Ricardo P. Monti, Diederik P. Kingma, Aapo Hyvärinen

Demystifying Orthogonal Monte Carlo and Beyond
Han Lin, Haoxian Chen, Tianyi Zhang, Clement Laroche, Krzysztof Choromanski

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, Colin Raffel

Compositional Generalization via Neural-Symbolic Stack Machines
Xinyun Chen, Chen Liang, Adams Wei Yu, Dawn Song, Denny Zhou

Universally Quantized Neural Compression
Eirikur Agustsson, Lucas Theis

Self-Distillation Amplifies Regularization in Hilbert Space
Hossein Mobahi, Mehrdad Farajtabar, Peter L. Bartlett

ShapeFlow: Learnable Deformation Flows Among 3D Shapes
Chiyu “Max” Jiang, Jingwei Huang, Andrea Tagliasacchi, Leonidas Guibas

Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form
Hicham Janati, Boris Muzellec, Gabriel Peyré, Marco Cuturi

High-Fidelity Generative Image Compression
Fabian Mentzer*, George Toderici, Michael Tschannen*, Eirikur Agustsson

COT-GAN: Generating Sequential Data via Causal Optimal Transport
Tianlin Xu, Li K. Wenliang, Michael Munn, Beatrice Acciaio

When Do Neural Networks Outperform Kernel Methods?
Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding
Victor Veitch, Anisha Zaveri

Exemplar VAE: Linking Generative Models, Nearest Neighbor Retrieval, and Data Augmentation
Sajad Norouzi, David J. Fleet, Mohamamd Norouzi

Mitigating Forgetting in Online Continual Learning via Instance-Aware Parameterization
Hung-Jen Chen, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun
 
Consistent Plug-in Classifiers for Complex Objectives and Constraints
Shiv Kumar Tavker, Harish Guruprasad Ramaswamy, Harikrishna Narasimhan

Online MAP Inference of Determinantal Point Processes
Aditya Bhaskara, Amin Karbasi, Silvio Lattanzi, Morteza Zadimoghaddam

Organizing Recurrent Network Dynamics by Task-computation to Enable Continual Learning
Lea Duncker, Laura Driscoll, Krishna V. Shenoy, Maneesh Sahani, David Sussillo

RL Unplugged: A Collection of Benchmarks for Offline Reinforcement Learning
Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S. Merel, Daniel J. Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas

Neural Execution Engines: Learning to Execute Subroutines
Yujun Yan*, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi

Spin-Weighted Spherical CNNs
Carlos Esteves, Ameesh Makadia, Kostas Daniilidis

An Efficient Nonconvex Reformulation of Stagewise Convex Optimization Problems
Rudy R. Bunel, Oliver Hinder, Srinadh Bhojanapalli, Krishnamurthy Dvijotham

Stochastic Optimization with Laggard Data Pipelines
Naman Agarwal, Rohan Anil, Tomer Koren, Kunal Talwar*, Cyril Zhang*

Regularizing Towards Permutation Invariance In Recurrent Models
Edo Cohen-Karlik, Avichai Ben David, Amir Globerson

Fast and Accurate kk-means++ via Rejection Sampling
Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler*, Ola Svensson

Fairness Without Demographics Through Adversarially Reweighted Learning
Preethi Lahoti*, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, Ed Chi

Gradient Estimation with Stochastic Softmax Tricks
Max Paulus, Dami Choi, Daniel Tarlow, Andreas Krause, Chris J. Maddison

Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov

A Spectral Energy Distance for Parallel Speech Synthesis
Alexey A. Gritsenko, Tim Salimans, Rianne van den Berg, Jasper Snoek, Nal Kalchbrenner

Ode to an ODE
Krzysztof Choromanski, Jared Quincy Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani

RandAugment: Practical Automated Data Augmentation with a Reduced Search Space
Ekin Dogus Cubuk, Barret Zoph, Jon Shlens, Quoc Le

On Adaptive Attacks to Adversarial Example Defenses
Florian Tramer, Nicholas Carlini, Wieland Brendel, Aleksander Madry

Fair Performance Metric Elicitation
Gaurush Hiranandani, Harikrishna Narasimhan, Oluwasanmi O. Koyejo

Robust Pre-Training by Adversarial Contrastive Learning
Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang

Why are Adaptive Methods Good for Attention Models?
Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank Reddi, Sanjiv Kumar, Suvrit Sra

PyGlove: Symbolic Programming for Automated Machine Learning
Daiyi Peng, Xuanyi Dong, Esteban Real, Mingxing Tan, Yifeng Lu, Gabriel Bender, Hanxiao Liu, Adam Kraft, Chen Liang, Quoc Le

Fair Hierarchical Clustering
Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Mohammad Mahdian, Benjamin Moseley, Philip Pham, Sergei Vassilvitskii, Yuyan Wang

Fairness with Overlapping Groups; a Probabilistic Perspective
Forest Yang*, Moustapha Cisse, Sanmi Koyejo

Differentiable Top-k with Optimal Transport
Yujia Xie*, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

The Origins and Prevalence of Texture Bias in Convolutional Neural Networks
Katherine Hermann, Ting Chen, Simon Kornblith

Approximate Heavily-Constrained Learning with Lagrange Multiplier Models
Harikrishna Narasimhan, Andrew Cotter, Yichen Zhou, Serena Wang, Wenshuo Guo

Evaluating Attribution for Graph Neural Networks
Benjamin Sanchez-Lengeling, Jennifer Wei, Brian Lee, Emily Reif, Peter Wang, Wesley Wei Qian, Kevin McCloskey, Lucy Colwell, Alexander Wiltschko

Sliding Window Algorithms for k-Clustering Problems
Michele Borassi, Alessandro Epasto, Silvio Lattanzi, Sergei Vassilvitskii, Morteza Zadimoghaddam

Meta-Learning Requires Meta-Augmentation
Janarthanan Rajendran*, Alex Irpan, Eric Jang

What Makes for Good Views for Contrastive Learning?
Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, Phillip Isola

Supervised Contrastive Learning
Prannay Khosla*, Piotr Teterwak*, Chen Wang*, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, Dilip Krishnan

Critic Regularized Regression
Ziyu Wang, Alexander Novikov, Konrad Zolna, Josh Merel, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

Off-Policy Imitation Learning from Observations
Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou

Effective Diversity in Population Based Reinforcement Learning
Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts

Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards
Yijie Guo, Jongwook Choi, Marcin Moczulski, Shengyu Feng, Samy Bengio, Mohammad Norouzi, Honglak Lee

Object-Centric Learning with Slot Attention
Francesco Locatello*, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf

On the Power of Louvain in the Stochastic Block Model
Vincent Cohen-Addad, Adrian Kosowski, Frederik Mallmann-Trenn, David Saulpic

Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks
David Bieber, Charles Sutton, Hugo Larochelle, Daniel Tarlow

SMYRF – Efficient Attention using Asymmetric Clustering
Giannis Daras, Nikita Kitaev, Augustus Odena, Alexandros G. Dimakis

Graph Contrastive Learning with Augmentations
Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, Yang Shen

WOR and p’s: Sketches for ℓp-Sampling Without Replacement
Edith Cohen, Rasmus Pagh, David P. Woodruff

Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan Barron, Ren Ng

Model Selection in Contextual Stochastic Bandit Problems
Aldo Pacchiano, My Phan, Yasin Abbasi Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

Adapting to Misspecification in Contextual Bandits
Dylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert

Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning
Nino Vieillard, Tadashi Kozunoú, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Learning with Differentiable Pertubed Optimizers
Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

Munchausen Reinforcement Learning
Nino Vieillard, Olivier Pietquin, Matthieu Geist

Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment
Ben Usman, Avneesh Sud, Nick Dufour, Kate Saenko

Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling
Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio

Sample Complexity of Uniform Convergence for Multicalibration
Eliran Shabat, Lee Cohen, Yishay Mansour

Implicit Regularization and Convergence for Weight Normalization
Xiaoxia Wu, Edgar Dobriban, Tongzheng Ren, Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, Qiang Liu

Most ReLU Networks Suffer from ℓ² Adversarial Perturbations
Amit Daniely, Hadas Shacham

Geometric Exploration for Online Control
Orestis Plevrakis, Elad Hazan

PLLay: Efficient Topological Layer Based on Persistent Landscapes
Kwangho Kim, Jisu Kim, Manzil Zaheer, Joon Sik Kim, Frederic Chazal, Larry Wasserman

Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
Jeremiah Zhe Liu*, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan

Bayesian Deep Ensembles via the Neural Tangent Kernel
Bobby He, Balaji Lakshminarayanan, Yee Whye Teh

Hyperparameter Ensembles for Robustness and Uncertainty Quantification
Florian Wenzel, Jasper Snoek, Dustin Tran, Rodolphe Jenatton

Conic Descent and its Application to Memory-efficient Optimization Over Positive Semidefinite Matrices
John Duchi, Oliver Hinder, Andrew Naber, Yinyu Ye

On the Training Dynamics of Deep Networks with L₂ Regularization
Aitor Lewkowycz, Guy Gur-Ari

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks
Wei Hu*, Lechao Xiao, Ben Adlam, Jeffrey Pennington

Adaptive Probing Policies for Shortest Path Routing
Aditya Bhaskara, Sreenivas Gollapudi, Kostas Kollias, Kamesh Munagala

Optimal Approximation — Smoothness Tradeoffs for Soft-Max Functions
Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, Emmanouil Zampetakis

An Unsupervised Information-Theoretic Perceptual Quality Metric
Sangnie Bhardwaj, Ian Fischer, Johannes Ballé, Troy Chinen

Learning Graph Structure With A Finite-State Automaton Layer
Daniel Johnson, Hugo Larochelle, Daniel Tarlow

Estimating Training Data Influence by Tracing Gradient Descent
Garima Pruthi, Frederick Liu, Satyen Kale, Mukund Sundararajan


Tutorials

Practical Uncertainty Estimation and Out-of-Distribution Robustness in Deep Learning
Organizers: Dustin Tran, Balaji Lakshminarayanan, Jasper Snoek

Abstraction & Reasoning in AI systems: Modern Perspectives
Organizers: Francois Chollet, Melanie Mitchell, Christian Szegedy

Policy Optimization in Reinforcement Learning
Organizers: Sham M Kakade, Martha White, Nicolas Le Roux

Federated Learning and Analytics: Industry Meets Academia
Organizers: Brendan McMahan, Virginia Smith, Peter Kairouz

Deep Implicit Layers: Neural ODEs, Equilibrium Models, and Differentiable Optimization
Organizers: David Duvenaud, J. Zico Kolter, Matthew Johnson

Beyond Accuracy: Grounding Evaluation Metrics for Human-Machine Learning Systems
Organizers: Praveen Chandar, Fernando Diaz, Brian St. Thomas


Workshops

Black in AI Workshop @ NeurIPS 2020 (Diamond Sponsor)
Mentorship Roundtables: Natasha Jacques

LatinX in AI Workshop @ NeurIPS 2020 (Platinum Sponsor)
Organizers include: Pablo Samuel Castro
Invited Speaker: Fernanda Viégas
Mentorship Roundtables: Tomas Izo

Queer in AI Workshop @ NeurIPS 2020 (Platinum Sponsor)
Organizers include: Raphael Gontijo Lopes

Women in Machine Learning (Platinum Sponsor)
Organizers include: Xinyi Chen, Jessica Schrouff
Invited Speaker: Fernanda Viégas
Sponsor Talk: Jessica Schrouff
Mentorship Roundtables: Hanie Sedghi, Marc Bellemare, Katherine Heller, Rianne van den Berg, Natalie Schluter, Colin Raffel, Azalia Mirhoseini, Emily Denton, Jesse Engel, Anusha Ramesh, Matt Johnson, Jeff Dean, Laurent Dinh, Samy Bengio, Yasaman Bahri, Corinna Cortes, Nicolas le Roux, Hugo Larochelle, Sergio Guadarrama, Natasha Jaques, Pablo Samuel Castro, Elaine Le, Cory Silvear

Muslims in ML
Organizers include: Mohammad Norouzi

Resistance AI Workshop
Organizers include: Elliot Creager, Raphael Gontijo Lopes

Privacy Preserving Machine Learning — PriML and PPML Joint Edition
Organizers include: Adria Gascon, Mariana Raykova

OPT2020: Optimization for Machine Learning
Organizers include: Courtney Paquette

Human in the Loop Dialogue Systems
Organizers include: Rahul Goel
Invited Speaker: Ankur Parikh

Self-Supervised Learning for Speech and Audio Processing
Organizers include: Tara Sainath
Invited Speaker: Bhuvana Ramabhadran

3rd Robot Learning Workshop
Organizers include: Alex Bewley, Vincent Vanhoucke
Invited Speaker: Pete Florence

Deep Reinforcement Learning
Organizers include: Chelsea Finn
Invited Speaker: Marc Bellemare

Machine Learning for Engineering Modeling, Simulation and Design
Organizers include: Stephan Hoyer

Machine Learning for Molecules
Organizers include: Jennifer Wei
Invited Speaker: Benjamin Sanchez-Lengeling

The Challenges of Real World Reinforcement Learning
Organizers include: Gabriel Dulac-Arnold
Invited Speaker: Chelsea Finn

Workshop on Computer Assisted Programming (CAP)
Organizers include: Charles Sutton, Augustus Odena

Self-Supervised Learning — Theory and Practice
Organizers include: Barret Zoph
Invited Speaker: Quoc V. Le

Deep Learning Through Information Geometry
Organizers include: Alexander Alemi

Expo

Drifting Efficiently Through the Stratosphere Using Deep Reinforcement Learning
Organizers include: Sal Candido

Accelerating Eye Movement Research via Smartphone Gaze
Organizers include: Vidhya Navalpakkam

Mining and Learning with Graphs at Scale
Organizers include: Bryan Perozzi, Vahab Mirrokni, Jonathan Halcrow, Jakub Lacki

*Work performed while at Google

Read More

Researchers can use qsim to explore quantum algorithms

Researchers can use qsim to explore quantum algorithms

A year ago, Google’s Quantum AI team achieved a beyond-classical computation by using a quantum computer to outperform the world’s fastest classical computer. With this, we entered a new era of quantum computing. We still have a long journey ahead of us to find practical applications, and we know we can’t get there alone. So today we’re launching qsim, a new open source quantum simulator that will help researchers develop quantum algorithms. 

The importance of simulators in quantum computing

Simulators are important tools for writing and debugging quantum code, and they’re essential for developing quantum algorithms. The few experimental quantum processors currently available, like the one that achieved a beyond-classical computation, are prone to noise and don’t perform error correction. This is where simulators like qsim come in. They allow researchers to explore quantum algorithms under idealized conditions and are more readily available. They also help prepare experiments to run on actual quantum hardware.

qsim can simulate around 30 qubits on a laptop, or up to 40 qubits in Google Cloud. What used to take an expensive cluster of computers to simulate can now be done on a single computer with qsim. We use qsim frequently at Google to test and benchmark quantum algorithms and processors. One example of this is our research in quantum neural networks. By using qsim with Cirq and TensorFlow Quantum, we’ve trained quantum ML models involving hundreds of thousands of circuits. 

Open source software tools for developing quantum algorithms

qsim is part of our open source ecosystem of software tools. These include Cirq, our quantum programming framework, ReCirq, a repository of research examples, and application-specific libraries such as OpenFermion for quantum chemistry and TensorFlow Quantum for quantum machine learning. These tools are designed to work together and to help you get started easily. Researchers who have developed quantum algorithms with Cirq can now use qsim by changing one line of code in Colab and experience an instant speedup in their circuit simulations.

Google Quantum AI website

To help you get started with qsim and our other open source quantum software, we’ve launched a new website that brings together all of our tools, research initiatives, and educational material. Researchers can access our latest publications and research repositories, students can find educational resources or apply for internships, and developers interested in quantum computing can join our growing community of contributors

Read More

When newsrooms collaborate with AI

When newsrooms collaborate with AI

Two years ago, the Google News Initiative partnered with the London School of Economics and Political Science to launch JournalismAI, a global effort to foster media literacy in newsrooms through research, training and experimentation.  

Since then, more than 62 thousand journalists have taken Introduction to Machine Learning, an online course provided in 17 languages in partnership with Belgian broadcaster VRT. More than 4,000 people have downloaded the JournalismAI report, which argued that “robots are not going to take over journalism” and that media organizations are keen to collaborate with one another and with technology companies. And over 20 media organizations including La Nación, Reuters, the South China Morning Post and The Washington Post have joined Collab, a global partnership to experiment with AI.

To mark this anniversary, together with the London School of Economics, we are hosting a week-long online event to bring together international academics, publishers and practitioners. From December 7 through December 11, the JournalismAI Festival will feature speakers and case studies from major global organizations including the Associated Press, the Wall Street Journal, The Guardian, Der Spiegel, Schibsted and Nikkei. 

This unique gathering will be an opportunity to hear the Collab teams present findings around key challenges such as using AI to understand, identify and mitigate newsroom biases, and increase audience loyalty.  

We’ll also present Pinpoint, Google’s tool to help reporters quickly research hundreds of thousands of documents by automatically identifying the most commonly mentioned people, places, and locations. 

20+ news organizations have been working collaboratively since June to solve common challenges with AI.

20+ news organizations have been working collaboratively since June to solve common challenges with AI.

To offer journalists a more hands-on approach to machine learning, JournalismAI is simultaneously launching a new training course with Ukrainian data journalism agency Texty. This resource, available on the GNI Training Center in 16 languages, will help journalists learn how to train an algorithm to identify similar patterns in satellite imagery using Google Cloud AutoML Vision. In 2018, Texty published Leprosy of the Land, an investigation in which they used machine learning techniques to detect cases of illegal amber mining across Ukraine.

In this investigation, Ukranian data journalism agency Texty used machine learning to detect cases of illegal amber mining.

In this investigation, Ukranian data journalism agency Texty used machine learning to detect cases of illegal amber mining.

In our training course, we’ll be helping reporters build a similar model that Texty used for their investigation. The dedicated GNI Live training sessions will take place over the week in multiple countries and in six languages.

You can join by signing up for the JournalismAI newsletter. You will receive updates and free access to the festival.

Read More