Meet the Googlers working to ensure tech is for everyone

During their early studies and careers, Tiffany Deng, Tulsee Doshi and Timnit Gebru found themselves asking the same questions: Why is it that some products and services work better for some than others, and why isn’t everyone represented around the table when a decision is being made? Their collective passion to create a digital world that works for everyone is what brought the three women to Google, where they lead efforts to make machine learning systems fair and inclusive. 

I sat down with Tiffany, Tulsee and Timnit to discuss why working on machine learning fairness is so important, and how they came to work in this field.  

How would you explain your job to someone who isn’t in tech?

Tiffany: I’d say my job is to make sure we’re not building the entrenched and embedded biases humans might have into the products people use, and that every time you pick up a product (a Google product), you as an individual can have a good experience using it. 

Timnit: I help machines understand imagery and text. Just as with a human, when a machine tries to learn a pattern or understand something, it is trained on input provided for that purpose. If that input, or data in this case, has societal bias, it could lead to a biased outcome or prediction made by the machine. My work is to figure out different ways of mitigating this bias. 

Tulsee: My work includes making sure everyone has positive experiences with our products, and that people don’t feel excluded or stereotyped, especially based on their identities. The products should work for you as an individual, and provide the best experience possible. 

What made you want to work in this field?

Tulsee: When I started college, I was unsure of what I wanted to study. I came in with an interest in math, and quickly found myself taking a variety of classes in computer science, among other topics. But no matter which interesting courses I took, I often felt a disconnect between what I was studying and the people the work would help. I kept coming back to wanting to focus on people, and after taking classes like child psychology and philosophy of AI, I decided I wanted to take on a role where I could combine my skill sets with a people-centered approach. I think everyone has had an experience of services and technology not working for them, and solving for that is a passion behind much of what I do. 

Tiffany: After graduating from West Point, I joined the Army as an intelligence officer before becoming a consultant and working for the State Department and the Department of Defense. I then joined Facebook as a privacy manager for a period of time, and that’s when I started working on more ML fairness-related matters. When people ask me how I ended up where I am, I’d say that there’s never a straight path to finding your passion, and all the experiences that I’ve had outside of tech are ones I bring into the work I’m doing today. 

An important “aha moment” for me was about a year and a half ago, when my son had a rash all over his body and we went to the doctor to get help. They told us they weren’t able to diagnose him because his skin wasn’t red, and of course, his skin won’t turn red as he has deep brown skin. Someone telling me they can’t diagnose my son because of his skin is troubling as a parent. I wanted to understand the root cause of the issue: why is this not working for me and my family the way it does for others? Later, when thinking about how AI will someday be ubiquitous and an important component in assisting human decision-making, I wanted to get involved and help ensure that we’re building technology that works equally well for everyone. 

Timnit: I grew up with a father and two sisters working in electrical engineering, so I followed their path and decided to also pursue studies in the field. After spending some time at Apple working as a circuit designer and starting my own company, I went back to studying image processing and completed a Ph.D. in computer vision. Towards the end of my Ph.D., I read a ProPublica article discussing racial bias in predicting crime recidivism rates. At the same time, I started thinking more about how there were very few, if any, Black people in grad school and that whenever I went to conferences, Black people weren’t represented in the decisions driving this field of work. That’s how I came to found a nonprofit organization called Black in AI, along with Rediet Abebe, to increase the visibility of Black people working in the field. After graduating with my Ph.D., I did a postdoc at Microsoft Research, and soon after that I took a role at Google as co-lead of the ethical AI research team, which was founded by Meg Mitchell.

What are some of the main challenges in this work, and why is it so important? 

Tulsee: The challenge question is interesting, and a hard one. First of all, there is the theoretical and sociological question of the notion of fairness: how does one define what is fair? Addressing fairness concerns requires multiple perspectives, and product development approaches ranging from technical to design. Because of this, even for use cases where we have a lot of experience, product teams still face many challenges in understanding the different approaches for measuring and tackling fairness concerns. This is one of the reasons why I believe tooling and resources are so critical, and why we’re investing in them for both internal and external purposes.

Another important aspect is company culture and how companies define their values and motivate their employees. We are starting to see a growing, industry-wide shift in terms of what success looks like. The more organizations and product creators are rewarded for thinking about a broader set of people when developing products, the more companies will foster a diverse workforce, consult external experts and think about whose voices are being represented at the table. We need to remember we’re talking about real people’s experiences, and while working on these issues can sometimes be emotionally difficult, it’s so important to get right. 

Timnit: A general challenge is that the people who are most negatively affected are often the ones whose voices are not heard. Representation is an important issue, and while there are a lot of opportunities for ML technology in society, it’s important to have a diverse set of people and perspectives involved in its development so you don’t end up widening the gap between different groups.

This is not an issue that is specific to ML. As an example, let’s think of DNA sequencing. The African continent has the most diverse DNA in the world, but I read that it makes up less than 1 percent of the DNA studied in DNA sequencing, so there are examples of researchers who have come to the wrong conclusions based on data that was not representative. Now imagine someone is looking to develop the next generation of drugs: the result could be that the drugs don’t work for certain groups because their DNA hasn’t been properly represented. 

Do you think ML has the potential to help complement human decision making, and drive the world to become more fair?

Timnit: It’s important to recognize the complexity of the human mind, and that humans should not be replaced when it comes to decision-making. I don’t think ML can make the world more fair: Only humans can do that. And humans choose how to use this technology. In terms of opportunities, there are many ways in which we have already used ML systems to uncover societal bias, and this is something I work on as well. For example, studies by Jennifer Eberhardt and her collaborators at Stanford University, including Vinodkumar Prabhakaran, who has since joined our team, used natural language processing to analyze body camera recordings of police stops in Oakland. They found a pattern of police speaking less respectfully to Black people than white people. A lot of times, when you show these issues backed up by data and scientific analysis, it can help make a case. At the same time, the history of scientific racism also shows that data can be used to propagate the most harmful societal biases of the day. Blindly trusting data-driven studies or decisions can be dangerous. It’s important to understand the context under which these studies are conducted and to work with affected communities and other domain experts to formulate the questions that need to be addressed.

Tiffany: I think ML will be incredibly important to help with things like climate change, sustainability and saving endangered animals. Timnit’s work on using AI to help identify diseased cassava plants is an incredible use of AI, especially in the developing world. The range of problems AI can aid humans with is endless; we just have to ensure we continue to build technological solutions with ethics and inclusion at the forefront of our conversations.

Agile and Intelligent Locomotion via Deep Reinforcement Learning

Posted by Yuxiang Yang and Deepali Jain, AI Residents, Robotics at Google

Recent advancements in deep reinforcement learning (deep RL) have enabled legged robots to learn many agile skills through automated environment interactions. In the past few years, researchers have greatly improved sample efficiency by using off-policy data, imitating animal behaviors, or performing meta-learning. However, sample efficiency remains a bottleneck for most deep reinforcement learning algorithms, especially in the legged locomotion domain. Moreover, most existing work focuses on simple, low-level skills only, such as walking forward, walking backward and turning. In order to operate autonomously in the real world, robots still need to combine these skills to generate more advanced behaviors.

Today we present two projects that aim to address the above problems and help close the perception-actuation loop for legged robots. In “Data Efficient Reinforcement Learning for Legged Robots”, we present an efficient way to learn low-level motion control policies. By fitting a dynamics model to the robot and planning for actions in real time, the robot learns multiple locomotion skills using less than 5 minutes of data. Going beyond simple behaviors, we explore automatic path navigation in “Hierarchical Reinforcement Learning for Quadruped Locomotion”. With a policy architecture designed for end-to-end training, the robot learns to combine a high-level planning policy with a low-level motion controller, in order to navigate autonomously along a curved path.

Data Efficient Reinforcement Learning for Legged Robots
A major roadblock in RL is the lack of sample efficiency. Even with a state-of-the-art sample-efficient learning algorithm like Soft Actor-Critic (SAC), learning a reasonable walking policy still requires more than an hour of data, which is difficult to collect in the real world.

In a continued effort to learn walking skills using minimal interaction with the real-world environment, we present another, more sample-efficient model-based method for learning basic walking skills that dramatically reduces the training data needed. Instead of directly learning a policy that maps from environment state to robot action, we learn a dynamics model of the robot that estimates future states given its current state and action. Since the entire learning process requires less than 5 minutes of data, it could be performed directly on the real robot.

We start by executing random actions on the robot, and fit the model to the data collected. With the model fitted, we control the robot using a model predictive control (MPC) planner. We iterate between collecting more data with MPC and re-training the model to better fit the dynamics of the environment.
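As a rough illustration of this loop, here is a minimal Python sketch. It is not the authors’ implementation: the toy 1-D point-mass “robot” (`true_step`), the linear least-squares dynamics model and the random-shooting MPC planner are simplifying assumptions standing in for the real quadruped, its learned model and the planner used in the paper.

```python
import numpy as np

def true_step(state, action, dt=0.05):
    """Hypothetical stand-in for the real robot: a damped 1-D point mass."""
    pos, vel = state
    vel = 0.9 * vel + dt * action
    return np.array([pos + dt * vel, vel])

def fit_dynamics(states, actions, next_states):
    """Least-squares fit of a linear model s' ~ [s, a, 1] @ W (works on batches)."""
    X = np.column_stack([states, actions, np.ones(len(states))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    def model(s, a):
        s = np.atleast_2d(s)
        a = np.broadcast_to(a, (len(s),))
        return np.column_stack([s, a, np.ones(len(s))]) @ W
    return model

def mpc_action(model, state, reward_fn, horizon=10, samples=256, rng=None):
    """Random-shooting MPC: score sampled action sequences with the model
    and return only the first action of the best sequence."""
    rng = rng or np.random.default_rng()
    seqs = rng.uniform(-1.0, 1.0, size=(samples, horizon))
    s = np.tile(state, (samples, 1))
    returns = np.zeros(samples)
    for t in range(horizon):
        s = model(s, seqs[:, t])
        returns += reward_fn(s)
    return seqs[np.argmax(returns), 0]

def collect(policy, episodes, steps=40):
    """Roll out `policy` on the simulated robot; return (s, a, s') arrays."""
    data = []
    for _ in range(episodes):
        s = np.zeros(2)
        for _ in range(steps):
            a = float(policy(s))
            s_next = true_step(s, a)
            data.append((s, a, s_next))
            s = s_next
    S, A, S2 = (np.array(x) for x in zip(*data))
    return S, A, S2

rng = np.random.default_rng(0)
forward = lambda s: s[:, 1]  # reward: forward velocity of each batched state

# 1) random actions, 2) fit the model, 3) alternate MPC data collection and refitting
S, A, S2 = collect(lambda s: rng.uniform(-1.0, 1.0), episodes=5)
for _ in range(3):
    model = fit_dynamics(S, A, S2)
    S_new, A_new, S2_new = collect(lambda s: mpc_action(model, s, forward, rng=rng), episodes=1)
    S = np.vstack([S, S_new])
    A = np.concatenate([A, A_new])
    S2 = np.vstack([S2, S2_new])
```

Because the reward only enters through the planner, swapping `forward` for `lambda s: -s[:, 1]` would plan for backward motion with the same fitted model and no retraining.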

Overview of the model-based learning pipeline. The system alternates between fitting the dynamics model and collecting trajectories using model predictive control (MPC).

In standard MPC, the controller plans a sequence of actions at each timestep and executes only the first of the planned actions. While online replanning with regular feedback from the robot makes the controller robust to model inaccuracies, it also poses a challenge for the action planner, as planning must finish before the next step of the control loop (usually less than 10ms for legged robots). To satisfy such a tight time constraint, we introduce a multi-threaded, asynchronous version of MPC, with action planning and execution happening on different threads. As the execution thread applies actions at a high frequency, the planning thread optimizes actions in the background without interruption. Furthermore, since action planning can take multiple timesteps, the robot state will have changed by the time planning has finished. To compensate for this planning latency, we devise a novel technique that first predicts the future state at the time the planner is expected to finish its computation, and then uses this future state to seed the planning algorithm.
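The latency-compensation step can be sketched as follows, assuming a learned one-step model `model(state, action)` and an estimate of how many control ticks planning takes (`expected_delay`); the function name and this interface are hypothetical. The threading itself is omitted: the sketch only shows how the planner predicts the state it should plan from.

```python
import numpy as np

def latency_compensated_start(model, state, pending_actions, expected_delay):
    """Predict the robot state at the moment the planner will finish.

    state:           latest measured state when planning starts
    pending_actions: actions the execution thread keeps applying meanwhile
                     (e.g. the remainder of the previous plan)
    expected_delay:  estimated planning time, in control ticks
    """
    s = np.asarray(state, dtype=float)
    for a in pending_actions[:expected_delay]:
        s = model(s, a)  # roll the learned model forward one control tick
    return s

# Hypothetical usage with a toy linear model.
toy_model = lambda s, a: s + 0.01 * a
start = latency_compensated_start(toy_model, [0.0], [1.0, 1.0, 1.0, 1.0], expected_delay=3)
```

The planner would then optimize from `start` instead of the stale measured state, so the first action of the new plan matches the tick at which it will actually be applied.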

We separate action planning and execution on different threads.

Although MPC refreshes the action plan frequently, the planner still needs to work over long action horizons to keep track of the long-term goal and avoid myopic behaviors. To that end, we use a multi-step loss function, a reformulation of the model loss function that reduces error accumulation over time by measuring the model’s prediction error over a range of future steps rather than a single step.
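One common way to write such a multi-step loss is sketched below. This is a hypothetical formulation; the paper’s exact weighting and horizon may differ. The model is rolled out open-loop from each observed state and compared against the real trajectory at every step.

```python
import numpy as np

def multi_step_loss(model, states, actions, horizon):
    """Mean squared error of open-loop model rollouts of up to `horizon` steps.

    states:  array of T+1 observed states; actions: array of T actions.
    From every start index t the model is fed its *own* predictions, so errors
    that compound over several steps are penalized, not just one-step errors.
    """
    T = len(actions)
    errors = []
    for t in range(T):
        s = states[t]
        for h in range(min(horizon, T - t)):
            s = model(s, actions[t + h])
            errors.append(np.sum((s - states[t + h + 1]) ** 2))
    return float(np.mean(errors))
```

With `horizon=1` this reduces to the usual one-step loss. Training a model against it would additionally require differentiating through the rollout (for example with an autodiff framework), which this NumPy sketch does not do.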

Safety is another concern for learning on the real robot. For legged robots, a small mistake, such as missing a footstep, could lead to catastrophic failures ranging from the robot falling to the motors overheating. To ensure safe exploration, we embed a stable, in-place stepping gait prior modulated by a trajectory generator. With this stable walking prior, MPC can then safely explore the action space.
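The idea of a stable prior plus bounded exploration can be sketched as below. The sinusoidal in-place stepping pattern, the leg ordering, the period, the lift height and the residual bound are all hypothetical choices for illustration, and the planner here only adds a clipped correction on top of the trajectory generator rather than reproducing the paper’s exact modulation scheme.

```python
import numpy as np

# Hypothetical leg order [front-right, front-left, rear-right, rear-left];
# diagonal pairs share a phase, giving a trot-like in-place stepping pattern.
LEG_PHASES = np.array([0.0, 0.5, 0.5, 0.0])

def stepping_prior(t, period=0.5, lift=0.3):
    """Trajectory generator: periodic foot-lift targets for stepping in place."""
    return lift * np.maximum(0.0, np.sin(2 * np.pi * (t / period + LEG_PHASES)))

def safe_action(t, planner_output, max_residual=0.1):
    """Leg command = stable prior + bounded correction proposed by MPC.

    Clipping keeps every command close to the stable gait, so even a poor
    plan early in training cannot push the legs far from a safe motion.
    """
    return stepping_prior(t) + np.clip(planner_output, -max_residual, max_residual)
```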

Combining an accurate dynamics model with an online, asynchronous MPC controller, the robot successfully learned to walk using only 4.5 minutes of data (36 episodes). The learned dynamics model is also generalizable: by simply changing the reward function of MPC, the controller is able to optimize for different behaviors, such as walking backwards, or turning, without re-training. As an extension, we use a similar framework to enable even more agile behaviors. For example, in simulation the robot learns to backflip and walk on its rear legs, though these behaviors are yet to be learned by the real robot.

The robot learns to walk using only 4.5 minutes of data.
The robot learns to backflip and walk with rear legs using the same framework.

Combining low-level controller with high-level planning
Although model-based RL has allowed the robot to learn simple locomotion skills efficiently, such skills are insufficient for handling complex, real-world tasks. For example, in order to navigate through an office space, the robot may have to adjust its speed, direction and height multiple times, instead of following a pre-defined speed profile. Traditionally, people solve such complex tasks by breaking them down into multiple hierarchical sub-problems, such as a high-level trajectory planner and a low-level trajectory-following controller. However, manually defining a suitable hierarchy is typically a tedious task, as it requires careful engineering for each sub-problem.

In our second paper, we introduce a hierarchical reinforcement learning (HRL) framework that can be trained to automatically decompose complex reinforcement learning tasks. We break down our policy structure into a high-level and a low-level policy. Instead of designing each policy manually, we only define a simple communication protocol between the policy levels. In this framework, the high-level policy (e.g., a trajectory planner) commands the low-level policy (such as the motion control policy) through a latent command, and decides for how long to hold that command constant before issuing a new one. The low-level policy then interprets the latent command from the high-level policy, and gives motor commands to the robot.

To facilitate learning, we also split the observation space into high-level (e.g., robot position and orientation) and low-level (IMU, motor positions) observations, which are fed to their corresponding policies. This architecture naturally allows the high-level policy to operate at a slower timescale than the low-level policy, which saves computation resources and reduces training complexity.
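The communication protocol between the two levels can be sketched as a small class. This is a hypothetical linear parameterization, not the paper’s architecture: the high-level policy maps high-level observations to a latent command plus a hold duration, and the low-level policy maps the latent command and low-level observations to motor commands at every control tick.

```python
import numpy as np

class HierarchicalPolicy:
    """Hypothetical two-level linear policy with a latent-command protocol."""

    def __init__(self, hl_obs_dim, ll_obs_dim, latent_dim, num_motors, max_hold=10, rng=None):
        rng = rng or np.random.default_rng()
        # High level outputs latent_dim values for the command plus 1 for the hold duration.
        self.W_high = rng.normal(scale=0.1, size=(latent_dim + 1, hl_obs_dim))
        self.W_low = rng.normal(scale=0.1, size=(num_motors, latent_dim + ll_obs_dim))
        self.max_hold = max_hold
        self.latent = np.zeros(latent_dim)
        self.steps_left = 0
        self.commands_issued = 0

    def act(self, hl_obs, ll_obs):
        if self.steps_left == 0:
            # The high level runs only here, i.e. at a slower, variable timescale.
            out = self.W_high @ hl_obs
            self.latent = np.tanh(out[:-1])
            frac = 1.0 / (1.0 + np.exp(-out[-1]))  # squash to (0, 1)
            self.steps_left = 1 + int(round(frac * (self.max_hold - 1)))  # 1..max_hold ticks
            self.commands_issued += 1
        self.steps_left -= 1
        # The low level runs every tick on the latent command plus proprioception.
        return np.tanh(self.W_low @ np.concatenate([self.latent, ll_obs]))
```

Rounding the hold duration to an integer number of ticks is one reason a policy like this is not end-to-end differentiable.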

Framework of Hierarchical Policy: The policy gets observations from the robot and sends motor commands to execute desired actions. It is split into two levels (high and low). The high-level policy gives a latent command to the low-level policy and also decides the duration for which the low-level policy will run.

Since the high-level and low-level policies operate at discrete timescales, the entire policy structure is not end-to-end differentiable, and standard gradient-based RL algorithms like PPO and SAC cannot be used. Instead, we choose to train the hierarchical policy through augmented random search (ARS), a simple evolutionary optimization method that has demonstrated good performance in reinforcement learning tasks. Weights of both levels of the policy are trained together, where the objective is to maximize the total reward from the robot trajectory.
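A basic version of ARS (the top-direction variant without observation normalization, sometimes called ARS V1-t) is sketched below on a flat parameter vector. In the hierarchical setting, `theta` would hold the concatenated weights of both policy levels and `reward_fn` would run a full robot rollout; here a toy quadratic reward stands in as an assumption, and the step size, noise scale and direction counts are illustrative.

```python
import numpy as np

def ars(reward_fn, theta, step_size=0.02, noise=0.03, num_dirs=8, top_dirs=4,
        iterations=100, rng=None):
    """Basic Augmented Random Search on a flat parameter vector."""
    rng = rng or np.random.default_rng()
    theta = np.array(theta, dtype=float)
    for _ in range(iterations):
        deltas = rng.standard_normal((num_dirs, theta.size))
        r_plus = np.array([reward_fn(theta + noise * d) for d in deltas])
        r_minus = np.array([reward_fn(theta - noise * d) for d in deltas])
        # Keep only the directions whose better side scored highest.
        best = np.argsort(np.maximum(r_plus, r_minus))[-top_dirs:]
        sigma = np.concatenate([r_plus[best], r_minus[best]]).std() + 1e-8
        # Finite-difference step along the selected directions, scaled by reward spread.
        theta += step_size / (top_dirs * sigma) * ((r_plus[best] - r_minus[best]) @ deltas[best])
    return theta
```

Because ARS only needs rollout returns, not gradients, it is unaffected by the non-differentiable hold-duration choice. Perturbing only a subset of `theta` while keeping the rest fixed would correspond to retraining just one level of the policy.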

We test our framework on a path-following task using the same quadruped robot. In addition to straight walking, the robot needs to steer in different directions to complete the task. Note that as the low-level policy does not know the robot’s position in the path, it does not have sufficient information to complete the entire task on its own. However, with the coordination between the high-level and low-level policies, steering behavior emerges automatically in the latent command space, which allows the robot to efficiently complete the path. After successful training in a simulated environment, we validate our results on hardware by transferring an HRL policy to a real robot and recording the resulting trajectories.

Successful trajectory of a robot on a curved path. Left: A plot of the trajectory traversed by the robot with dots along the trajectory marking the positions where the high-level policy sent a new latent command to the low-level policy. Middle: The robot walking along the path in the simulated environment. Right: The robot walking around the path in the real world.

To further demonstrate the learned hierarchical policy, we visualized the behavior of the learned low-level policy under different latent commands. As shown in the plot below, different latent commands can cause the robot to walk straight, or turn left or right at different rates. We also test the generalizability of low-level policies by transferring them to new tasks from a similar domain, which, in our case, includes following paths with different shapes. By fixing the low-level policy weights and only training the high-level policy, the robot could successfully traverse different paths.

Left: Visualization of a learned 2D latent command space. Vector directions correspond to the movement direction of the robot. Vector length is proportional to the distance covered. Right: Transfer of the low-level policy: An HRL policy was trained on a single path (right, top). The learned low-level policy was then reused when training the high-level policy on other paths (e.g., right, bottom).

Conclusion
Reinforcement learning is a promising path for robotics because it automates the controller design process. With model-based RL, we enabled efficient learning of generalizable locomotion behaviors directly on the real robot. With hierarchical RL, the robot learned to coordinate policies at different levels to achieve more complex tasks. In the future, we plan to bring perception into the loop, so that robots can operate truly autonomously in the real world.

Acknowledgements
Both Deepali Jain and Yuxiang Yang are residents in the AI Residency program, mentored by Ken Caluwaerts and Atil Iscen. We would also like to thank Jie Tan and Vikas Sindhwani for support of the research, and Noah Broestl for managing the New York AI Residency Program.

Understanding the Shape of Large-Scale Data

Posted by Anton Tsitsulin, Research Intern and Bryan Perozzi, Senior Research Scientist, Graph Mining Team, Google Research

Understanding the differences and similarities between complex datasets is an interesting challenge that often arises when working with data. One way to formalize this question is to view each dataset as a graph, a mathematical model for how items relate to each other. Graphs are widely used to model relationships between objects — the Internet graph connects pages referencing each other, social graphs link together friends, and molecule graphs connect atoms bonding with each other.

Graphs are discrete objects that can model the relationships between many different types of data, including web pages (left), social connections (center), or molecules (right).

Once there is a collection of multiple graphs, it is common to want to predict some property of each graph as a whole (i.e., one label per graph). For example, consider the task of predicting protein function from structure: each dataset here is one protein, and the prediction task is whether the final structure encodes an enzyme or not. For a model to compute such predictions, we need a representation that lets it generalize across different protein structures. Ideally, one would want a way to represent graphs as vectors without costly labelling. The problem becomes harder as graphs grow: for molecules, humans possess some knowledge about their properties, but reasoning about larger, more complex datasets becomes increasingly difficult.

In this post we highlight some recent advances in the area of graph representation learning with “Just SLaQ When You Approximate: Accurate Spectral Distances for Web-Scale Graphs” (published at WWW’20), a publication that improves on the scalability of our earlier research, “DDGK: Learning Graph Representations for Deep Divergence Graph Kernels” (published at WWW’19). SLaQ introduces a way to scale computations to approximate a certain class of graph statistics, allowing one to quickly and efficiently characterize large graphs. We are also happy to announce that we have released the code for both papers in the Google Research GitHub repository for graph embeddings.

Fully Unsupervised Learning of Graph Similarity
In our 2019 paper, we showed that it is possible to learn representations for graph similarity with neither domain knowledge nor supervision. We proposed deep divergence graph kernels (DDGK), an unsupervised method for learning representations over graphs that encode mappings of similarities between them. Unlike previous work, our unsupervised method jointly learns node representations, graph representations, and an attention-based alignment between graphs.

Here is a t-SNE visualization of the latent representations learned by DDGK to compare proteins. Blue points indicate proteins that encode enzymes and the red points are for those that do not. We can see that the encoding correlates with a structural property of the protein (whether or not it encodes enzymes), even though this context was not provided during training. (Note that this is a projection of the representations, and so the absolute axis values aren’t meaningful.)

In the example above, we demonstrate how these representations can automatically learn to represent graphs and align them in a way that encodes their latent functional similarity. Experiments on other datasets show we can capture similarities and differences across graphs of different types (language, biology, and social interactions).

The pairwise distance between different datasets encoded and aligned using DDGK. Color indicates distance in the latent space, and the scale of similarity ranges from 0 (identical) to 1.0 (very different). We see that the representations can be clustered to group similar datasets together — for example, the datasets nci1 and ptc are both datasets of chemical compounds.

Fast and Accurate Approximation of Spectral Descriptors
A graph’s spectrum is a powerful representation that encodes its properties, including connectivity patterns between graph nodes and clustering information. The spectrum has been shown to convey rich information about the properties of different objects such as the sound of a drum, 3D shapes, graphs, and general high-dimensional data. Applications of spectral graph descriptors include AutoML systems, anomaly detection in dynamic graphs, and chemical molecule characterization.
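For intuition, the spectrum of the normalized Laplacian can be computed directly with NumPy for small graphs. The sketch below (the helper names and example graphs are illustrative, and it assumes undirected graphs without isolated nodes) shows one piece of connectivity information the spectrum carries: the number of zero eigenvalues equals the number of connected components.

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2}; assumes an undirected graph with no isolated nodes."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def spectrum(A):
    """Eigenvalues of the normalized Laplacian in ascending order (all lie in [0, 2])."""
    return np.linalg.eigvalsh(normalized_laplacian(A))

def cycle(n):
    """Adjacency matrix of an n-node ring (hypothetical example graph)."""
    return np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)

triangle = np.ones((3, 3)) - np.eye(3)
two_triangles = np.block([[triangle, np.zeros((3, 3))], [np.zeros((3, 3)), triangle]])
print(spectrum(two_triangles))  # two (numerically) zero eigenvalues: two components
print(spectrum(cycle(6)))       # one zero eigenvalue: one component
```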

Currently, learning-based systems such as DDGK do not scale to either large graphs or large graph collections. Alternatively, one can use the spectral information without the learning component to attain more desirable scaling properties. However, computing spectral descriptors for large graphs is computationally prohibitive. Our more recent paper addresses this problem by proposing SLaQ, a method for approximating a family of graph descriptors. Our approach uses a randomized approximation algorithm for computing traces of spectrum functions that allows us to study several well-known spectral graph characteristics like Von Neumann Graph Entropy, Estrada Index, graph energy, and NetLSD.
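Each of these descriptors can be written as a trace of a matrix function, tr f(M) = Σᵢ f(λᵢ): the Estrada index is tr exp(A) and graph energy is tr |A| for the adjacency matrix A, VNGE is −tr(ρ log ρ) with ρ the combinatorial Laplacian scaled to unit trace, and the NetLSD heat trace is tr exp(−tL) for the normalized Laplacian L. The sketch below computes them exactly via a full eigendecomposition, and also includes a textbook stochastic Lanczos quadrature estimator (Rademacher probe vectors plus a few Lanczos steps) that approximates the same traces without one. It is written for this example and is not the released SLaQ code; the parameter names and defaults are assumptions.

```python
import numpy as np

def lanczos_tridiag(M, v, steps):
    """Lanczos with full reorthogonalization; returns the k x k tridiagonal matrix T."""
    n = len(v)
    steps = min(steps, n)
    Q = np.zeros((n, steps))
    alpha, beta = np.zeros(steps), np.zeros(steps)
    Q[:, 0] = v / np.linalg.norm(v)
    k = steps
    for j in range(steps):
        w = M @ Q[:, j]  # the only access to M: a matrix-vector product
        alpha[j] = Q[:, j] @ w
        w = w - Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)  # orthogonalize against all previous vectors
        if j + 1 == steps:
            break
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-10:  # Krylov subspace exhausted early
            k = j + 1
            break
        Q[:, j + 1] = w / beta[j]
    return np.diag(alpha[:k]) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)

def slq_trace(M, f, num_probes=30, steps=15, rng=None):
    """Estimate tr f(M) for symmetric M via stochastic Lanczos quadrature."""
    rng = rng or np.random.default_rng()
    n = M.shape[0]
    total = 0.0
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe: E[v v^T] = I, ||v||^2 = n
        nodes, vecs = np.linalg.eigh(lanczos_tridiag(M, v, steps))
        total += n * np.sum(vecs[0, :] ** 2 * f(nodes))  # Gauss quadrature for v^T f(M) v
    return total / num_probes

def exact_trace(M, f):
    return float(np.sum(f(np.linalg.eigvalsh(M))))

def estrada_index(A, trace=exact_trace):
    return trace(A, np.exp)

def graph_energy(A, trace=exact_trace):
    return trace(A, np.abs)

def vnge(A, trace=exact_trace):
    L = np.diag(A.sum(axis=1)) - A  # combinatorial Laplacian
    rho = L / np.trace(L)           # unit-trace "density matrix"
    return trace(rho, lambda x: -x * np.log(np.clip(x, 1e-12, None)))

def heat_trace(L_norm, t, trace=exact_trace):
    return trace(L_norm, lambda x: np.exp(-t * x))
```

Passing `trace=lambda M, f: slq_trace(M, f)` switches any descriptor from the exact computation to the randomized estimate. Since `lanczos_tridiag` touches `M` only through matrix-vector products, a sparse matrix could be used in its place for large graphs; dense arrays are used here only for brevity.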

For example, we use SLaQ to monitor anomalous changes in the Wikipedia graph structure. SLaQ allows us to discern meaningful changes in the structure of the page graph from trivial ones such as mass page renames. Our experiments show two orders of magnitude improvement in approximation accuracy, on average.
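A simplified version of this monitoring idea is sketched below: compute a NetLSD-style heat-trace signature for each snapshot and flag consecutive snapshots whose signatures differ by more than a threshold. The signature times, the normalization by node count and the threshold are assumptions, and the exact eigendecomposition stands in for SLaQ, which is what makes the computation feasible at the scale of Wikipedia. Relabeling nodes, the analogue of a mass page rename, permutes the adjacency matrix without changing its spectrum, so it leaves the signature unchanged.

```python
import numpy as np

def heat_signature(A, times=np.logspace(-2, 2, 32)):
    """NetLSD-style signature: h(t) = sum_j exp(-t * lambda_j) / n over log-spaced t."""
    deg = A.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    L = np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    lam = np.linalg.eigvalsh(L)
    return np.exp(-np.outer(times, lam)).sum(axis=1) / len(A)

def flag_changes(snapshots, threshold):
    """Indices i where snapshot i's signature moved more than `threshold` from snapshot i-1."""
    sigs = [heat_signature(A) for A in snapshots]
    return [i for i in range(1, len(sigs)) if np.linalg.norm(sigs[i] - sigs[i - 1]) > threshold]
```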

Left: The well-known Karate graph represents the social interactions of two martial arts clubs. Right: The spectral descriptors (NetLSD, VNGE, and Estrada Index) computed for the original graph in blue and the version with removed edges in red.

Conclusions
Unsupervised representation learning for graphs is an important problem, and we believe that the methods we highlight here are exciting steps forward in this area! Specifically, SLaQ allows us to compute principled representations for vast datasets, and DDGK introduces a mechanism for automatically learning alignments between datasets. We hope that our contributions will help advance the analysis of large datasets, and will be useful for understanding changes to time-varying graph datasets, like those used in recommendation systems.

Acknowledgements
We thank Marina Munkhoeva, Rami Al-Rfou, and Dustin Zelle who contributed to these works. For more information on the Graph Mining team (part of Algorithm and Optimization Group) visit our pages.