As humans, we constantly learn from the world around us. We experience inputs that shape our knowledge — including the boundaries of both what we know and what we don’t know.
Many of today’s machines also learn by example. However, these machines are typically trained on datasets and information that doesn’t always include rare or out-of-the-ordinary examples that inevitably come up in real-life scenarios. What is an algorithm to do when faced with the unknown?
I recently spoke with Abhijit Guha Roy, an engineer on the Health AI team, and Ian Kivlichan, an engineer on the Jigsaw team, to hear more about using AI in real-world scenarios and better understand the importance of training it to know when it doesn’t know.
Abhijit, tell me about your recent research in the dermatology space.
We’re applying deep learning to a number of areas in health, including in medical imaging where it can be used to aid in the identification of health conditions and diseases that might require treatment. In the dermatological field, we have shown that AI can be used to help identify possible skin issues and are in the process of advancing research and products, including DermAssist, that can support both clinicians and people like you and me.
In these real-world settings, the algorithm might come up against something it’s never seen before. Rare conditions, while individually infrequent, might not be so rare in aggregate. These so-called “out-of-distribution” examples are a common problem for AI systems which can perform less well when it’s exposed to things they haven’t seen before in its training.
Can you explain what “out-distribution” means for AI?
Most traditional machine learning examples that are used to train AI deal with fairly unsubtle — or obvious — changes. For example, if an algorithm that is trained to identify cats and dogs comes across a car, then it can typically detect that the car — which is an “out-of-distribution” example — is an outlier. Building an AI system that can recognize the presence of something it hasn’t seen before in training is called “out-of-distribution detection,” and is an active and promising field of AI research.
Okay, let’s go back to how this applies to AI in medical settings.
Going back to our research in the dermatology space, the differences between skin conditions can be much more subtle than recognizing a car from a dog or a cat, even more subtle than recognizing a previously unseen “pick-up truck” from a “truck”. As such, the out-of-distribution detection task in medical AI demands even more of our focused attention.
This is where our latest research comes in. We trained our algorithm to recognize even the most subtle of outliers (a so-called “near-out of distribution” detection task). Then, instead of the model inaccurately guessing, it can take a safer course of action — like deferring to human experts.
Ian, you’re working on another area where AI needs to know when it doesn’t know something. What’s that?
The field of content moderation. Our team at Jigsaw used AI to build a free tool called Perspective that scores comments according to how likely they are to be considered toxic by readers. Our AI algorithms help identify toxic language and online harassment at scale so that human content moderators can make better decisions for their online communities. A range of online platforms use Perspective more than 600 million times a day to reduce toxicity and the human time required to moderate content.
In the real world, online conversations — both the things people say and even the ways they say them — are continually changing. For example, two years ago, nobody would have understood the phrase “non-fungible token (NFT).” Our language is always evolving, which means a tool like Perspective doesn’t just need to identify potentially toxic or harassing comments, it also needs to “know when it doesn’t know,” and then defer to human moderators when it encounters comments very different from anything it has encountered before.
In our recent research, we trained Perspective to identify comments it was uncertain about and flag them for separate human review. By prioritizing these comments, human moderators can correct more than 80% of the mistakes the AI might otherwise have made.
What connects these two examples?
We have more in common with the dermatology problem than you’d expect at first glance — even though the problems we try to solve are so different.
Building AI that knows when it doesn’t know something means you can prevent certain errors that might have unintended consequences. In both cases, the safest course of action for the algorithm entails deferring to human experts rather than trying to make a decision that could lead to potentially negative effects downstream.
There are some fields where this isn’t as important and others where it’s critical. You might not care if an automated vegetable sorter incorrectly sorts a purple carrot after being trained on orange carrots, but you would definitely care if an algorithm didn’t know what to do about an abnormal shadow on an X-ray that a doctor might recognize as an unexpected cancer.
How is AI uncertainty related to AI safety?
Most of us are familiar with safety protocols in the workplace. In safety-critical industries like aviation or medicine, protocols like “safety checklists” are routine and very important in order to prevent harm to both the workers and the people they serve.
It’s important that we also think about safety protocols when it comes to machines and algorithms, especially when they are integrated into our daily workflow and aid in decision-making or triaging that can have a downstream impact.
Teaching algorithms to refrain from guessing in unfamiliar scenarios and to ask for help from human experts falls within these protocols, and is one of the ways we can reduce harm and build trust in our systems. This is something Google is committed to, as outlined in its AI Principles.