We’ve contributed to a multi-stakeholder report by 58 co-authors at 30 organizations, including the Leverhulme Centre for the Future of Intelligence, Mila, the Schwartz Reisman Institute for Technology and Society, the Center for Advanced Study in the Behavioral Sciences, and the Center for Security and Emerging Technology. The report describes 10 mechanisms to improve the verifiability of claims made about AI systems. Developers can use these tools to provide evidence that AI systems are safe, secure, fair, or privacy-preserving. Users, policymakers, and civil society can use these tools to evaluate AI development processes.
While a growing number of organizations have articulated ethics principles to guide their AI development process, it can be difficult for those outside of an organization to verify whether the organization’s AI systems reflect those principles in practice. This ambiguity makes it harder for stakeholders such as users, policymakers, and civil society to scrutinize AI developers’ claims about properties of AI systems and could fuel competitive corner-cutting, increasing social risks and harms. The report describes existing and potential mechanisms that can help stakeholders grapple with questions like:
- Can I (as a user) verify the claims made about the level of privacy protection guaranteed by a new AI system I’d like to use for machine translation of sensitive documents?
- Can I (as a regulator) trace the steps that led to an accident caused by an autonomous vehicle? Against what standards should an autonomous vehicle company’s safety claims be compared?
- Can I (as an academic) conduct impartial research on the risks associated with large-scale AI systems when I lack the computing resources of industry?
- Can I (as an AI developer) verify that my competitors in a given area of AI development will follow best practices rather than cut corners to gain an advantage?
The 10 mechanisms highlighted in the report are listed below, along with recommendations aimed at advancing each one. (See the report for discussion of how these mechanisms support verifiable claims as well as relevant caveats about our findings.)
Institutional Mechanisms and Recommendations
- Third party auditing. A coalition of stakeholders should create a task force to research options for conducting and funding third party auditing of AI systems.
- Red teaming exercises. Organizations developing AI should run red teaming exercises to explore risks associated with systems they develop, and should share best practices and tools.
- Bias and safety bounties. AI developers should pilot bias and safety bounties for AI systems to strengthen incentives and processes for broad-based scrutiny of AI systems.
- Sharing of AI incidents. AI developers should share more information about AI incidents, including through collaborative channels.
Software Mechanisms and Recommendations
- Audit trails. Standard setting bodies should work with academia and industry to develop audit trail requirements for safety-critical applications of AI systems.
- Interpretability. Organizations developing AI and funding bodies should support research into the interpretability of AI systems, with a focus on supporting risk assessment and auditing.
- Privacy-preserving machine learning. AI developers should develop, share, and use suites of tools for privacy-preserving machine learning that include measures of performance against common standards. (A toy sketch of one such technique follows this list.)
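To make the privacy-preserving machine learning mechanism concrete, here is a minimal sketch of one widely used technique: differentially private gradient descent, in which per-example gradients are clipped and Gaussian noise is added before each update. The function name `dp_sgd_step`, the hyperparameters, and the synthetic data are illustrative assumptions of ours; the report does not prescribe any particular implementation.

```python
# Minimal, illustrative sketch of differentially private gradient descent
# for logistic regression, using only NumPy. Hypothetical example; not a
# production-grade or formally accounted privacy implementation.
import numpy as np

def dp_sgd_step(weights, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD step: clip per-example gradients, add Gaussian noise, average."""
    if rng is None:
        rng = np.random.default_rng(0)
    preds = 1.0 / (1.0 + np.exp(-X @ weights))        # sigmoid predictions
    per_example_grads = (preds - y)[:, None] * X      # per-example log-loss gradients
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)  # bound each example's influence
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    noisy_mean_grad = (clipped.sum(axis=0) + noise) / len(X)
    return weights - lr * noisy_mean_grad

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(5)
for _ in range(100):
    w = dp_sgd_step(w, X, y, rng=rng)
print("trained weights:", w)
```

Measuring how such tools perform against common standards (for example, accuracy at a given privacy budget) is exactly the kind of comparison the recommendation aims to make routine.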
Hardware Mechanisms and Recommendations
- Secure hardware for machine learning. Industry and academia should work together to develop hardware security features for AI accelerators or otherwise establish best practices for the use of secure hardware (including secure enclaves on commodity hardware) in machine learning contexts.
- High-precision compute measurement. One or more AI labs should estimate the computing power involved in a single project in great detail and report on lessons learned regarding the potential for wider adoption of such methods. (A rough illustration of compute accounting follows this list.)
- Compute support for academia. Government funding bodies should substantially increase funding for computing power resources for researchers in academia, in order to improve the ability of those researchers to verify claims made by industry.
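As a rough illustration of the kind of accounting that high-precision compute measurement would refine, the sketch below multiplies accelerator count, peak throughput, utilization, and training time to estimate total training compute. The function name and the example figures are hypothetical; the report calls for far more fine-grained, project-level measurement than this back-of-the-envelope approach.

```python
# Back-of-the-envelope estimate of training compute:
# accelerators x peak FLOP/s x utilization x wall-clock time.
# All figures below are hypothetical, for illustration only.

def estimate_training_flops(num_accelerators, peak_flops_per_second,
                            utilization, training_days):
    """Estimate total floating-point operations for a training run."""
    seconds = training_days * 24 * 60 * 60
    return num_accelerators * peak_flops_per_second * utilization * seconds

# Example: 64 hypothetical accelerators at 100 TFLOP/s peak,
# 30% utilization, running for 10 days.
total = estimate_training_flops(
    num_accelerators=64,
    peak_flops_per_second=100e12,
    utilization=0.30,
    training_days=10,
)
print(f"~{total:.2e} FLOPs (~{total / 1e15 / 86400:.1f} petaflop/s-days)")
```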
We and our co-authors will be doing further research on these mechanisms, and OpenAI will be looking to adopt several of them in the future. We hope that this report inspires meaningful dialogue, and we are eager to discuss additional institutional, software, and hardware mechanisms that could be useful in enabling trustworthy AI development. We encourage anyone interested in collaborating on these issues to connect with the corresponding authors and visit the report website.
- Miles Brundage (OpenAI)
- Shahar Avin (Centre for the Study of Existential Risk, Leverhulme Centre for the Future of Intelligence)
- Jasmine Wang (Mila, University of Montreal)
- Haydn Belfield (Centre for the Study of Existential Risk, Leverhulme Centre for the Future of Intelligence)
- Gretchen Krueger (OpenAI)
- Gillian Hadfield (OpenAI, University of Toronto, Schwartz Reisman Institute for Technology and Society)
- Heidy Khlaaf (Adelard)
- Jingying Yang (Partnership on AI)
- Helen Toner (Center for Security and Emerging Technology)
- Ruth Fong (University of Oxford)
- Tegan Maharaj (Mila, Montreal Polytechnic)
- Pang Wei Koh (Stanford University)
- Sara Hooker (Google Brain)
- Jade Leung (Future of Humanity Institute)
- Andrew Trask (University of Oxford)
- Emma Bluemke (University of Oxford)
- Jonathan Lebensold (Mila, McGill University)
- Cullen O’Keefe (OpenAI)
- Mark Koren (Stanford Centre for AI Safety)
- Théo Ryffel (École Normale Supérieure, Paris)
- JB Rubinovitz (Remedy.AI)
- Tamay Besiroglu (University of Cambridge)
- Federica Carugati (Center for Advanced Study in the Behavioral Sciences)
- Jack Clark (OpenAI)
- Peter Eckersley (Partnership on AI)
- Sarah de Haas (Google Research)
- Maritza Johnson (Google Research)
- Ben Laurie (Google Research)
- Alex Ingerman (Google Research)
- Igor Krawczuk (École Polytechnique Fédérale de Lausanne)
- Amanda Askell (OpenAI)
- Rosario Cammarota (Intel)
- Andrew Lohn (RAND Corporation)
- David Krueger (Mila, Montreal Polytechnic)
- Charlotte Stix (Eindhoven University of Technology)
- Peter Henderson (Stanford University)
- Logan Graham (University of Oxford)
- Carina Prunkl (Future of Humanity Institute)
- Bianca Martin (OpenAI)
- Elizabeth Seger (University of Cambridge)
- Noa Zilberman (University of Oxford)
- Seán Ó hÉigeartaigh (Leverhulme Centre for the Future of Intelligence, Centre for the Study of Existential Risk)
- Frens Kroeger (Coventry University)
- Girish Sastry (OpenAI)
- Rebecca Kagan (Center for Security and Emerging Technology)
- Adrian Weller (University of Cambridge, Alan Turing Institute)
- Brian Tse (Future of Humanity Institute, Partnership on AI)
- Elizabeth Barnes (OpenAI)
- Allan Dafoe (Future of Humanity Institute)
- Paul Scharre (Center for a New American Security)
- Ariel Herbert-Voss (OpenAI)
- Martijn Rasser (Center for a New American Security)
- Shagun Sodhani (Mila, University of Montreal)
- Carrick Flynn (Center for Security and Emerging Technology)
- Thomas Gilbert (University of California, Berkeley)
- Lisa Dyer (Partnership on AI)
- Saif Khan (Center for Security and Emerging Technology)
- Yoshua Bengio (Mila, University of Montreal)
- Markus Anderljung (Future of Humanity Institute)