Planning for AGI and beyond

Our mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.

If AGI is successfully created, this technology could help us elevate humanity by increasing abundance, turbocharging the global economy, and aiding in the discovery of new scientific knowledge that changes the limits of possibility.

AGI has the potential to give everyone incredible new capabilities; we can imagine a world where all of us have access to help with almost any cognitive task, providing a great force multiplier for human ingenuity and creativity.

On the other hand, AGI would also come with serious risk of misuse, drastic accidents, and societal disruption. Because the upside of AGI is so great, we do not believe it is possible or desirable for society to stop its development forever; instead, society and the developers of AGI have to figure out how to get it right.[1]

Although we cannot predict exactly what will happen, and of course our current progress could hit a wall, we can articulate the principles we care about most:

  1. We want AGI to empower humanity to maximally flourish in the universe. We don’t expect the future to be an unqualified utopia, but we want to maximize the good and minimize the bad, and for AGI to be an amplifier of humanity.
  2. We want the benefits of, access to, and governance of AGI to be widely and fairly shared.
  3. We want to successfully navigate massive risks. In confronting these risks, we acknowledge that what seems right in theory often plays out more strangely than expected in practice. We believe we have to continuously learn and adapt by deploying less powerful versions of the technology in order to minimize “one shot to get it right” scenarios.

The short term

There are several things we think are important to do now to prepare for AGI.

First, as we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally.

A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low.

We currently believe the best way to successfully navigate AI deployment challenges is with a tight feedback loop of rapid learning and careful iteration. Society will face major questions about what AI systems are allowed to do, how to combat bias, how to deal with job displacement, and more. The optimal decisions will depend on the path the technology takes, and like any new field, most expert predictions have been wrong so far. This makes planning in a vacuum very difficult.[2]

Generally speaking, we think more usage of AI in the world will lead to good, and want to promote it (by putting models in our API, open-sourcing them, etc.). We believe that democratized access will also lead to more and better research, decentralized power, more benefits, and a broader set of people contributing new ideas.

As our systems get closer to AGI, we are becoming increasingly cautious with the creation and deployment of our models. Our decisions will require much more caution than society usually applies to new technologies, and more caution than many users would like. Some people in the AI field think the risks of AGI (and successor systems) are fictitious; we would be delighted if they turn out to be right, but we are going to operate as if these risks are existential.

At some point, the balance between the upsides and downsides of deployments (such as empowering malicious actors, creating social and economic disruptions, and accelerating an unsafe race) could shift, in which case we would significantly change our plans around continuous deployment.

Second, we are working towards creating increasingly aligned and steerable models. Our shift from models like the first version of GPT-3 to InstructGPT and ChatGPT is an early example of this.

In particular, we think it’s important that society agree on extremely wide bounds of how AI can be used, but that within those bounds, individual users have a lot of discretion. Our eventual hope is that the institutions of the world agree on what these wide bounds should be; in the shorter term we plan to run experiments for external input. The institutions of the world will need to be strengthened with additional capabilities and experience to be prepared for complex decisions about AGI.

The “default setting” of our products will likely be quite constrained, but we plan to make it easy for users to change the behavior of the AI they’re using. We believe in empowering individuals to make their own decisions and the inherent power of diversity of ideas.

We will need to develop new alignment techniques as our models become more powerful (and tests to understand when our current techniques are failing). Our plan in the shorter term is to use AI to help humans evaluate the outputs of more complex models and monitor complex systems, and in the longer term to use AI to help us come up with new ideas for better alignment techniques.

Importantly, we think we often have to make progress on AI safety and capabilities together. It’s a false dichotomy to talk about them separately; they are correlated in many ways. Our best safety work has come from working with our most capable models. That said, it’s important that the ratio of safety progress to capability progress increases.

Third, we hope for a global conversation about three key questions: how to govern these systems, how to fairly distribute the benefits they generate, and how to fairly share access.

In addition to these three areas, we have attempted to set up our structure in a way that aligns our incentives with a good outcome. We have a clause in our Charter about assisting other organizations to advance safety instead of racing with them in late-stage AGI development. We have a cap on the returns our shareholders can earn so that we aren’t incentivized to attempt to capture value without bound and risk deploying something potentially catastrophically dangerous (and of course as a way to share the benefits with society). We have a nonprofit that governs us and lets us operate for the good of humanity (and can override any for-profit interests), including letting us do things like cancel our equity obligations to shareholders if needed for safety and sponsor the world’s most comprehensive UBI experiment.

We think it’s important that efforts like ours submit to independent audits before releasing new systems; we will talk about this in more detail later this year. At some point, it may be important to get independent review before starting to train future systems, and for the most advanced efforts to agree to limit the rate of growth of compute used for creating new models. We think public standards about when an AGI effort should stop a training run, decide a model is safe to release, or pull a model from production use are important. Finally, we think it’s important that major world governments have insight about training runs above a certain scale.

The long term

We believe that the future of humanity should be determined by humanity, and that it’s important to share information about progress with the public. There should be great scrutiny of all efforts attempting to build AGI and public consultation for major decisions.

The first AGI will be just a point along the continuum of intelligence. We think it’s likely that progress will continue from there, possibly sustaining the rate of progress we’ve seen over the past decade for a long period of time. If this is true, the world could become extremely different from how it is today, and the risks could be extraordinary. A misaligned superintelligent AGI could cause grievous harm to the world; an autocratic regime with a decisive superintelligence lead could do that too.

AI that can accelerate science is a special case worth thinking about, and perhaps more impactful than everything else. It’s possible that AGI capable enough to accelerate its own progress could cause major changes to happen surprisingly quickly (and even if the transition starts slowly, we expect it to happen pretty quickly in the final stages). We think a slower takeoff is easier to make safe, and coordination among AGI efforts to slow down at critical junctures will likely be important (even in a world where we don’t need to do this to solve technical alignment problems, slowing down may be important to give society enough time to adapt).

Successfully transitioning to a world with superintelligence is perhaps the most important—and hopeful, and scary—project in human history. Success is far from guaranteed, and the stakes (boundless downside and boundless upside) will hopefully unite all of us.

We can imagine a world in which humanity flourishes to a degree that is probably impossible for any of us to fully visualize yet. We hope to contribute to the world an AGI aligned with such flourishing.


Acknowledgments

Thanks to Brian Chesky, Paul Christiano, Jack Clark, Holden Karnofsky, Tasha McCauley, Nate Soares, Kevin Scott, Brad Smith, Helen Toner, Allan Dafoe, and the OpenAI team for reviewing drafts of this.


Footnotes

  1. We seem to have been given lots of gifts relative to what we expected earlier: for example, it seems like creating AGI will require huge amounts of compute and thus the world will know who is working on it, it seems like the original conception of hyper-evolved RL agents competing with each other and evolving intelligence in a way we can’t really observe is less likely than it originally seemed, almost no one predicted we’d make this much progress on pre-trained language models that can learn from the collective preferences and output of humanity, etc.

    AGI could happen soon or far in the future; the takeoff speed from the initial AGI to more powerful successor systems could be slow or fast. Many of us think the safest quadrant in this two-by-two matrix is short timelines and slow takeoff speeds; shorter timelines seem more amenable to coordination and more likely to lead to a slower takeoff due to less of a compute overhang, and a slower takeoff gives us more time to figure out empirically how to solve the safety problem and how to adapt. ↩︎

  2. For example, when we first started OpenAI, we didn’t expect scaling to be as important as it has turned out to be. When we realized it was going to be critical, we also realized our original structure wasn’t going to work—we simply wouldn’t be able to raise enough money to accomplish our mission as a nonprofit—and so we came up with a new structure.

    As another example, we now believe we were wrong in our original thinking about openness, and have pivoted from thinking we should release everything (though we open source some things, and expect to open source more exciting things in the future!) to thinking that we should figure out how to safely share access to and benefits of the systems. We still believe the benefits of society understanding what is happening are huge and that enabling such understanding is the best way to make sure that what gets built is what society collectively wants (obviously there’s a lot of nuance and conflict here). ↩︎

How should AI systems behave, and who should decide?

OpenAI’s mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. We therefore think a lot about the behavior of AI systems we build in the run-up to AGI, and the way in which that behavior is determined.

Since our launch of ChatGPT, users have shared outputs that they consider politically biased, offensive, or otherwise objectionable. In many cases, we think that the concerns raised have been valid and have uncovered real limitations of our systems which we want to address. We’ve also seen a few misconceptions about how our systems and policies work together to shape the outputs you get from ChatGPT.

Below, we summarize:

  • How ChatGPT’s behavior is shaped;
  • How we plan to improve ChatGPT’s default behavior;
  • Our intent to allow more system customization; and
  • Our efforts to get more public input on our decision-making.

Where we are today

Unlike ordinary software, our models are massive neural networks. Their behaviors are learned from a broad range of data, not programmed explicitly. Though not a perfect analogy, the process is more similar to training a dog than to ordinary programming. An initial “pre-training” phase comes first, in which the model learns to predict the next word in a sentence, informed by its exposure to lots of Internet text (and to a vast array of perspectives). This is followed by a second phase in which we “fine-tune” our models to narrow down system behavior.
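
To make the two phases more concrete, here is a minimal sketch of the pre-training objective only. It is illustrative rather than actual training code: the model, tokenizer, and data pipeline are placeholders, and real systems run at vastly larger scale.

```python
# Minimal sketch of next-word (next-token) prediction as a cross-entropy loss.
import torch
import torch.nn.functional as F

def next_token_loss(model: torch.nn.Module, token_ids: torch.Tensor) -> torch.Tensor:
    """token_ids: (batch, sequence_length) integer token ids.
    model: any network mapping token ids to per-position vocabulary logits."""
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)                      # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),    # one prediction per position
        targets.reshape(-1),                    # the actual "next word" at each position
    )
```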

As of today, this process is imperfect. Sometimes the fine-tuning process falls short of our intent (producing a safe and useful tool) and the user’s intent (getting a helpful output in response to a given input). Improving our methods for aligning AI systems with human values is a top priority for our company, particularly as AI systems become more capable.

A two-step process: Pre-training and fine-tuning

The two main steps involved in building ChatGPT work as follows:

  • First, we “pre-train” models by having them predict what comes next in a big dataset that contains parts of the Internet. They might learn to complete the sentence “instead of turning left, she turned ___.” By learning from billions of sentences, our models learn grammar, many facts about the world, and some reasoning abilities. They also learn some of the biases present in those billions of sentences.
  • Then, we “fine-tune” these models on a narrower dataset that we carefully generate with human reviewers who follow guidelines that we provide them. Since we cannot predict all the possible inputs that future users may put into our system, we do not write detailed instructions for every input that ChatGPT will encounter. Instead, we outline a few categories in the guidelines that our reviewers use to review and rate possible model outputs for a range of example inputs. Then, while they are in use, the models generalize from this reviewer feedback in order to respond to a wide array of specific inputs provided by a given user (one way such ratings can become a training signal is sketched after this list).
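
One common way to turn such ratings into a training signal is a pairwise comparison loss over a learned reward model, in the spirit of the InstructGPT approach. The sketch below is a simplified illustration, not a description of the exact pipeline; the reward model and the way comparisons are collected are placeholders.

```python
# Simplified sketch: learning a scalar reward from reviewer comparisons.
# reward_model is a placeholder for any network scoring (prompt, output) pairs.
import torch.nn.functional as F

def preference_loss(reward_model, prompt, preferred_output, other_output):
    """Push the score of the reviewer-preferred output above the other output's."""
    r_preferred = reward_model(prompt, preferred_output)   # scalar score
    r_other = reward_model(prompt, other_output)
    # Logistic (Bradley–Terry style) comparison loss.
    return -F.logsigmoid(r_preferred - r_other).mean()
```

A model fine-tuned against a signal like this then generalizes from the reviewers’ ratings to the wide range of inputs it sees in use.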

The role of reviewers and OpenAI’s policies in system development

In some cases, we may give guidance to our reviewers on a certain kind of output (for example, “do not complete requests for illegal content”). In other cases, the guidance we share with reviewers is more high-level (for example, “avoid taking a position on controversial topics”). Importantly, our collaboration with reviewers is not one-and-done—it’s an ongoing relationship, in which we learn a lot from their expertise.

A large part of the fine-tuning process is maintaining a strong feedback loop with our reviewers, which involves weekly meetings to address questions they may have and to provide clarifications on our guidance. This iterative feedback process is how we train the model to be better and better over time.

Addressing biases

Many are rightly worried about biases in the design and impact of AI systems. We are committed to robustly addressing this issue and being transparent about both our intentions and our progress. Towards that end, we are sharing a portion of our guidelines that pertain to political and controversial topics. Our guidelines are explicit that reviewers should not favor any political group. Biases that nevertheless may emerge from the process described above are bugs, not features.

While disagreements will always exist, we hope sharing this blog post and these instructions will give more insight into how we view this critical aspect of such a foundational technology. It’s our belief that technology companies must be accountable for producing policies that stand up to scrutiny.

We’re always working to improve the clarity of these guidelines—and based on what we’ve learned from the ChatGPT launch so far, we’re going to provide clearer instructions to reviewers about potential pitfalls and challenges tied to bias, as well as controversial figures and themes. Additionally, as part of ongoing transparency initiatives, we are working to share aggregated demographic information about our reviewers in a way that doesn’t violate privacy rules and norms, since this is an additional source of potential bias in system outputs.

We are currently researching how to make the fine-tuning process more understandable and controllable, and are building on external advances such as rule-based rewards and Constitutional AI.
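
As a toy illustration of the rule-based-reward idea only: the rules, phrases, and weights below are invented for the example and are not OpenAI reviewer guidelines or an actual OpenAI method.

```python
# Toy rule-based reward term; the rules and weights are illustrative assumptions.
def rule_based_reward(output_text: str) -> float:
    text = output_text.lower()
    score = 0.0
    if "will definitely happen" in text:   # hypothetical rule: discourage overconfident predictions
        score -= 1.0
    if "some people believe" in text:      # hypothetical rule: encourage acknowledging other views
        score += 0.5
    return score

# During fine-tuning, such a term could be combined with learned feedback, e.g.:
#   total_reward = learned_reward(prompt, output) + rule_based_reward(output)
```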

Where we’re going: The building blocks of future systems

In pursuit of our mission, we’re committed to ensuring that access to, benefits from, and influence over AI and AGI[1] are widespread. We believe there are at least three building blocks required in order to achieve these goals in the context of AI system behavior.[2]

1. Improve default behavior. We want as many users as possible to find our AI systems useful to them “out of the box” and to feel that our technology understands and respects their values.

Towards that end, we are investing in research and engineering to reduce both glaring and subtle biases in how ChatGPT responds to different inputs. In some cases ChatGPT currently refuses outputs that it shouldn’t, and in some cases, it doesn’t refuse when it should. We believe that improvement in both respects is possible.

Additionally, we have room for improvement in other dimensions of system behavior such as the system “making things up.” Feedback from users is invaluable for making these improvements.

2. Define your AI’s values, within broad bounds. We believe that AI should be a useful tool for individual people, and thus customizable by each user up to limits defined by society. Therefore, we are developing an upgrade to ChatGPT to allow users to easily customize its behavior.

This will mean allowing system outputs that other people (ourselves included) may strongly disagree with. Striking the right balance here will be challenging: taking customization to the extreme would risk enabling malicious uses of our technology and sycophantic AIs that mindlessly amplify people’s existing beliefs.

There will therefore always be some bounds on system behavior. The challenge is defining what those bounds are. If we try to make all of these determinations on our own, or if we try to develop a single, monolithic AI system, we will be failing in the commitment we make in our Charter to “avoid undue concentration of power.”

3. Public input on defaults and hard bounds. One way to avoid undue concentration of power is to give people who use or are affected by systems like ChatGPT the ability to influence those systems’ rules.

We believe that many decisions about our defaults and hard bounds should be made collectively, and while practical implementation is a challenge, we aim to include as many perspectives as possible. As a starting point, we’ve sought external input on our technology in the form of red teaming. We also recently began soliciting public input on AI in education (one particularly important context in which our technology is being deployed).

We are in the early stages of piloting efforts to solicit public input on topics like system behavior, disclosure mechanisms (such as watermarking), and our deployment policies more broadly. We are also exploring partnerships with external organizations to conduct third-party audits of our safety and policy efforts.

Conclusion

Combining the three building blocks above gives the following picture of where we’re headed:

[Figure: how the three building blocks above fit together]

Sometimes we will make mistakes. When we do, we will learn from them and iterate on our models and systems.

We appreciate the ChatGPT user community as well as the wider public’s vigilance in holding us accountable, and are excited to share more about our work in the three areas above in the coming months.

If you are interested in doing research to help achieve this vision, including but not limited to research on fairness and representation, alignment, and sociotechnical research to understand the impact of AI on society, please apply for subsidized access to our API via the Researcher Access Program.

We are also hiring for positions across Research, Alignment, Engineering, and more.


Footnotes

  1. By AGI, we mean highly autonomous systems that outperform humans at most economically valuable work. ↩︎

  2. In this post, we deliberately focus on this particular scope, and on where we are going in the near term. We are also pursuing an ongoing research agenda taking on these questions. ↩︎

Introducing ChatGPT Plus

We’re launching a pilot subscription plan for ChatGPT, a conversational AI that can chat with you, answer follow-up questions, and challenge incorrect assumptions.

The new subscription plan, ChatGPT Plus, will be available for $20/month, and subscribers will receive a number of benefits:

  • General access to ChatGPT, even during peak times
  • Faster response times
  • Priority access to new features and improvements

ChatGPT Plus is available to customers in the United States, and we will begin the process of inviting people from our waitlist over the coming weeks. We plan to expand access and support to additional countries and regions soon.

We love our free users and will continue to offer free access to ChatGPT. By offering this subscription pricing, we will be able to help support free access for as many people as possible.

Learning from the research preview

We launched ChatGPT as a research preview so we could learn more about the system’s strengths and weaknesses and gather user feedback to help us improve upon its limitations. Since then, millions of people have given us feedback, we’ve made several important updates, and we’ve seen users find value across a range of professional use cases, including drafting and editing content, brainstorming ideas, programming help, and learning new topics.

Our plans for the future

We plan to refine and expand this offering based on your feedback and needs. We’ll also soon be launching the ChatGPT API waitlist, and we are actively exploring options for lower-cost plans, business plans, and data packs for more availability.

New AI classifier for indicating AI-written text

We’re launching a classifier trained to distinguish between AI-written and human-written text.

We’ve trained a classifier to distinguish between text written by a human and text written by AIs from a variety of providers. While it is impossible to reliably detect all AI-written text, we believe good classifiers can inform mitigations for false claims that AI-generated text was written by a human: for example, running automated misinformation campaigns, using AI tools for academic dishonesty, and positioning an AI chatbot as a human.

Our classifier is not fully reliable. In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “likely AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time (false positives). Our classifier’s reliability typically improves as the length of the input text increases. Compared to our previously released classifier, this new classifier is significantly more reliable on text from more recent AI systems.
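
To make those rates concrete, here is a back-of-the-envelope illustration. The 50/50 mix of AI-written and human-written text is purely an assumption for the example, not a claim about any real population of texts.

```python
# Illustration only: what 26% true positives and 9% false positives would imply
# if, hypothetically, half of the texts being checked were AI-written.
true_positive_rate = 0.26    # AI-written texts correctly flagged "likely AI-written"
false_positive_rate = 0.09   # human-written texts incorrectly flagged
ai_share = 0.5               # assumed fraction of AI-written texts (illustrative)

flagged_ai = ai_share * true_positive_rate
flagged_human = (1 - ai_share) * false_positive_rate
precision = flagged_ai / (flagged_ai + flagged_human)
print(f"~{precision:.0%} of flagged texts would actually be AI-written,")
print(f"while {1 - true_positive_rate:.0%} of AI-written texts would go unflagged.")
```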

We’re making this classifier publicly available to get feedback on whether imperfect tools like this one are useful. Our work on the detection of AI-generated text will continue, and we hope to share improved methods in the future.

Try our work-in-progress classifier yourself.

Limitations

Our classifier has a number of important limitations. It should not be used as a primary decision-making tool, but instead as a complement to other methods of determining the source of a piece of text.

  1. The classifier is very unreliable on short texts (below 1,000 characters). Even longer texts are sometimes incorrectly labeled by the classifier.
  2. Sometimes human-written text will be incorrectly but confidently labeled as AI-written by our classifier.
  3. We recommend using the classifier only for English text. It performs significantly worse in other languages and it is unreliable on code.
  4. Text that is very predictable cannot be reliably identified. For example, it is impossible to predict whether a list of the first 1,000 prime numbers was written by AI or humans, because the correct answer is always the same.
  5. AI-written text can be edited to evade the classifier. Classifiers like ours can be updated and retrained based on successful attacks, but it is unclear whether detection has an advantage in the long term.
  6. Classifiers based on neural networks are known to be poorly calibrated outside of their training data. For inputs that are very different from text in our training set, the classifier is sometimes extremely confident in a wrong prediction.

Training the classifier

Our classifier is a language model fine-tuned on a dataset of pairs of human-written text and AI-written text on the same topic. We collected this dataset from a variety of sources that we believe to be written by humans, such as the pretraining data and human demonstrations on prompts submitted to InstructGPT. We divided each text into a prompt and a response. On these prompts we generated responses from a variety of different language models trained by us and other organizations. For our web app, we adjust the confidence threshold to keep the false positive rate very low; in other words, we only mark text as likely AI-written if the classifier is very confident.
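
The threshold adjustment described above can be sketched as follows. This is a minimal illustration that assumes we already have classifier scores for a validation set of texts known to be human-written; the function, variable names, and target rate are placeholders rather than production code.

```python
import numpy as np

def pick_threshold(human_text_scores: np.ndarray, target_fpr: float = 0.01) -> float:
    """Choose a score cutoff so that at most `target_fpr` of known human-written
    texts would be flagged "likely AI-written" (higher score = more likely AI)."""
    return float(np.quantile(human_text_scores, 1.0 - target_fpr))

# Usage sketch: flag a text only when its score exceeds the chosen cutoff.
# threshold = pick_threshold(validation_human_scores, target_fpr=0.01)
# is_likely_ai = classifier_score > threshold
```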

Impact on educators and call for input

We recognize that identifying AI-written text has been an important point of discussion among educators, and equally important is recognizing the limits and impacts of AI-generated text classifiers in the classroom. We have developed a preliminary resource on the use of ChatGPT for educators, which outlines some of the uses and associated limitations and considerations. While this resource is focused on educators, we expect our classifier and associated classifier tools to have an impact on journalists, mis/dis-information researchers, and other groups.

We are engaging with educators in the US to learn what they are seeing in their classrooms and to discuss ChatGPT’s capabilities and limitations, and we will continue to broaden our outreach as we learn. These are important conversations to have, as part of our mission is to deploy large language models safely and in direct contact with affected communities.

If you’re directly impacted by these issues (including but not limited to teachers, administrators, parents, students, and education service providers), please provide us with feedback using this form. Direct feedback on the preliminary resource is helpful, and we also welcome any resources that educators are developing or have found helpful (e.g., course guidelines, honor code and policy updates, interactive tools, AI literacy programs).


Contributors

Michael Lampe, Joanne Jang, Pamela Mishkin, Andrew Mayne, Henrique Ponde de Oliveira Pinto, Valerie Balcom, Michelle Pokrass, Jeff Belgum, Madelaine Boyd, Heather Schmidt, Sherwin Wu, Logan Kilpatrick, Thomas Degry

OpenAI and Microsoft Extend Partnership

We’re happy to announce that OpenAI and Microsoft are extending our partnership.

This multi-year, multi-billion dollar investment from Microsoft follows their previous investments in 2019 and 2021, and will allow us to continue our independent research and develop AI that is increasingly safe, useful, and powerful.

In pursuit of our mission to ensure advanced AI benefits all of humanity, OpenAI remains a capped-profit company and is governed by the OpenAI non-profit. This structure allows us to raise the capital we need to fulfill our mission without sacrificing our core beliefs about broadly sharing benefits and the need to prioritize safety.

Microsoft shares this vision and our values, and our partnership is instrumental to our progress.

  • We’ve worked together to build multiple supercomputing systems powered by Azure, which we use to train all of our models. Azure’s unique architecture design has been crucial in delivering best-in-class performance and scale for our AI training and inference workloads. Microsoft will increase their investment in these systems to accelerate our independent research and Azure will remain the exclusive cloud provider for all OpenAI workloads across our research, API and products.
  • Learning from real-world use – and incorporating those lessons – is a critical part of developing powerful AI systems that are safe and useful. Scaling that use also ensures AI’s benefits can be distributed broadly. So, we’ve partnered with Microsoft to deploy our technology through our API and the Azure OpenAI Service — enabling enterprise and developers to build on top of GPT, DALL·E, and Codex. We’ve also worked together to build OpenAI’s technology into apps like GitHub Copilot and Microsoft Designer.
  • In an effort to build and deploy safe AI systems, our teams regularly collaborate to review and synthesize shared lessons – and use them to inform iterative updates to our systems, future research, and best practices for use of these powerful AI systems across the industry.

We look forward to continued collaboration and advancing this progress with Microsoft.
