OpenAI’s AI Safety Framework: Ensuring Safe Model Development and Deployment

the health strategist
platform

the most comprehensive knowledge portal
for continuous health transformation
and digital health, for all

Joaquim Cardoso MSc.

Chief Research and Strategy Officer (CRSO),
Chief Editor and Senior Advisor

December 19, 2023

What is the message?

OpenAI acknowledges the shortfall in the study of frontier AI risks and introduces the Preparedness Framework (Beta) as a comprehensive strategy to track, evaluate, forecast, and protect against catastrophic risks posed by increasingly powerful models.

The framework emphasizes a science-driven, fact-grounded approach and highlights the company’s commitment to safety in the development and deployment of AI technologies.

One page summary

What are the key points?

Safety Teams and Rigorous Evaluations:

  • OpenAI emphasizes the importance of grounding preparedness in science and investing in rigorous capability evaluations and forecasting.
  • The company aims to move beyond hypothetical scenarios, utilizing concrete measurements and data-driven predictions to detect emerging risks.

Builder’s Mindset to Safety:

  • OpenAI, rooted in the coupling of science and engineering, brings a builder’s mindset to safety, learning from real-world deployment to mitigate emerging risks.
  • The iterative deployment approach ensures that safety work keeps pace with innovation.

Preparedness Framework (Beta) Approach:

  • The framework outlines a systematic approach to develop and deploy frontier models safely.
  • Regular evaluations, risk “scorecards,” and detailed reports are integral to assessing and mitigating specific risks associated with frontier models.

Risk Thresholds and Safety Measures:

  • Defined risk thresholds trigger baseline safety measures, categorized into cybersecurity, CBRN threats, persuasion, and model autonomy.
  • Only models meeting safety criteria can be deployed or developed further, with additional security measures for higher-risk models.

Safety Decision-Making Structure:

  • A dedicated team oversees technical work, with a Safety Advisory Group providing cross-functional review of reports sent to Leadership and the Board of Directors.
  • Leadership makes decisions, while the Board of Directors holds the right to reverse them.

Safety Protocols and Accountability:

  • Regular safety drills and protocols for rapid response to urgent issues are established.
  • External accountability is emphasized through audits by qualified, independent third parties, red-teaming, and external model evaluation updates.

Collaboration and Continuous Risk Reduction:

  • Collaboration with external parties and internal teams is crucial for tracking real-world misuse and emergent misalignment risks.
  • Pioneering research measures how risks evolve with model scaling, aiming to forecast risks in advance and identify “unknown unknowns.”

What are the key strategies?

OpenAI’s strategy involves a proactive and iterative approach to safety, combining rigorous evaluations, risk thresholds, accountability structures, collaboration, and continuous risk reduction to ensure the safe development and deployment of frontier AI models.

What are the key examples?

Examples of strategies in action include the regular evaluations and “scorecards” for models, risk thresholds triggering safety measures, and the creation of a Safety Advisory Group for cross-functional oversight.

Conclusion

The Preparedness Framework (Beta) represents OpenAI’s commitment to advancing safety in AI development. This living document will be regularly updated based on ongoing learning and feedback, emphasizing transparency and collaboration.


DEEP DIVE

Preparedness

The study of frontier AI risks has fallen far short of what is possible and where we need to be. To address this gap and systematize our safety thinking, we are adopting the initial version of our Preparedness Framework. It describes OpenAI’s processes to track, evaluate, forecast, and protect against catastrophic risks posed by increasingly powerful models.

OpenAI

December 18, 2023

The Preparedness team is dedicated to making frontier AI models safe

We have several safety and policy teams working together to mitigate risks from AI. Our Safety Systems team focuses on mitigating misuse of current models and products like ChatGPT. Superalignment builds foundations for the safety of superintelligent models that we hope to have in a more distant future. The Preparedness team maps out the emerging risks of frontier models, and it connects to Safety Systems, Superalignment, and our other safety and policy teams across OpenAI.

[Figure 1: Safety teams]

Preparedness should be driven by science and grounded in facts

We are investing in the design and execution of rigorous capability evaluations and forecasting to better detect emerging risks. In particular, we want to move the discussions of risks beyond hypothetical scenarios to concrete measurements and data-driven predictions. We also want to look beyond what’s happening today to anticipate what’s ahead. This is so critical to our mission that we are bringing our top technical talent to this work. 

We bring a builder’s mindset to safety

Our company is founded on tightly coupling science and engineering, and the Preparedness Framework brings that same approach to our work on safety. We learn from real-world deployment and use the lessons to mitigate emerging risks. For safety work to keep pace with the innovation ahead, we cannot simply do less; we need to continue learning through iterative deployment.

Preparedness Framework (Beta)

Our Preparedness Framework (Beta) lays out the following approach to develop and deploy our frontier models safely:


We will run evaluations and continually update “scorecards” for our models. We will evaluate all our frontier models, including at every 2x effective compute increase during training runs. We will push models to their limits. These findings will help us assess the risks of our frontier models and measure the effectiveness of any proposed mitigations. Our goal is to probe the specific edges of what’s unsafe in order to effectively mitigate the revealed risks. To track the safety levels of our models, we will produce risk “scorecards” and detailed reports.

[Figure 2: Scorecard]
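
To make the cadence concrete, here is a minimal, hypothetical Python sketch of a scorecard record and the “every 2x effective compute” evaluation trigger. The Scorecard and evaluation_due names are illustrative assumptions, not OpenAI’s actual tooling; only the 2x cadence and the pre-/post-mitigation distinction come from the framework.

```python
from dataclasses import dataclass, field


@dataclass
class Scorecard:
    """Hypothetical risk "scorecard": one pre- and one post-mitigation risk level per tracked category."""
    model_name: str
    pre_mitigation: dict[str, str] = field(default_factory=dict)   # e.g. {"cybersecurity": "medium"}
    post_mitigation: dict[str, str] = field(default_factory=dict)


def evaluation_due(last_evaluated_compute: float, current_compute: float) -> bool:
    """Re-run the evaluation suite at (at least) every 2x increase in effective compute."""
    return current_compute >= 2 * last_evaluated_compute


# Example: a checkpoint at 2.1x the compute of the last evaluated checkpoint is due for evaluation.
print(evaluation_due(last_evaluated_compute=1.0e24, current_compute=2.1e24))  # True
```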

We will define risk thresholds that trigger baseline safety measures. We have defined thresholds for risk levels along the following initial tracked categories: cybersecurity, CBRN (chemical, biological, radiological, and nuclear) threats, persuasion, and model autonomy. We specify four safety risk levels, and only models with a post-mitigation score of “medium” or below can be deployed; only models with a post-mitigation score of “high” or below can be developed further. We will also implement additional security measures tailored to models with high or critical (pre-mitigation) levels of risk.

[Figure 3: Risk]
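
A minimal sketch of how those gating rules could be expressed, assuming a simple ordered risk-level type. The function names and data layout are illustrative; only the thresholds themselves (“medium” or below to deploy, “high” or below to develop further, extra security at high or critical pre-mitigation risk) come from the framework.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """The framework's four risk levels, ordered so they can be compared."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


TRACKED_CATEGORIES = ("cybersecurity", "cbrn", "persuasion", "model_autonomy")


def can_deploy(post_mitigation: dict[str, RiskLevel]) -> bool:
    """Deployment requires a post-mitigation level of "medium" or below in every category."""
    return all(post_mitigation[c] <= RiskLevel.MEDIUM for c in TRACKED_CATEGORIES)


def can_develop_further(post_mitigation: dict[str, RiskLevel]) -> bool:
    """Continued development requires a post-mitigation level of "high" or below in every category."""
    return all(post_mitigation[c] <= RiskLevel.HIGH for c in TRACKED_CATEGORIES)


def needs_extra_security(pre_mitigation: dict[str, RiskLevel]) -> bool:
    """Pre-mitigation levels of high or critical trigger additional security measures."""
    return any(pre_mitigation[c] >= RiskLevel.HIGH for c in TRACKED_CATEGORIES)
```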

We will establish a dedicated team to oversee technical work and an operational structure for safety decision-making. The Preparedness team will drive technical work to examine the limits of frontier model capabilities, run evaluations, and synthesize reports. This technical work is critical to inform OpenAI’s decision-making for safe model development and deployment. We are creating a cross-functional Safety Advisory Group to review all reports and send them concurrently to Leadership and the Board of Directors. While Leadership is the decision-maker, the Board of Directors holds the right to reverse decisions.

[Figure 4: Team]
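
Read purely as a decision flow, that structure might be sketched as follows. The function and the stand-in callables are hypothetical; only the roles (Safety Advisory Group, Leadership, Board of Directors) come from the framework.

```python
from typing import Callable, Optional


def decide(report: dict,
           sag_review: Callable[[dict], str],
           leadership_decides: Callable[[dict, str], str],
           board_reverses: Callable[[dict, str], Optional[str]]) -> str:
    """Hypothetical flow: the SAG reviews the report, Leadership decides,
    and the Board of Directors may reverse the decision."""
    review = sag_review(report)
    decision = leadership_decides(report, review)
    reversal = board_reverses(report, decision)
    return reversal if reversal is not None else decision


# Example with stand-in callables and a made-up model name: Leadership approves; the Board does not intervene.
print(decide({"model": "frontier-x"},
             sag_review=lambda r: "reviewed",
             leadership_decides=lambda r, rev: "deploy",
             board_reverses=lambda r, d: None))  # "deploy"
```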

We will develop protocols for added safety and outside accountability. The Preparedness team will conduct regular safety drills. Some safety issues can emerge rapidly, so we have the ability to mark urgent issues for rapid response. We believe it is instrumental that this work gets feedback from people outside OpenAI and expect to have audits conducted by qualified, independent third parties. We will continue having others red-team and evaluate our models, and we plan to share updates externally.


We will help reduce other known and unknown safety risks. We will collaborate closely with external parties as well as internal teams like Safety Systems to track real-world misuse. We will also work with Superalignment on tracking emergent misalignment risks. We are also pioneering new research in measuring how risks evolve as models scale, to help forecast risks in advance, similar to our earlier success with scaling laws. Finally, we will run a continuous process to try surfacing any emerging “unknown unknowns.”
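
As a rough illustration of the scaling-law-style forecasting idea, one could fit the trend of dangerous-capability evaluation scores against effective compute and extrapolate to a planned training run. The data, the log-linear fit, and the function name below are invented for the example and are not OpenAI’s method.

```python
import numpy as np


def forecast_eval_score(compute_history, score_history, future_compute):
    """Fit a linear trend of eval scores against log10(effective compute) and extrapolate."""
    slope, intercept = np.polyfit(np.log10(compute_history), score_history, deg=1)
    return slope * np.log10(future_compute) + intercept


# Made-up numbers: scores measured at 1e22, 2e22, and 4e22 FLOPs, forecast at 8e22 FLOPs.
print(forecast_eval_score([1e22, 2e22, 4e22], [0.12, 0.18, 0.25], 8e22))
```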

This is a summary of the main components of the Preparedness Framework (Beta), and we encourage you to read the complete version. This framework is the initial Beta version that we are adopting, and is intended to be a living document. We expect it to be updated regularly as we learn more and receive additional feedback. We welcome your thoughts at pf@openai.com.

Originally published at https://openai.com/safety/preparedness.
