Frontier risk and preparedness
OpenAI has announced the formation of a new team called Preparedness, which aims to assess and mitigate risks associated with the development of artificial intelligence (AI) models. Led by Aleksander Madry, the team will focus on capability assessment, evaluations, and internal red teaming for frontier models, up to and including those with AGI-level capabilities. The Preparedness team will track and protect against several categories of potential catastrophic risk: individualized persuasion; cybersecurity; chemical, biological, radiological, and nuclear (CBRN) threats; and autonomous replication and adaptation (ARA).
One of the key objectives of the Preparedness team is to develop and maintain a Risk-Informed Development Policy (RDP). This policy will outline OpenAI's approach to conducting rigorous evaluations of frontier models' capabilities, monitoring potential risks, implementing protective measures, and establishing a governance structure for accountability and oversight throughout the development process. The RDP will complement OpenAI's existing risk mitigation efforts, which aim to ensure the safety and alignment of highly capable systems from development to deployment. The formation of the Preparedness team reflects OpenAI's commitment to proactive risk assessment and mitigation as AI models continue to advance.
Frontier Model Forum updates
The Frontier Model Forum, together with philanthropic partners, has launched the AI Safety Fund to support independent researchers studying and addressing AI safety concerns. The initial funding comes from Anthropic, Google, Microsoft, and OpenAI, as well as from philanthropic partners: the Patrick J. McGovern Foundation, the David and Lucile Packard Foundation, Eric Schmidt, and Jaan Tallinn. The goal of the fund is to bring a wider range of voices and perspectives into the AI safety discussion by funding the development and testing of evaluation techniques for potentially dangerous capabilities of AI systems. The focus will be on model evaluations and red teaming to raise safety and security standards in the industry. The Fund will be administered by the Meridian Institute, with support from an advisory committee made up of experts in the field. A call for proposals will be announced in the coming months.
DALL·E 3 is now available in ChatGPT Plus and Enterprise
OpenAI has implemented safety measures to limit potentially harmful content generated by its image model DALL·E 3. Safety checks run over both user prompts and the resulting imagery before an image is shown to users. Early users and expert red-teamers provided feedback that helped identify and address gaps in the safety systems' coverage, such as the generation of graphic or misleading content. OpenAI has also taken steps to limit the likelihood of DALL·E 3 generating content in the style of living artists or images of public figures, and to improve demographic representation in generated images. User feedback is encouraged to further improve the model. OpenAI is also researching an initial version of a provenance classifier, which can identify whether an image was generated by DALL·E 3 with over 99% accuracy. The classifier, although not foolproof, may help people understand whether audio or visual content is AI-generated. OpenAI acknowledges that addressing this challenge will require collaboration across the AI industry.