Frontier risk and preparedness
To address potential risks from the development and deployment of increasingly capable AI models, OpenAI has announced the creation of a new team called Preparedness. Led by Aleksander Madry, the team will focus on capability assessment, evaluations, and internal red teaming for advanced AI models, from those expected in the near future to those with superintelligent capabilities. Its mission is to track, evaluate, forecast, and protect against catastrophic risks across several domains: individualized persuasion; cybersecurity; chemical, biological, radiological, and nuclear (CBRN) threats; and autonomous replication and adaptation. The team will also develop and maintain a Risk-Informed Development Policy (RDP) that outlines OpenAI's approach to rigorous model capability evaluations, monitoring, protective actions, and governance structure. The RDP is designed to complement and expand upon OpenAI's existing risk-mitigation efforts in order to help ensure the safety and ethical alignment of highly capable AI systems.
Frontier Model Forum updates
The Frontier Model Forum, along with philanthropic partners, is creating a new AI Safety Fund to support independent researchers in the field of AI safety. The fund, which has received initial commitments of over $10 million from organizations including Google, Microsoft, and OpenAI, aims to address the gap in academic research on AI safety. The funding will support the development of new model evaluations and techniques for red teaming AI models, with a primary focus on evaluating and understanding the potentially dangerous capabilities of frontier AI systems and identifying the vulnerabilities and risks associated with them. A call for proposals will be announced in the coming months, and the fund will be administered by the Meridian Institute with support from an advisory committee of external experts. The initiative is seen as an important step toward fulfilling the AI commitments Forum members made earlier this year, which included facilitating third-party discovery and reporting of vulnerabilities in AI systems.
DALL·E 3 is now available in ChatGPT Plus and Enterprise
OpenAI has implemented a multi-tiered safety system for DALL·E 3 to prevent the generation of harmful or inappropriate content. Safety checks run on user prompts and on the resulting images before they are shown to users. Feedback from early users and expert red teamers has been instrumental in identifying and closing safety gaps, such as the generation of graphic or misleading content. OpenAI has also limited the model's ability to generate content resembling the work of living artists or depictions of public figures, while improving demographic representation in generated images. ChatGPT users can use the flag icon to report unsafe or inaccurate outputs. OpenAI is also developing a provenance classifier to determine whether an image was generated by DALL·E 3: early internal evaluations show over 99% accuracy in identifying unmodified images and over 95% accuracy even with common modifications. While the classifier cannot provide definitive conclusions, it may become one of a range of techniques for identifying AI-generated content, and OpenAI stresses the importance of collaboration to address this challenge effectively.
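The multi-tiered design described above — checking the prompt before generation, then checking the resulting image before delivery — can be sketched in a few lines. This is an illustrative assumption only: the function names, the blocklist, and the string-based checks below are hypothetical stand-ins, not OpenAI's actual implementation.

```python
# Hypothetical sketch of a multi-tiered safety pipeline. All names,
# categories, and checks are illustrative assumptions.

from typing import Optional

BLOCKED_CATEGORIES = {"graphic_violence", "public_figure"}  # illustrative

def check_prompt(prompt: str) -> bool:
    """Tier 1: reject prompts that match any blocked category."""
    return not any(term in prompt for term in BLOCKED_CATEGORIES)

def generate_image(prompt: str) -> str:
    """Placeholder for the image model; returns an opaque image handle."""
    return f"image_for:{prompt}"

def check_image(image: str) -> bool:
    """Tier 2: scan the generated image before it reaches the user."""
    return "unsafe" not in image

def safe_generate(prompt: str) -> Optional[str]:
    """Run both tiers; return the image only if every check passes."""
    if not check_prompt(prompt):
        return None  # refused at the prompt stage
    image = generate_image(prompt)
    if not check_image(image):
        return None  # refused at the image stage
    return image
```

The point of the layering is that a prompt which slips past the first check can still be caught after generation, which is where red-teamer feedback on gaps like graphic or misleading outputs would feed back into the image-level checks.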