Frontier risk and preparedness
OpenAI is taking proactive steps to minimize the risks associated with the development of increasingly capable AI models. It has announced a new team, Preparedness, headed by Aleksander Madry. The team's main focus will be capability assessment, evaluations, and internal red teaming for frontier models, up to and including those with AGI-level capabilities. The team will not only track and evaluate risks but also work to protect against catastrophic risks across several categories: individualized persuasion, cybersecurity, CBRN (chemical, biological, radiological, and nuclear) threats, and autonomous replication and adaptation.
The Preparedness team's mission also includes developing and maintaining a Risk-Informed Development Policy (RDP). The RDP will set out OpenAI's approach to rigorous evaluation and monitoring of frontier model capabilities, establish a spectrum of protective actions, and define a governance structure for accountability and oversight throughout the development process. It is designed to complement and extend OpenAI's existing risk mitigation work, helping to ensure the safety and alignment of highly capable systems from the early stages of development through deployment and beyond.
Frontier Model Forum updates
The Frontier Model Forum has announced the creation of a new AI Safety Fund, aimed at supporting independent researchers from around the world in their academic exploration of artificial intelligence (AI). Partners in the fund include Anthropic, Google, Microsoft, and OpenAI. Initial funding worth $10m has been provided by philanthropic partners including the Patrick J. McGovern Foundation, the David and Lucile Packard Foundation, and individuals such as Eric Schmidt and Jaan Tallinn. The fund aims to advance AI safety research and improve the evaluation and understanding of AI systems, while also addressing vulnerabilities within these technologies. The Forum views the fund as crucial for fulfilling the voluntary commitments to improve AI safety made by Forum members at the White House earlier this year.
DALL·E 3 is now available in ChatGPT Plus and Enterprise
The developers of DALL·E 3, an AI model that generates images from text prompts, have implemented safety measures to prevent the generation of harmful or inappropriate content. Each generated image undergoes safety checks to identify and remove violent, adult, or hateful imagery before it is shown to users. Early users and expert red teamers provided valuable feedback that helped surface potential issues, such as the generation of sexual or misleading images, and these were addressed to improve the safety of the model. Steps have also been taken to limit the likelihood of generating content in the style of living artists or public figures, and to improve demographic representation in generated images. Users are encouraged to submit feedback to further improve the system and support responsible AI development. The research team is also working on a provenance classifier, which can determine with high accuracy whether an image was generated by DALL·E 3, even when common modifications have been applied. The classifier aims to provide transparency about the origins of AI-generated visuals, and deploying it effectively will require collaboration across the AI industry.