Frontier risk and preparedness
OpenAI is working to minimize the risks posed by increasingly capable artificial intelligence (AI) models. The company has formed a new team, Preparedness, led by Aleksander Madry, which will focus on capability assessment, evaluations, and internal red teaming for frontier models, from models to be developed in the near future to those with AGI-level capabilities. The team's mission is to track, evaluate, forecast, and protect against catastrophic risks across several categories, including individualized persuasion; cybersecurity; chemical, biological, radiological, and nuclear (CBRN) threats; and autonomous replication and adaptation.
In addition to risk assessment and protection, the Preparedness team will develop a Risk-Informed Development Policy (RDP). The RDP will detail OpenAI's approach to rigorous capability evaluations and monitoring of frontier models, and will establish a governance structure for accountability and oversight throughout the development process. The policy is intended to complement the company's existing risk mitigation work and to improve the safety and alignment of highly capable systems both before and after deployment.
Frontier Model Forum updates
The Frontier Model Forum, along with philanthropic partners, is launching a new AI Safety Fund to support independent researchers from academic institutions, research institutes, and startups around the world. The fund launches with more than $10 million in initial commitments from Anthropic, Google, Microsoft, and OpenAI, as well as philanthropic partners. Its purpose is to support research and development in AI safety as the technology continues to advance rapidly. The primary focus will be on developing new model evaluations and techniques for testing potentially dangerous capabilities of frontier systems. The aim is to raise safety and security standards across the industry and to provide insights into the mitigations and controls needed to address the challenges posed by advanced AI systems. The fund will soon issue a call for proposals, and it will be administered by the Meridian Institute with the support of an advisory committee comprising independent experts, AI company representatives, and individuals experienced in grantmaking.
DALL·E 3 is now available in ChatGPT Plus and Enterprise
OpenAI has implemented safety measures to limit the generation of potentially harmful or inappropriate content by its image generation model, DALL·E 3. The company employs a multi-tiered safety system that runs safety checks on user prompts and on the resulting images before they are shown to users. OpenAI has also worked with early users and expert red teamers to identify and address gaps in these safety systems, particularly around graphic or misleading content. In preparation for wide deployment, steps have been taken to reduce the likelihood of generating content that imitates the style of living artists or depicts public figures, and to improve demographic representation in generated images. OpenAI encourages user feedback to further improve the system's safety and accuracy. The company is also researching and evaluating a provenance classifier, a tool intended to identify with high accuracy whether an image was generated by DALL·E 3, even after the image has been modified. However, further collaboration across the AI industry is needed to develop techniques that can reliably identify AI-generated audio or visual content.
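OpenAI has not published the implementation of this multi-tiered system, but the layered idea, checking the text prompt before generation and then checking the rendered image before returning it, can be sketched in a few lines. The sketch below is purely illustrative: check_prompt, check_image, generate_image, and safe_generate are hypothetical placeholders written for this summary, not OpenAI APIs, and the "policy" they enforce is a stand-in.

```python
from dataclasses import dataclass

@dataclass
class GenerationResult:
    image: bytes | None
    refused: bool
    reason: str | None = None

def check_prompt(prompt: str) -> str | None:
    """Hypothetical pre-generation tier: return a refusal reason if the
    text prompt violates content policy, otherwise None."""
    banned_terms = {"example-banned-term"}  # placeholder policy list
    if any(term in prompt.lower() for term in banned_terms):
        return "prompt violates content policy"
    return None

def check_image(image: bytes) -> str | None:
    """Hypothetical post-generation tier: run classifiers over the
    rendered image and return a refusal reason if it is unsafe."""
    return None  # placeholder: assume the image passes

def generate_image(prompt: str) -> bytes:
    """Stand-in for the actual text-to-image model call."""
    return b"...image bytes..."

def safe_generate(prompt: str) -> GenerationResult:
    """Apply both tiers: refuse before generation if the prompt fails,
    and again after generation if the image fails."""
    reason = check_prompt(prompt)
    if reason:
        return GenerationResult(image=None, refused=True, reason=reason)
    image = generate_image(prompt)
    reason = check_image(image)
    if reason:
        return GenerationResult(image=None, refused=True, reason=reason)
    return GenerationResult(image=image, refused=False)
```

In a production system each tier would presumably combine several classifiers (for graphic content, public figures, artist styles, and so on) rather than the single placeholder check shown here; the point of the sketch is only the ordering, with one gate before generation and another after.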