AI Roundup - Monday, February 5th 2024
Building an early warning system for LLM-aided biological threat creation
OpenAI is investing in the development of improved evaluation methods for AI-enabled safety risks, focusing in particular on the potential for AI systems to help malicious actors create biological threats. To assess this risk, OpenAI ran a study with 100 human participants, a mix of biology experts and students, split into control and treatment groups; the treatment group had access to OpenAI's GPT-4 language model in addition to the internet. Each participant completed tasks spanning the process of biological threat creation. The study found mild uplifts in accuracy and completeness for participants with access to the model, but the effect sizes were not statistically significant. OpenAI notes that access to information alone is insufficient to create a biological threat, and that the evaluation did not test whether participants could actually construct one. The post also stresses the need for further research into how meaningful such evaluation results are, and for careful security practices when running evaluations with advanced AI models.
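The distinction between a "mild uplift" and a statistically significant one can be sketched with a simple two-sample comparison. The scores below are hypothetical placeholders, not the study's data, and the `welch_t` helper is a plain Welch t-statistic, not OpenAI's actual analysis:

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical accuracy scores on a 0-10 scale for a control group
# (internet only) and a treatment group (internet + model), loosely
# mirroring a small, possibly insignificant uplift.
control = [6.1, 5.8, 7.0, 6.4, 5.9, 6.6, 6.2, 5.7, 6.8, 6.0]
treatment = [6.5, 6.2, 7.1, 6.9, 6.0, 6.8, 6.4, 6.1, 7.0, 6.3]

def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(b) - mean(a)) / sqrt(va / len(a) + vb / len(b))

uplift = mean(treatment) - mean(control)
t = welch_t(control, treatment)
print(f"mean uplift: {uplift:.2f}, Welch t: {t:.2f}")
```

A small positive uplift with a t-statistic below the significance threshold for the sample size is exactly the pattern the study reports: measurable in the means, but not distinguishable from noise.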
New embedding models and API updates
The company has released a new, highly efficient embedding model, text-embedding-3-small, which outperforms its predecessor, text-embedding-ada-002, released in December 2022. On the MIRACL benchmark for multi-language retrieval, the average score has increased from 31.4% to 44.0%; on the MTEB benchmark for English tasks, it has risen from 61.0% to 62.3%. Beyond the performance gains, text-embedding-3-small is also far more cost-effective: pricing has been cut 5X, from $0.0001 per 1k tokens to $0.00002. The older model has not been deprecated, so customers can continue using it if they prefer, though they are encouraged to switch to the newer model. The company has also released text-embedding-3-large, its next-generation larger embedding model and its best-performing model so far. On the MIRACL benchmark it achieves an average score of 54.9%, up from 31.4%, and on the MTEB benchmark the average score rises from 61.0% to 64.6%. Overall, the new models represent significant advancements in both performance and efficiency.
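The 5X price cut translates directly into corpus-level savings. A back-of-envelope sketch, using the per-1k-token prices quoted above (the 10M-token corpus size is purely illustrative):

```python
# Per-1k-token prices quoted in the announcement (USD).
OLD_PRICE_PER_1K = 0.0001   # previous-generation embedding model
NEW_PRICE_PER_1K = 0.00002  # new small embedding model

def embedding_cost(tokens: int, price_per_1k: float) -> float:
    """Total cost in USD to embed `tokens` tokens at the given rate."""
    return tokens / 1000 * price_per_1k

tokens = 10_000_000  # e.g. embedding a 10M-token document corpus
old_cost = embedding_cost(tokens, OLD_PRICE_PER_1K)
new_cost = embedding_cost(tokens, NEW_PRICE_PER_1K)
print(f"${old_cost:.2f} vs ${new_cost:.2f} ({old_cost / new_cost:.0f}x cheaper)")
```

At these rates, a corpus that previously cost about a dollar to embed now costs about twenty cents, which makes re-embedding existing corpora with the newer model an easy call for most workloads.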
Democratic inputs to AI grant program: lessons learned and implementation plans
OpenAI recently announced the 10 teams selected for its Democratic Inputs to AI grant program, an initiative aimed at exploring democratic processes for steering AI so that the technology benefits all of society. After receiving nearly 1,000 applications from across 113 countries, a joint committee of OpenAI employees and external experts chose the final 10 teams, whose backgrounds range from law and journalism to machine learning and social science research. Throughout the program, the teams received support and guidance from OpenAI and were encouraged to describe and document their processes, which enabled faster iteration and better integration with other teams' prototypes. In September, a Demo Day was organized for the teams to showcase their concepts to each other, OpenAI staff, and researchers from other AI labs and academia. The projects covered various aspects of participatory engagement, including video deliberation interfaces, crowdsourced audits of AI models, mathematical representation guarantees, and mapping beliefs to fine-tune model behavior. AI itself played a useful role in many projects, providing customized chat interfaces, voice-to-text transcription, and data synthesis. OpenAI is now sharing the code created by the teams along with brief summaries of the work accomplished during the program.