AI Roundup - Friday, January 5, 2024
Superalignment Fast Grants
The researchers behind the Superalignment project believe that superintelligent AI systems, systems far more capable than humans, could become a reality within the next decade. These systems would have immense capabilities, but they also pose significant risks. Today's AI systems are aligned and made safe using reinforcement learning from human feedback (RLHF), a method that depends on humans being able to judge model outputs. Aligning future superhuman systems will present entirely new challenges, because their behavior will be too complex and creative for humans to evaluate reliably: if a superhuman AI generates highly complex code, human reviewers may be unable to determine whether it is safe or dangerous, and existing alignment techniques may no longer suffice. The fundamental challenge is how to steer and trust AI systems that surpass human intelligence. This problem is unsolved, but the researchers believe it is solvable with a concerted effort, and the Fast Grants program aims to bring the field's best minds, including newcomers, to work on it.
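For readers who have not seen RLHF up close, the sketch below shows its central ingredient: a reward model trained on human preference comparisons, whose scores later guide reinforcement-learning fine-tuning of the policy. The model, shapes, and data here are toy assumptions for illustration, not any production implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a (prompt, response) feature vector to a scalar reward."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(rm, chosen, rejected):
    # Bradley-Terry objective: the human-preferred response should score
    # higher than the rejected one.
    return -F.logsigmoid(rm(chosen) - rm(rejected)).mean()

rm = RewardModel()
chosen, rejected = torch.randn(8, 64), torch.randn(8, 64)  # stand-in features
loss = preference_loss(rm, chosen, rejected)
loss.backward()
```

The key point for alignment is that this pipeline presumes humans can tell which of two outputs is better; that is exactly the assumption superhuman models would break.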
Practices for Governing Agentic AI Systems
This white paper focuses on agentic AI systems: AI systems capable of pursuing complex goals with limited supervision. While these systems have the potential to greatly benefit society, they also carry risks. The paper proposes a definition of agentic AI systems, identifies the parties involved in their lifecycle, and argues that each party needs a set of baseline responsibilities and safety best practices to ensure responsible integration into society. It offers an initial set of such practices for safety and accountability, intended as a starting point for developing fuller best practices, while acknowledging unresolved uncertainties about how these practices should be implemented. The paper also notes that widespread adoption of agentic systems may have indirect impacts that call for additional governance frameworks. Overall, it aims to promote the responsible development and integration of agentic AI systems into society.
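To make one such practice concrete, here is a minimal, hypothetical sketch of an approval gate that escalates irreversible agent actions to a human overseer. The Action type, the reversibility heuristic, and the approval flow are assumptions chosen for illustration, not practices specified by the white paper.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    args: dict
    reversible: bool  # irreversible actions warrant stricter oversight

def human_approves(action: Action) -> bool:
    """Ask a human overseer to confirm an action (stub for a real review UI)."""
    answer = input(f"Allow {action.name}({action.args})? [y/N] ")
    return answer.strip().lower() == "y"

def run_agent_step(action: Action) -> None:
    # Auto-approve reversible, low-impact actions; escalate everything else.
    if action.reversible or human_approves(action):
        print(f"Executing {action.name}")
    else:
        print(f"Blocked {action.name}: human overseer declined")

run_agent_step(Action("send_email", {"to": "team@example.com"}, reversible=False))
```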
Weak-to-strong generalization
The Superalignment team has released its first research paper, introducing a new approach to aligning superhuman AI systems. The team believes superintelligence, AI vastly smarter than humans, could be developed within the next decade; the challenge lies in steering and controlling such systems to ensure they remain safe and beneficial to humanity. Current alignment methods rely on human supervision, but future AI systems will be capable of complex and creative behaviors that are difficult for humans to supervise effectively. A superhuman model could, for instance, generate millions of lines of potentially dangerous code that even experts would struggle to understand. This poses the fundamental question: how can weaker human supervisors trust and control substantially stronger AI models? The paper proposes studying this problem empirically through an analogy: fine-tune a strong pretrained model on labels produced by a weaker model, then measure how far the strong model generalizes beyond its imperfect supervision.
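A minimal sketch of that weak-to-strong setup, assuming a toy classification task: a small "weak" network labels the data and a much larger "strong" network is fine-tuned on those labels rather than on ground truth. The architectures, data, and training loop are illustrative stand-ins, not the paper's actual experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(width: int) -> nn.Module:
    # Toy two-layer classifier; width stands in for model capacity.
    return nn.Sequential(nn.Linear(32, width), nn.ReLU(), nn.Linear(width, 2))

weak, strong = mlp(8), mlp(256)  # weak supervisor, strong student
x = torch.randn(512, 32)         # unlabeled inputs

# 1. The weak supervisor labels the data (the analogue of human supervision).
with torch.no_grad():
    weak_labels = weak(x).argmax(dim=-1)

# 2. The strong model is trained only on these weak, possibly noisy labels.
opt = torch.optim.Adam(strong.parameters(), lr=1e-3)
for _ in range(100):
    loss = F.cross_entropy(strong(x), weak_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The question the paper studies: does the strong model merely imitate its
# weak supervisor, or does it generalize beyond the errors in those labels?
```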