Machine learning systems trained on very broad datasets for general-purpose capability are advancing rapidly, driven primarily by investments in compute and data. Artificial general intelligence (AGI) captures the idea of a computing system capable of producing knowledge-work outputs comparable to a human's, without being constrained to specific fields or skills.
The benefits of such a system are obvious, as are the risks. The field of AI safety addresses quantifying and mitigating those risks.
AI alignment
Generally, alignment in this context can refer to one of two different things:
- model output alignment with user intent
- model alignment with human values
Clearly, the second is much harder to quantify: not only is there ambiguity about what those values are and whether those represented in the model are comprehensive, but it is also non-trivial to evaluate whether the representation of those values in the model's loss function(s) is accurate, complete, and robust.
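To make the distinction concrete, here is a minimal sketch that frames the two notions as evaluation checks. Everything in it is a hypothetical stand-in (the keyword-overlap heuristic, the toy value constraints), not a real evaluation suite.

```python
# Minimal sketch contrasting the two notions of alignment above.
# All rubrics and constraints here are illustrative stand-ins.
from dataclasses import dataclass


@dataclass
class Interaction:
    user_intent: str   # what the user asked for
    model_output: str  # what the model produced


def intent_alignment_score(ix: Interaction) -> float:
    """Notion 1: did the output address the user's request?
    Toy heuristic: fraction of intent keywords echoed in the output."""
    intent_terms = set(ix.user_intent.lower().split())
    output_terms = set(ix.model_output.lower().split())
    return len(intent_terms & output_terms) / max(len(intent_terms), 1)


# Notion 2: value alignment, expressed here as explicit, checkable constraints.
# Real human values resist this kind of enumeration; this list is a toy.
VALUE_CONSTRAINTS = [
    ("no_medical_directives", lambda text: "you must take" not in text.lower()),
    ("no_personal_data_leak", lambda text: "ssn:" not in text.lower()),
]


def value_violations(ix: Interaction) -> list[str]:
    """Return the names of any violated value constraints."""
    return [name for name, check in VALUE_CONSTRAINTS if not check(ix.model_output)]


if __name__ == "__main__":
    ix = Interaction(
        user_intent="summarize the patient intake notes",
        model_output="Summary of intake notes: patient reports mild symptoms.",
    )
    print("intent alignment:", round(intent_alignment_score(ix), 2))
    print("value violations:", value_violations(ix))
```

The asymmetry shows up immediately: the first check reduces to comparing output against a stated request, while the second requires enumerating values as testable constraints, which is exactly the part that is ambiguous and incomplete in practice.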
Model evaluation
2024.09 Expert questions for model eval:
Current/2024:
Leaderboards:
Watch for low-quality evals (e.g., MMLU, HumanEval), per Jim Fan, NVIDIA:
I would not trust any claims of a superior model until I see the following:
- ELO points on LMSys Chatbot Arena. It’s difficult to game democracy in the wild.
- Private LLM evaluation from a trusted 3rd party, such as Scale AI’s benchmark. The test set must be well-curated and held secret, otherwise it quickly loses potency.
Also, see: Dangerous capability tests should be harder
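For intuition on how Arena-style Elo ratings arise from pairwise human votes, here is a minimal sketch. The model names, vote data, and K-factor are made up, and Chatbot Arena's actual methodology is more involved (ratings are fit over many battles rather than updated one vote at a time).

```python
# Toy Elo update from pairwise votes (hypothetical models and battles).


def expected(r_a: float, r_b: float) -> float:
    """Expected win probability of A over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update(ratings: dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Shift both ratings toward the observed outcome of one vote."""
    e_win = expected(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - e_win)
    ratings[loser] -= k * (1.0 - e_win)


ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
battles = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]

for winner, loser in battles:  # each tuple: (vote winner, vote loser)
    update(ratings, winner, loser)

print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

The point of the "democracy in the wild" framing is that ratings like these come from many independent human preferences on unpredictable prompts, which is harder to overfit than a fixed public test set.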
Interpretability
Mechanistic Interpretability for AI Safety A Review
Do All AI Systems Need to Be Explainable?
Self-explaining SAE features — LessWrong
Oversight & standards
Lessons from the FDA for AI - AI Now Institute
Governing General Purpose AI — A Comprehensive Map of Unreliability, Misuse and Systemic Risks
Etc.
Yann LeCun - A Path Towards Autonomous Machine Intelligence
Reasoning through arguments against taking AI safety seriously: Yoshua Bengio 2024.07.09
Towards a Cautious Scientist AI with Convergent Safety Bounds: Yoshua Bengio 2024.02.26
ADD / XOR / ROL: Someone is wrong on the internet (AGI Doom edition)
Defining AGI
The Turing Test and our shifting conceptions of intelligence | Science
Reasoning
RAG
Vectors and Graphs: Better Together - Graph Database & Analytics
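A toy sketch of the "vectors and graphs, better together" idea for RAG: rank documents by embedding similarity, then expand the hits one hop through an explicit relation graph. The documents, embeddings, and edges below are all fabricated for illustration.

```python
# Hybrid retrieval sketch: vector similarity first, then graph expansion.
import math

DOCS = {
    "d1": ([0.9, 0.1], "Incident report for service X"),
    "d2": ([0.8, 0.2], "Postmortem linked to the incident"),
    "d3": ([0.1, 0.9], "Unrelated marketing page"),
}
EDGES = {"d1": ["d2"], "d2": ["d1"], "d3": []}  # hypothetical citation/relation graph


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def retrieve(query_vec: list[float], top_k: int = 1) -> list[str]:
    # Step 1: vector search over document embeddings.
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d][0]), reverse=True)
    hits = ranked[:top_k]
    # Step 2: graph expansion pulls in explicitly related documents that
    # embedding similarity alone might miss.
    expanded = set(hits)
    for h in hits:
        expanded.update(EDGES.get(h, []))
    return sorted(expanded)


print(retrieve([1.0, 0.0]))  # -> ['d1', 'd2']
```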
Accepting not-G AI
Setting boundaries
Jaana Dogan: LLMs are tools to navigate a corpus based on a very biased and authoritative prompt.
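One operational reading of that quote: the prompt sets the boundary of what the model may draw on. The sketch below only constructs such a bounded prompt; the corpus excerpts and the refusal instruction are assumptions about desired behavior, and no model call is made.

```python
# Sketch: constrain answers to supplied corpus excerpts via the prompt itself.
CORPUS_EXCERPTS = [
    "Policy v3: refunds are processed within 14 days.",
    "Policy v3: refunds require the original receipt.",
]


def bounded_prompt(question: str) -> str:
    """Build a prompt that limits the model to the supplied excerpts."""
    context = "\n".join(f"- {c}" for c in CORPUS_EXCERPTS)
    return (
        "Answer using ONLY the excerpts below. "
        "If the answer is not in the excerpts, say you don't know.\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


print(bounded_prompt("How long do refunds take?"))
```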
Model size
Larger and more instructable language models become less reliable | Nature
Project Analyzing Human Language Usage Shuts Down Because ‘Generative AI Has Polluted the Data’
Alexandr Wang: War, AI and the new global arms race | TED Talk
[2409.14160] Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI
Security
AI against Censorship: Genetic Algorithms, The Geneva Project, ML in Security, and more! - YouTube
Power dynamics: control over AI
Can You Trust An AI Press Release?—Asterisk
x-risk
Situational Awareness: The Decade Ahead (Aschenbrenner, 2024)
A.I. Pioneers Call for Protections Against ‘Catastrophic Risks’ - The New York Times
Organizations focusing on AI safety
U.S. Artificial Intelligence Safety Institute | NIST
The AI Safety Institute (AISI)
Why AI Safety? - Machine Intelligence Research Institute