Responsible AI
Risks / safety
Situational Awareness: The Decade Ahead (Aschenbrenner 2024)
Interpretability
Mechanistic Interpretability for AI Safety: A Review
Do All AI Systems Need to Be Explainable?
Self-explaining SAE features — LessWrong
Temp - to review:
- A Barebones Guide to Mechanistic Interpretability Prerequisites — Neel Nanda
- A Comprehensive Mechanistic Interpretability Explainer & Glossary — Neel Nanda
- Mechanistic Interpretability — first look | by Stephen Jonany | Medium
- adamcasson/mechanistic-interpretability
- "Mechanistic interpretability" for LLMs, explained
- Popular Mechanistic Interpretability: Goodfire Lights the Way to AI Safety - YouTube
- The Platonic Representation Hypothesis
- AI models collapse when trained on recursively generated data | Nature
Oversight & standards
Lessons from the FDA for AI - AI Now Institute
Governing General Purpose AI — A Comprehensive Map of Unreliability, Misuse and Systemic Risks
Etc.
Yann LeCun - A Path Towards Autonomous Machine Intelligence
Reasoning through arguments against taking AI safety seriously: Yoshua Bengio 2024.07.09
Towards a Cautious Scientist AI with Convergent Safety Bounds: Yoshua Bengio 2024.02.26
ADD / XOR / ROL: Someone is wrong on the internet (AGI Doom edition)
Defining AGI
The Turing Test and our shifting conceptions of intelligence | Science
Accepting not-G AI
Setting boundaries
Jaana Dogan: LLMs are tools to navigate a corpus based on a very biased and authoritative prompt.