Machine learning (ML) systems optimized for generalized learning via training on very broad data sets are advancing rapidly, driven primarily by investments in compute, data, and energy. Artificial general intelligence (AGI) serves as a rough capability benchmark for such systems: when a computing system can produce knowledge-work outputs comparable to a human's across a wide range of knowledge areas, it would be considered AGI.
Computers achieving that degree of ability raise the possibility both of relieving humans of work[^1] and of exceeding human capacity for producing work. An AI’s compute speed lets it process data faster than a human, and the way an artificial neural net represents information may let it hold more ideas ‘in its head’ simultaneously, allowing it to draw on more information while ‘thinking’.
One might imagine risks would accompany an artificial neural network having such supercapabilities, particularly if it is given agency - or if it figures out how to assume agency on its own. Even if the leap to AGI takes years, or never comes to pass, current ML systems already carry a subset of AGI’s sizable barrel of risks. Generally speaking, the field of AI safety addresses the quantification and mitigation of those risks, sometimes for machine learning as it currently exists and sometimes for AGI.
AI alignment
Generally, alignment in this context can refer to two different things:
- model output alignment with user intent
- model alignment with human values
The second - aligning with human values - is considerably trickier to quantify: not only is there ambiguity about what those values are and whether they are fully represented, but it is also non-trivial to evaluate whether the representation of those values in the model’s loss function(s) is accurate, complete, and robust.
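As a concrete (if simplified) illustration of how ‘values’ typically enter a model’s objective, below is a minimal sketch of a pairwise (Bradley-Terry style) preference loss of the kind used in RLHF-style reward modeling; the function name and toy tensors are illustrative assumptions, not drawn from any cited source. Note that the loss only sees which of two outputs a labeler preferred - whether that signal encodes human values accurately, completely, and robustly is exactly the open question above.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss (illustrative sketch).

    reward_chosen / reward_rejected are a reward model's scalar scores for the
    human-preferred and human-rejected responses to the same prompt. Minimizing
    this loss pushes the model to score preferred responses higher; the 'values'
    enter the objective only through these pairwise labels.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scores for three prompt/response pairs.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.9, 0.8, -0.5])
print(preference_loss(chosen, rejected))  # lower when chosen consistently outscores rejected
```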
IRL experiences
thebes on X: "is an llm agent aligned just because the llm is? not necessarily"
OpenAI's new models "instrumentally faked alignment"
The Fallacy of AI Functionality
Scaling
OpenAI’s Strawberry and inference scaling laws
AI Capabilities Can Be Significantly Improved Without Expensive Retraining | Epoch AI
How Scaling became a Local Optima in Deep Learning [Markets]
Model evaluation
On the Measure of Intelligence - Chollet
2024.09 Expert questions for model eval:
Current/2024:
Leaderboards:
Watch for low-quality evals (e.g., MMLU, HumanEval); per Jim Fan, NVIDIA:
“I would not trust any claims of a superior model until I see the following: 1. ELO points on LMSys Chatbot Arena. It’s difficult to game democracy in the wild. 2. Private LLM evaluation from a trusted 3rd party, such as Scale AI’s benchmark. The test set must be well-curated and held secret, otherwise it quickly loses potency.”
Also, see: Dangerous capability tests should be harder
Interpretability
Multimodal interpretability in 2024
Do All AI Systems Need to Be Explainable?
Self-explaining SAE features — LessWrong
Mechanistic Interpretability:
- Mechanistic Interpretability for AI Safety A Review
- 2024.blackboxnlp-1.30.pdf
- A Comprehensive Mechanistic Interpretability Explainer & Glossary — AI Alignment Forum
- 16_Measuring_Mechanistic_Inter.pdf
- How useful is mechanistic interpretability? — LessWrong
Representation engineering:
- [2310.01405] Representation Engineering: A Top-Down Approach to AI Transparency
- Representation Engineering Mistral-7B an Acid Trip
Bias
A look at Bias in Generative AI [Thoughts] - by Devansh
Intended model bias aka representation engineering
Oversight & standards
Lessons from the FDA for AI - AI Now Institute
Governing General Purpose AI — A Comprehensive Map of Unreliability, Misuse and Systemic Risks
Etc.
Should AI Progress Speed Up, Slow Down, or Stay the Same?
Yann LeCun - A Path Towards Autonomous Machine Intelligence
Reasoning through arguments against taking AI safety seriously: Yoshua Bengio 2024.07.09
Towards a Cautious Scientist AI with Convergent Safety Bounds: Yoshua Bengio 2024.02.26
ADD / XOR / ROL: Someone is wrong on the internet (AGI Doom edition)
Lucas Beyer (PhD in computer vision) stopped working on CV after discovering that corporations interested in his work were creating autonomous weapons systems: Ethical considerations around Vision and Robotics
Conversations -
Defining AGI
The Turing Test and our shifting conceptions of intelligence | Science
Reasoning
RAG
Vectors and Graphs: Better Together - Graph Database & Analytics
Accepting not-G AI
Setting boundaries
Jaana Dogan: LLMs are tools to navigate a corpus based on a very biased and authoritative prompt.
Model size
Larger and more instructable language models become less reliable | Nature
Project Analyzing Human Language Usage Shuts Down Because ‘Generative AI Has Polluted the Data’
Alexandr Wang: War, AI and the new global arms race | TED Talk
[2409.14160] Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI
Security
AI against Censorship: Genetic Algorithms, The Geneva Project, ML in Security, and more! - YouTube
Power dynamics: control over AI
[2312.06942] AI Control: Improving Safety Despite Intentional Subversion
Can You Trust An AI Press Release?—Asterisk
On Civilizational Triumph - by Dean W. Ball
x-risk
Dario Amodei — Machines of Loving Grace
Situational awareness: The Decade Ahead (Aschenbrenner 2024)
A.I. Pioneers Call for Protections Against ‘Catastrophic Risks’ - The New York Times
Organizations focusing on AI safety
AI Safety Awareness Foundation
- Changlin Li is a fellow Recurser
U.S. Artificial Intelligence Safety Institute | NIST
The AI Safety Institute (AISI)
Why AI Safety? - Machine Intelligence Research Institute
AI safety research training
AI Security
HydroX AI | Advanced AI Safety and Security Solutions
Prediction discussions
The phony comforts of AI skepticism
Mission critical systems & ML
Medical
Reconciling privacy and accuracy in AI for medical imaging | Nature Machine Intelligence
Security Considerations for AI in Radiology
Demographics of contributors to ML, AI, AGI, and AI safety
Artificial Intelligence and gender equality | UN Women – Headquarters
[^1]: Of which, presumably, at least some is work no one actually wants to do - though even that seems a tricky deal to strike just right.