Safe AGI

Machine learning systems optimized for generalized learning by training on very broad data sets are advancing rapidly, driven primarily by investments in compute and data. Artificial general intelligence (AGI) captures the idea of a computing system capable of producing knowledge-work outputs comparable to those of a human, without being constrained to specific fields or skills.

The benefits of such a system are obvious - as are the risks. The field of AI safety addresses the quantification and mitigation of those risks.

AI alignment

In this context, alignment generally refers to one of two things:

  1. alignment of model outputs with user intent
  2. alignment of the model with human values

Clearly, the second is much harder to quantify. Not only is there ambiguity about what those values might be and whether the values represented in the model are comprehensive, but it is also non-trivial to evaluate whether their representation in the model’s loss function(s) is accurate, complete, and robust.
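
For concreteness, one common way human preference signals get folded into a loss is a pairwise (Bradley-Terry style) preference objective, as used in RLHF reward modeling; the minimal sketch below is illustrative only, with made-up scores and no particular model or lab's implementation implied:

    # Illustrative sketch: pairwise preference loss over two reward-model scores
    # for the same prompt (chosen vs. rejected response). Scores are invented.
    import math

    def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
        """Negative log-likelihood that the human-preferred response scores higher."""
        margin = reward_chosen - reward_rejected
        return -math.log(1.0 / (1.0 + math.exp(-margin)))

    # A well-separated pair yields a small loss; a mis-ordered pair a large one.
    print(preference_loss(2.0, -1.0))   # ~0.049
    print(preference_loss(-1.0, 2.0))   # ~3.049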

Model evaluation

2024.09 Expert questions for model eval:

Current/2024:

Leaderboards:

Watch out for low-quality evals such as MMLU and HumanEval; per Jim Fan (NVIDIA):

I would not trust any claims of a superior model until I see the following:

  1. ELO points on LMSys Chatbot Arena. It’s difficult to game democracy in the wild.
  2. Private LLM evaluation from a trusted 3rd party, such as Scale AI’s benchmark. The test set must be well-curated and held secret, otherwise it quickly loses potency.
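
Behind point 1: arena-style leaderboards rank models with Elo-style ratings updated from head-to-head user judgments. A minimal sketch of the standard Elo update follows; the K-factor, ratings, and outcome are illustrative values, not LMSys Chatbot Arena's actual parameters:

    # Sketch of a standard Elo update: two models are compared by a user, and
    # ratings shift in proportion to how surprising the outcome was.
    def expected_score(rating_a: float, rating_b: float) -> float:
        """Probability that A beats B under the Elo model."""
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

    def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32):
        """score_a: 1.0 if A wins the comparison, 0.0 if it loses, 0.5 for a tie."""
        delta = k * (score_a - expected_score(rating_a, rating_b))
        return rating_a + delta, rating_b - delta

    # Example: an 1100-rated model upsetting a 1250-rated one gains more points
    # than it would for beating an equal-rated opponent.
    print(elo_update(1100, 1250, score_a=1.0))  # -> (~1122.5, ~1227.5)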

Also, see: Dangerous capability tests should be harder

Interpretability

On the Interpretability of Artificial Intelligence in Radiology: Challenges and Opportunities | Radiology: Artificial Intelligence

Mechanistic Interpretability for AI Safety - A Review

Do All AI Systems Need to Be Explainable?

Self-explaining SAE features — LessWrong

JShollaj/awesome-llm-interpretability: A curated list of Large Language Model (LLM) Interpretability resources.

[2408.07852] Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability

Oversight & standards

Lessons from the FDA for AI - AI Now Institute

Governing General Purpose AI — A Comprehensive Map of Unreliability, Misuse and Systemic Risks

Etc.

Yann LeCun - A Path Towards Autonomous Machine Intelligence

Reasoning through arguments against taking AI safety seriously: Yoshua Bengio 2024.07.09

Towards a Cautious Scientist AI with Convergent Safety Bounds: Yoshua Bengio 2024.02.26

ADD / XOR / ROL: Someone is wrong on the internet (AGI Doom edition)


Defining AGI

The Turing Test and our shifting conceptions of intelligence | Science

Karpathy tweet 2024.03.18

Reasoning

RAG

Vectors and Graphs: Better Together - Graph Database & Analytics
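
As a concrete illustration of the vectors-plus-graphs idea, the sketch below runs a naive vector retrieval and then follows graph links from the top hits to assemble prompt context. The toy bag-of-words embedding, corpus, and link graph are invented for the example; a real system would use learned embeddings and a proper vector/graph store:

    # Illustrative RAG sketch: vector similarity retrieval plus one graph hop.
    from collections import Counter
    from math import sqrt

    DOCS = {
        "d1": "graph databases store relationships between entities",
        "d2": "vector search retrieves documents by embedding similarity",
        "d3": "retrieval augmented generation grounds llm answers in retrieved text",
    }
    LINKS = {"d1": ["d2"], "d2": ["d3"], "d3": []}  # toy citation/association graph

    def embed(text: str) -> Counter:
        # Toy bag-of-words "embedding"; stands in for a learned embedding model.
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        na = sqrt(sum(v * v for v in a.values()))
        nb = sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(query: str, k: int = 1) -> list:
        q = embed(query)
        ranked = sorted(DOCS, key=lambda d: cosine(q, embed(DOCS[d])), reverse=True)[:k]
        # Graph hop: also pull in documents linked from the top vector hits.
        return ranked + [n for d in ranked for n in LINKS.get(d, []) if n not in ranked]

    context = "\n".join(DOCS[d] for d in retrieve("embedding similarity search"))
    print(context)  # context that would be prepended to the LLM prompt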

Accepting not-G AI

Setting boundaries

Jaana Dogan: LLMs are tools to navigate a corpus based on a very biased and authoritative prompt.

Model size

Larger and more instructable language models become less reliable | Nature

Project Analyzing Human Language Usage Shuts Down Because ‘Generative AI Has Polluted the Data’

Alexandr Wang: War, AI and the new global arms race | TED Talk

[2409.14160] Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI

Security

AI against Censorship: Genetic Algorithms, The Geneva Project, ML in Security, and more! - YouTube

censorship.ai | About Geneva

Power dynamics: control over AI

Can You Trust An AI Press Release? - Asterisk

x-risk

Situational Awareness: The Decade Ahead (Aschenbrenner, 2024)

A.I. Pioneers Call for Protections Against ‘Catastrophic Risks’ - The New York Times

Organizations focusing on AI safety

U.S. Artificial Intelligence Safety Institute | NIST

The AI Safety Institute (AISI)

Stanford AI Safety

Center for AI Safety (CAIS)

Research – Epoch AI

FAR AI

AI • Objectives • Institute

Center for Human-Compatible Artificial Intelligence (CHAI)

Why AI Safety? - Machine Intelligence Research Institute

Redwood Research

AI safety research training

ARENA

MATS Program

AI Security

HydroX AI | Advanced AI Safety and Security Solutions