Current study materials

Building and evaluating alignment auditing agents

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats | alphaXiv

[2502.01492] Develop AI Agents for System Engineering in Factorio

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety - 2507.11473v1.pdf

Common Elements of Frontier AI Safety Policies

[2507.20964] Core Safety Values for Provably Corrigible Agents

Justify your answer - by Ben Recht - arg min

Introduction to deep learning with applications to stochastic control and games - YouTube

Learning the natural history of human disease with generative transformers | Nature

Jared Kaplan: ContemporaryMLforPhysicists (pdf)

  • Starting on page 55, the Architectures section covers the structure and components of deep neural networks from a (mathematical and statistical) modeling perspective.

Alignment Science Blog

Tips for Empirical Alignment Research — AI Alignment Forum

AI Rights for Human Flourishing

How to Make the Future Better: Concrete Actions for Flourishing

Claude Code: Behind-the-scenes of the master agent loop

Deriving Muon

  • Derives core numerical methods from an exact theoretical principle.
  • Contrasts with popular optimizers like Adam, which have more heuristic origins.

David Ha’s early work: ōtoro.net

Molnar: From Frequencies to Coverage: Rethinking What “Representative” Means

Molnar: Don’t fix your imbalanced data

GPT-oss from the Ground Up - by Cameron R. Wolfe, Ph.D.

Gemma 3 270M: Can Tiny Models Learn New Tasks?

How to Think About GPUs | How To Scale Your Model

Neuronpedia

Building CERN for AI - An institutional blueprint - Centre for Future Generations

Process knowledge is crucial to economic development

Four places where you can put LLM monitoring — LessWrong

The Artificiality of Alignment - by jessica dai - Reboot

[2503.05336v3] Toward an Evaluation Science for Generative AI Systems

Data Provenance Initiative

There is only one model - by Jack Morris - Token for Token

The Big LLM Architecture Comparison

Cohere Labs: 2025 Summer School - recorded talks

Causal Artificial Intelligence Book

FIIR

On the criteria to be used in decomposing systems into modules - 361598.361623.pdf

Stanford CS336 | Language Modeling from Scratch

Marin

Introduction | RLHF Book by Nathan Lambert

SciCode - SciCode Benchmark

Aurora GPT - Argonne National Lab

Evaluation Framework for AI Systems in "the Wild" | alphaXiv

EvalEval 2024 - NeurIPS 2024 workshop

Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects - 2505.18893v4.pdf

Physics of Language Models - Allen Zhu

Lecture Videos | Introduction to Algorithms | Electrical Engineering and Computer Science | MIT OpenCourseWare

[2502.15657] Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Bertsekas - RL courses and book - mit.edu/~dimitrib/RLbook.html

Raft implemented in Go, Eli Bendersky

Layers of Memory, Layers of Compression - Tim Kellogg

CSES - CSES Problem Set - Tasks

Things that go wrong with disk IO | notes.eatonphil.com

Statistical Significance, p-Values, and the Reporting of Uncertainty - imbens-2021-statistical-significance-p-values-and-the-reporting-of-uncertainty.pdf

AI and the Everything in the Whole Wide World Benchmark

A Philosophy of Software Design - John Ousterhout (pdf)

[1807.02811] A Tutorial on Bayesian Optimization

Tracing the thoughts of a large language model \ Anthropic

[2401.17173] Zero-Shot Reinforcement Learning via Function Encoders

Severe deviation in protein fold prediction by advanced AI: a case study | Scientific Reports

Building multi-source ingestion pipelines the right way

MouseGPT: A Large-scale Vision-Language Model for Mouse Behavior Analysis - 2025.03.27.645630v1.full.pdf

Taking a responsible path to AGI - Google DeepMind

[2502.01706] Comply: Learning Sentences with Complex Weights inspired by Fruit Fly Olfaction

[2503.20511] From reductionism to realism: Holistic mathematical modelling for complex biological systems

Learning with not Enough Data Part 1: Semi-Supervised Learning | Lil'Log

MLOps system design is boring. - by Alexandru Vesa

Demystifying Chains, Trees, and Graphs of Thoughts

Sequential decision making - Kevin Murphy, DeepMind

Strategic Foundation Models - Large_Language_Models__Foundation_Models_and_Game_Theory___Research_Manifesto (16).pdf

Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

International AI safety report - International_AI_Safety_Report_2025_accessible_f.pdf

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient

A Little Bit of Reinforcement Learning from Human Feedback

DeepSeek R1's recipe to replicate o1 and the future of reasoning LMs

A Mathematical Framework for Transformer Circuits

A Recipe for Training Neural Networks

Scaling and networking a modular photonic quantum computer | Nature

[2309.16177] Navigating the Noise: Bringing Clarity to ML Parameterization Design with O(100) Ensembles