AI Alignment Research

Active projects

emmy [1]

Evaluation-invariant measurement for alignment of multi-agent systems

liar liar

Is the model scheming to deceive… or is it just wrong?

activation tomography

Natural Language Autoencoders as measurement instruments for AI safety

paper chase

Multi-agent simulation of a scientific publishing ecosystem

Technical AI safety and alignment research interests

My core interest is construct-valid instruments for measuring properties of AI systems. Recent projects I’m excited about include faithful reconstruction of model latent space and measurement of emergent properties of AI collectives.

I’m broadly interested in technical AI safety across domains: AI evaluations and measurement methodology, AI control, scalable oversight, multi-agent alignment, model organisms of misalignment, and mechanistic interpretability.


  1. Name inspired by Emmy Noether (1882–1935), whose foundational work connecting symmetries to invariants underlies the framing of evaluation-invariant measurement.