Research interests
Technical AI safety and alignment, broadly:
- AI evaluations and measurement methodology, mechanistic interpretability, AI control, scalable oversight, multi-agent safety, model organisms of misalignment.
Independent research program:
- Measurement foundations of multi-agent AI systems: evaluation-invariant observables, perturbation–response analysis of collective fragility, and lengthening recovery times as an early warning of catastrophic transitions. The methodological orientation draws on physics, systems biology, and statistical mechanics, informing a first-principles measurement theory that complements benchmark-based evaluations.
Active projects
emmy: Evaluation-invariant measurements for alignment of multi-agent systems
(The name honors Emmy Noether, 1882–1935, whose foundational work connecting symmetries to invariants underlies the framing of evaluation-invariant measurement.)
- code: msyvr/emmy
activation tomography: Reconstruction methods for model activations
liar liar: To err is human. Is the model scheming to deceive… or is it just wrong?
- code: msyvr/pants-on-fire-eval