AI and the Everything in the Whole Wide World Benchmark
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead - Microsoft Research
Going beyond open data – increasing transparency and trust in language models with OLMoTrace | Ai2
Tracing the thoughts of a large language model \ Anthropic
Bridging the human–AI knowledge gap through concept discovery and transfer in AlphaZero | PNAS
[2401.17173] Zero-Shot Reinforcement Learning via Function Encoders
MCP Protocol: a new AI dev tools building block
A Philosophy of Software Design
Severe deviation in protein fold prediction by advanced AI: a case study | Scientific Reports
Building multi-source ingestion pipelines the right way
[1807.02811] A Tutorial on Bayesian Optimization
The Role Of AI Observability In 2025 | Honeycomb
[2411.08019] Language Models as Causal Effect Generators
[2504.03464] Spatiotemporal causal inference with arbitrary spillover and carryover effects
Show Your Work: Improved Reporting of Experimental Results - ACL Anthology
[2012.03826] HEBO Pushing The Limits of Sample-Efficient Hyperparameter Optimisation
Taking a responsible path to AGI - Google DeepMind
[2502.01706] Comply: Learning Sentences with Complex Weights inspired by Fruit Fly Olfaction
[1412.6980] Adam: A Method for Stochastic Optimization
Learning with not Enough Data Part 1: Semi-Supervised Learning | Lil'Log
RAG vs. Fine-tuning and more | Google Cloud Blog
A real explanation of discrete Fourier transform
MLOps system design is boring. - by Alexandru Vesa
Faster Python calculations with Numba: 2 lines of code, 13× speed-up
Things that go wrong with disk IO | notes.eatonphil.com
[1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Dependency Injection for Artificial Intelligence (DI4AI)
[2503.05336v3] Toward an Evaluation Science for Generative AI Systems
Introduction | RLHF Book by Nathan Lambert
Is GOPRIVATE actually needed? : r/golang
Demystifying Chains, Trees, and Graphs of Thoughts
Sequential decision making - Kevin Murphy, DeepMind
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
International AI Safety Report 2025
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
A Little Bit of Reinforcement Learning from Human Feedback
DeepSeek R1's recipe to replicate o1 and the future of reasoning LMs
A Mathematical Framework for Transformer Circuits
A Recipe for Training Neural Networks
Scaling and networking a modular photonic quantum computer | Nature
Large Language Diffusion Models
KindXiaoming/grow-crystals: Getting crystal-like representations with harmonic loss