Contents

Machine learning resources

🌱 notes 🌱

Where to begin?

Just starting out in machine learning? Know your way around but want to dig in deeper? Either way, Vicki Boykis has a terrific resource for you.

Her Anti-hype LLM reading list gist starts with a timeline sketch of 1990s statistical learning through today’s LLMs, then provides curated resources for topics including LLM building blocks, deep learning, transformers/attention, GPT, open source models, training data, pre-training, RLHF and DPO, fine-tuning and compression, small LLMs, GPUs, evaluation, and UX.

It’s a fine place to get the lay of the land and then get cosy with ML fundamentals.

Vicki also shares her ML learning notes: Machine Learning Garden

Academic courses

Some well-known machine learning courses at universities such as Stanford, CMU, Harvard, and MIT post their materials publicly. Links to just a few of those:

Books

Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville

Fundamentals - mixed bag

On Langevin Dynamics in Machine Learning - Michael I. Jordan - YouTube

On Learning Aware Mechanism Design - Michael Jordan (Berkeley)

Reading lists


Information theory & ML

Revisiting thermodynamics in computation and information theory

Neural networks

This is an extremely non-comprehensive collection of links to either foundational papers or detailed explanations of foundational concepts.

Data verification

[2307.00682] Tools for Verifying Neural Models' Training Data

Explaining the effectiveness of deep learning

[2410.21869] Cross-Entropy Is All You Need To Invert the Data Generating Process

Entropy is all you need? The quest for best tokens & the new physics of LLMs. - YouTube

Embeddings

What are embeddings?

Activation functions

An Overview of Activation Functions | Papers With Code

Expanded Gating Ranges Improve Activation Functions

SwiGLU Explained | Papers With Code

ReLU² Wins: Discovering Efficient Activation Functions for Sparse LLMs

Reward functions

‘forgetting’ parameter: The ants and the pheromones | the morning paper

Loss functions

Understanding Emergent Abilities of Language Models from the Loss Perspective

Backprop

The paper that started it all: Letters to Nature: Learning representations by back-propagating errors (pdf)

How to guess a gradient

Convolutional neural nets

ImageNet Classification with Deep Convolutional Neural Networks

LLMs

The Platonic Representation Hypothesis

Transformers

Attention Is All You Need

Understanding the attention mechanism in sequence models

Understanding the Transformer architecture for neural networks

Transformers from Scratch - Brandon Rohrer

A Mathematical Framework for Transformer Circuits

An Overview of Early Vision in InceptionV1

Vision Transformers vs CNNs at the Edge

[2411.13676] Hymba: A Hybrid-head Architecture for Small Language Models

Beyond transformers

Specialized Foundation Models Struggle to Beat Supervised Baselines

Synthesis:

Training

A Recipe for Training Neural Networks

Stanford CS25: V4 I Behind the Scenes of LLM Pre-training: StarCoder Use Case - YouTube

[2410.17413] Scalable Influence and Fact Tracing for Large Language Model Pretraining

Scaling laws -

Sources for training data -

CommonCrawl process:

  • Common Crawl is a public repository of crawled web pages
  • it needs to be filtered at scale before it can serve as training data (95 dumps, ~425 TiB in the latest dump); a toy sketch of that filtering follows below
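
Below is a minimal, illustrative sketch of the kind of filtering pass a crawl dump goes through before it becomes training data: crude quality heuristics plus exact deduplication. The helper names and thresholds are hypothetical; real pipelines add many more stages (language ID, fuzzy dedup, PII and toxicity filters) and run them over distributed infrastructure.

```python
import hashlib

def quality_ok(text: str) -> bool:
    """Crude quality heuristics: drop very short pages and mostly non-alphabetic text."""
    if len(text) < 500:
        return False
    alpha = sum(ch.isalpha() for ch in text)
    return alpha / len(text) > 0.6

def filter_pages(pages):
    """pages: iterable of (url, extracted_text) pairs from a crawl dump.
    Yields pages that survive exact deduplication and the quality filter."""
    seen = set()
    for url, text in pages:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:        # exact-duplicate removal by content hash
            continue
        seen.add(digest)
        if quality_ok(text):
            yield url, text
```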

Diffusion models

[2402.04384] Denoising Diffusion Probabilistic Models in Six Simple Steps

Interrogating model behaviors

[2411.00247] Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond

Understanding Image Classifiers at the Dataset Level with Diffusion Models

Model evaluation

On the Measure of Intelligence - Chollet

Evaluating a machine learning model.

Inspect

The Fundamentals Of Designing Autonomy Evaluations For AI Safety | Lukas Petersson's blog

Quantifying evals

[2411.00640] Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations

A statistical approach to model evaluations \ Anthropic

Distillation

Essentially: train a large “teacher” model, then train a small “student” model to mimic the teacher’s softened output distribution on the same training data. The distilled student performs notably better than the same small model trained on the hard labels alone, approaching the teacher’s accuracy at a fraction of the inference cost.

[1503.02531] Distilling the Knowledge in a Neural Network
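
A minimal sketch of a Hinton-style distillation loss, assuming PyTorch: the student is trained against a blend of the teacher’s temperature-softened distribution (KL term) and the true labels (cross-entropy term). The temperature and weighting values are illustrative.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL divergence with hard-label cross-entropy."""
    # Soft targets: KL between temperature-scaled teacher and student distributions.
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```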

Fine tuning

Fine tuning refers to starting with a generally pre-trained model, such as Gemini, Llama, etc, and continuing training on a domain-specific dataset so the model produces better-quality answers to questions posed in that domain.
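
A minimal sketch of supervised fine tuning with the Hugging Face transformers Trainer, assuming a causal-LM checkpoint and a plain-text domain corpus. The model name, file name, and hyperparameters are illustrative placeholders, not recommendations.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-3.2-1B"   # placeholder: any causal-LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token   # needed for batch padding
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical domain-specific corpus, one document per line.
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="finetuned-model",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-5,   # small learning rate: adapt the model, don't retrain it
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # mlm=False -> standard next-token (causal) language-modelling objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```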

Using reinforcement learning to fine tune

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Fine tuning tools

Unsloth: more efficient fine tuning

Fine tuning considerations

Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining

GPUs/TPUs

Understanding GPU Memory 1: Visualizing All Allocations over Time | PyTorch

Robotics

How to Train Your Robot (Book) - Brandon Rohrer

ML playgrounds

AI Test Kitchen

Neural Network Concepts Animations

🔥 Kaggle's 5-Day Gen AI Intensive Course | Kaggle

LLM prompting

What's the Magic Word? A Control Theory of LLM Prompting

Neural operators

Neural operators for accelerating scientific simulations and design | Nature Reviews Physics

Machine unlearning

Announcing the first Machine Unlearning Challenge

Blogs

Laura’s AI research blog | Blog about AI research.

Lilian Weng - Machine learning blog