Contents

Machine learning resources

🌱 notes 🌱

Where to begin?

Just starting out in machine learning? Know your way around but want to dig in deeper? Either way, Vicky Boykis has a terrific resource for you.

Her Anti-hype LLM reading list gist starts with a timeline sketch of 1990s statistical learning through today’s LLMs, then provides curated resources for topics including LLM building blocks, deep learning, transformers/attention, GPT, open source models, training data, pre-training, RLHF and DPO, fine-tuning and compression, small LLMs, GPUs, evaluation, and UX.

It’s a fine place to get the lay of the land and then get cosy with ML fundamentals.

Vicky also shares her ML learning notes: Machine Learning Garden

Academic courses

Many well-known machine learning courses at universities such as Stanford, CMU, Harvard, and MIT post their materials publicly. Links to just a few of those:

Fundamentals - mixed bag

On Langevin Dynamics in Machine Learning - Michael I. Jordan - YouTube

On Learning Aware Mechanism Design - Michael Jordan (Berkeley)

Reading lists


Information theory & ML

Revisiting thermodynamics in computation and information theory

Neural networks

This is an extremely non-comprehensive collection of links to either foundational papers or detailed explanations of foundational concepts.

Explaining the effectiveness of deep learning

[2410.21869] Cross-Entropy Is All You Need To Invert the Data Generating Process

Entropy is all you need? The quest for best tokens & the new physics of LLMs. - YouTube

Embeddings

What are embeddings?

Activation functions

An Overview of Activation Functions | Papers With Code

Expanded Gating Ranges Improve Activation Functions

SwiGLU Explained | Papers With Code

ReLU² Wins: Discovering Efficient Activation Functions for Sparse LLMs

Reward functions

‘forgetting’ parameter: The ants and the pheromones | the morning paper

Loss functions

Understanding Emergent Abilities of Language Models from the Loss Perspective

Backprop

The paper that started it all: Letters to Nature: Learning representations by back-propagating errors (pdf)

How to guess a gradient

Convolutional neural nets

ImageNet Classification with Deep Convolutional Neural Networks

LLMs

The Platonic Representation Hypothesis

Transformers

Attention Is All You Need

Understanding the attention mechanism in sequence models

Understanding the Transformer architecture for neural networks

Transformers from Scratch - Brandon Rohrer

A Mathematical Framework for Transformer Circuits

Vision Transformers vs CNNs at the Edge

[2411.13676] Hymba: A Hybrid-head Architecture for Small Language Models
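
Most of the links above build on the scaled dot-product attention introduced in "Attention Is All You Need". For quick reference, a minimal NumPy sketch (single head, no masking; the toy inputs are made up):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # (n_queries, d_v)

# Toy example: 3 positions, d_k = d_v = 4, random inputs for illustration.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)        # (3, 4)
```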

Beyond transformers

Specialized Foundation Models Struggle to Beat Supervised Baselines

Synthesis:

Training

A Recipe for Training Neural Networks

Stanford CS25: V4 I Behind the Scenes of LLM Pre-training: StarCoder Use Case - YouTube

[2410.17413] Scalable Influence and Fact Tracing for Large Language Model Pretraining

Scaling laws -

Sources for training data -

CommonCrawl process (a minimal filtering sketch follows this list):

  • CommonCrawl (CC) is a public repository of crawled web pages
  • it needs to be filtered at scale (95 dumps, ~425 TiB in the latest dump)
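
As an illustration of what "filtering at scale" starts from, here is a minimal sketch that walks one WARC file with the `warcio` package and keeps only response records above a size threshold. The file name and threshold are placeholders; real pretraining pipelines add language identification, deduplication, and quality classifiers across thousands of files.

```python
# Minimal, illustrative WARC filtering sketch (not a production pipeline).
from warcio.archiveiterator import ArchiveIterator

def filter_warc(path, min_bytes=500):
    """Yield (url, payload) pairs for HTTP response records above a size threshold."""
    with open(path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            url = record.rec_headers.get_header("WARC-Target-URI")
            payload = record.content_stream().read()
            if len(payload) < min_bytes:
                continue  # drop near-empty pages
            yield url, payload

# Example usage (hypothetical local file name):
# for url, html in filter_warc("CC-MAIN-example.warc.gz"):
#     process(url, html)
```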

Diffusion models

[2402.04384] Denoising Diffusion Probabilistic Models in Six Simple Steps

Interrogating model behaviors

[2411.00247] Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond

Understanding Image Classifiers at the Dataset Level with Diffusion Models

Model evaluation

On the Measure of Intelligence - Chollet

Evaluating a machine learning model.

Inspect

Quantifying evals

[2411.00640] Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations

A statistical approach to model evaluations \ Anthropic
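
The core idea behind adding error bars is to treat an eval score as a sample mean over questions and report its uncertainty. A minimal sketch of that idea (not the paper's exact methodology); the per-question scores are made up:

```python
import numpy as np

# Hypothetical per-question scores (1 = correct, 0 = incorrect) from one eval run.
scores = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1], dtype=float)

mean = scores.mean()
sem = scores.std(ddof=1) / np.sqrt(len(scores))   # standard error of the mean
lo, hi = mean - 1.96 * sem, mean + 1.96 * sem     # normal-approximation 95% CI
print(f"accuracy = {mean:.2f} ± {1.96 * sem:.2f}  (95% CI: [{lo:.2f}, {hi:.2f}])")
```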

Fine tuning

Fine tuning refers to starting with a general-purpose pretrained model, such as Gemini or Llama, and then training it further on a domain-specific dataset so that it produces better-quality answers to questions posed in that domain.
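
As an illustration only, a minimal supervised fine-tuning sketch with the Hugging Face `transformers` Trainer. The model name, dataset file, and hyperparameters are placeholders; real runs usually add an eval split, LoRA/quantization (e.g. via Unsloth or PEFT, see below), and hyperparameter tuning.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-3.2-1B"           # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token    # Llama-style tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder domain-specific corpus with a single "text" column.
dataset = load_dataset("json", data_files="domain_corpus.jsonl", split="train")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=2, learning_rate=2e-5),
    train_dataset=tokenized,
    # Causal LM objective: labels are the input tokens themselves (mlm=False).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```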

Fine tuning tools

Unsloth: more efficient fine tuning

Fine tuning considerations

Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining

GPUs/TPUs

Understanding GPU Memory 1: Visualizing All Allocations over Time | PyTorch

Robotics

How to Train Your Robot (Book) - Brandon Rohrer

ML playgrounds

AI Test Kitchen

Neural Network Concepts Animations

🔥 Kaggle's 5-Day Gen AI Intensive Course | Kaggle

LLM prompting

What's the Magic Word? A Control Theory of LLM Prompting

Neural operators

Neural operators for accelerating scientific simulations and design | Nature Reviews Physics

Machine unlearning

Announcing the first Machine Unlearning Challenge

Blogs

Laura’s AI research blog | Blog about AI research.