Contents

Machine learning resources

🌱 notes 🌱

Where to begin?

Just starting out in machine learning? Know your way around but want to dig in deeper? Either way, Vicki Boykis has a terrific resource for you.

Her Anti-hype LLM reading list gist starts with a timeline sketch of 1990s statistical learning through today’s LLMs, then provides curated resources for topics including LLM building blocks, deep learning, transformers/attention, GPT, open source models, training data, pre-training, RLHF and DPO, fine-tuning and compression, small LLMs, GPUs, evaluation, and UX.

It’s a fine place to get the lay of the land and then get cosy with ML fundamentals.

Vicki also shares her ML learning notes: Machine Learning Garden

Academic courses

Some well-known machine learning courses at universities such as Stanford, CMU, Harvard, and MIT post their materials publicly. Links to just a few of those:

Books

Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville

Fundamentals - mixed bag

On Langevin Dynamics in Machine Learning - Michael I. Jordan - YouTube

On Learning Aware Mechanism Design - Michael Jordan (Berkeley)

Reading lists


Information theory & ML

Revisiting thermodynamics in computation and information theory

Neural networks

This is an extremely non-comprehensive collection of links to either foundational papers or detailed explanations of foundational concepts.

Data verification

[2307.00682] Tools for Verifying Neural Models' Training Data

Explaining the effectiveness of deep learning

[2410.21869] Cross-Entropy Is All You Need To Invert the Data Generating Process

Entropy is all you need? The quest for best tokens & the new physics of LLMs. - YouTube

Embeddings

What are embeddings?

Activation functions

An Overview of Activation Functions | Papers With Code

Expanded Gating Ranges Improve Activation Functions

SwiGLU Explained | Papers With Code

ReLU² Wins: Discovering Efficient Activation Functions for Sparse LLMs

Reward functions

‘forgetting’ parameter: The ants and the pheromones | the morning paper

Loss functions

Understanding Emergent Abilities of Language Models from the Loss Perspective

Backprop

The paper that started it all: Letters to Nature: Learning representations by back-propagating errors (pdf)

How to guess a gradient

Convolutional neural nets

ImageNet Classification with Deep Convolutional Neural Networks

LLMs

The Platonic Representation Hypothesis

Transformers

Attention Is All You Need

Understanding the attention mechanism in sequence models

Understanding the Transformer architecture for neural networks

Transformers from Scratch - Brandon Rohrer

A Mathematical Framework for Transformer Circuits

An Overview of Early Vision in InceptionV1

Vision Transformers vs CNNs at the Edge

[2411.13676] Hymba: A Hybrid-head Architecture for Small Language Models

Beyond transformers

Specialized Foundation Models Struggle to Beat Supervised Baselines

Synthesis:

Training

A Recipe for Training Neural Networks

Stanford CS25: V4 I Behind the Scenes of LLM Pre-training: StarCoder Use Case - YouTube

[2410.17413] Scalable Influence and Fact Tracing for Large Language Model Pretraining

Scaling laws -

Sources for training data -

CommonCrawl process:

  • Common Crawl is a public repository of crawled web pages
  • it needs to be filtered at scale before it can serve as training data (95 dumps, ~425 TiB in the latest dump); a toy sketch of that filtering follows below
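
Below is a minimal, illustrative sketch of the kind of filtering pass a crawl dump goes through before it becomes training data: crude quality heuristics plus exact deduplication. The helper names and thresholds are hypothetical; real pipelines add many more stages (language ID, fuzzy dedup, PII and toxicity filters) and run them over distributed infrastructure.

```python
import hashlib

def quality_ok(text: str) -> bool:
    """Crude quality heuristics: drop very short pages and mostly non-alphabetic text."""
    if len(text) < 500:
        return False
    alpha = sum(ch.isalpha() for ch in text)
    return alpha / len(text) > 0.6

def filter_pages(pages):
    """pages: iterable of (url, extracted_text) pairs from a crawl dump.
    Yields pages that survive exact deduplication and the quality filter."""
    seen = set()
    for url, text in pages:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:        # exact-duplicate removal by content hash
            continue
        seen.add(digest)
        if quality_ok(text):
            yield url, text
```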

Diffusion models

[2402.04384] Denoising Diffusion Probabilistic Models in Six Simple Steps

Interrogating model behaviors

[2411.00247] Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond

Understanding Image Classifiers at the Dataset Level with Diffusion Models

Model evaluation

On the Measure of Intelligence - Chollet

Evaluating a machine learning model.

Inspect

The Fundamentals Of Designing Autonomy Evaluations For AI Safety | Lukas Petersson's blog

Quantifying evals

[2411.00640] Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations

A statistical approach to model evaluations \ Anthropic

Distillation

Essentially: train a large “teacher” model, then train a small “student” model to mimic the teacher’s softened output distribution on the same training data. The distilled student performs notably better than the same small model trained on the hard labels alone, approaching the teacher’s accuracy at a fraction of the inference cost.

[1503.02531] Distilling the Knowledge in a Neural Network
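
A minimal sketch of a Hinton-style distillation loss, assuming PyTorch: the student is trained against a blend of the teacher’s temperature-softened distribution (KL term) and the true labels (cross-entropy term). The temperature and weighting values are illustrative.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL divergence with hard-label cross-entropy."""
    # Soft targets: KL between temperature-scaled teacher and student distributions.
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```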

Fine tuning

Fine tuning refers to starting with a generally pre-trained model, such as Gemini, Llama, etc, and continuing training on a domain-specific dataset so the model produces better-quality answers to questions posed in that domain.
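
A minimal sketch of supervised fine tuning with the Hugging Face transformers Trainer, assuming a causal-LM checkpoint and a plain-text domain corpus. The model name, file name, and hyperparameters are illustrative placeholders, not recommendations.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-3.2-1B"   # placeholder: any causal-LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token   # needed for batch padding
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical domain-specific corpus, one document per line.
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="finetuned-model",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-5,   # small learning rate: adapt the model, don't retrain it
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # mlm=False -> standard next-token (causal) language-modelling objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```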

Using reinforcement learning to fine tune

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Fine tuning tools

Unsloth: more efficient fine tuning

Fine tuning considerations

Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining

GPUs/TPUs

Understanding GPU Memory 1: Visualizing All Allocations over Time | PyTorch

Robotics

How to Train Your Robot (Book) - Brandon Rohrer

ML playgrounds

AI Test Kitchen

Neural Network Concepts Animations

🔥 Kaggle's 5-Day Gen AI Intensive Course | Kaggle

LLM prompting

What's the Magic Word? A Control Theory of LLM Prompting

Neural operators

Neural operators for accelerating scientific simulations and design | Nature Reviews Physics

Machine unlearning

Announcing the first Machine Unlearning Challenge

Blogs

Laura’s AI research blog | Blog about AI research.

Lilian Weng - Machine learning blog