Machine learning resources
🌱 notes 🌱
Where to begin?
Just starting out in machine learning? Know your way around but want to dig in deeper? Either way, Vicky Boykis has a terrific resource for you.
Her Anti-hype LLM reading list gist starts with a timeline sketch of 1990s statistical learning through today’s LLMs, then provides curated resources for topics including LLM building blocks, deep learning, transformers/attention, GPT, open source models, training data, pre-training, RLHF and DPO, fine-tuning and compression, small LLMs, GPUs, evaluation, and UX.
It’s a fine place to get the lay of the land and then get cosy with ML fundamentals.
Vicky also shares her ML learning notes: Machine Learning Garden
Academic courses
Some well-known machine learning courses at universities such as Stanford, CMU, Harvard, and MIT post their materials publicly. Links to just a few of those:
Fundamentals - mixed bag
On Langevin Dynamics in Machine Learning - Michael I. Jordan - YouTube
On Learning Aware Mechanism Design - Michael Jordan (Berkeley)
Reading lists
- Ilya 30u30
- Reading List For Andrej Karpathy’s “Intro to Large Language Models” Video | Oxen.ai
Information theory & ML
Revisiting thermodynamics in computation and information theory
Neural networks
This is an extremely non-comprehensive collection of links to either foundational papers or detailed explanations of foundational concepts.
Explaining the effectiveness of deep learning
[2410.21869] Cross-Entropy Is All You Need To Invert the Data Generating Process
Entropy is all you need? The quest for best tokens & the new physics of LLMs. - YouTube
Embeddings
Activation functions
An Overview of Activation Functions | Papers With Code
Expanded Gating Ranges Improve Activation Functions
SwiGLU Explained | Papers With Code
ReLU² Wins: Discovering Efficient Activation Functions for Sparse LLMs
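As a quick reference for the gated variants covered in the links above, here is a minimal sketch of a SwiGLU feed-forward block in PyTorch; the layer names and dimensions are illustrative assumptions, not taken from any particular paper.

```python
# Minimal SwiGLU sketch: a gated linear unit whose gate uses the Swish/SiLU activation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)  # gate branch
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)    # value branch
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)  # project back down

    def forward(self, x):
        # silu(x @ W_gate) acts as a learned gate on (x @ W_up)
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

x = torch.randn(2, 5, 64)            # (batch, sequence, d_model)
print(SwiGLU(64, 256)(x).shape)      # torch.Size([2, 5, 64])
```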
Reward functions
‘forgetting’ parameter: The ants and the pheromones | the morning paper
Loss functions
Understanding Emergent Abilities of Language Models from the Loss Perspective
Backprop
The paper that started it all: Letters to Nature: Learning representations by back-propagating errors (pdf)
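As a toy illustration of the idea in that paper, here is a hedged one-neuron example: the chain-rule gradient of a squared-error loss computed by hand and checked against PyTorch autograd (all values are arbitrary).

```python
# Back-propagating errors in miniature: one sigmoid neuron, squared-error loss.
import torch

x, target = torch.tensor(0.5), torch.tensor(1.0)
w = torch.tensor(0.3, requires_grad=True)

y = torch.sigmoid(w * x)               # forward pass
loss = (y - target) ** 2
loss.backward()                        # autograd's gradient

# Chain rule by hand: dL/dw = 2*(y - target) * y*(1 - y) * x
manual = 2 * (y - target) * y * (1 - y) * x
print(w.grad.item(), manual.item())    # the two gradients agree
```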
Convolutional neural nets
ImageNet Classification with Deep Convolutional Neural Networks
LLMs
The Platonic Representation Hypothesis
Transformers
Understanding the attention mechanism in sequence models
Understanding the Transformer architecture for neural networks
Transformers from Scratch - Brandon Rohrer
A Mathematical Framework for Transformer Circuits
Vision Transformers vs CNNs at the Edge
[2411.13676] Hymba: A Hybrid-head Architecture for Small Language Models
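Since most of the links above revolve around attention, here is a minimal sketch of scaled dot-product attention, the operation at the core of the Transformer; the single-head setup, shapes, and mask convention are simplifications for illustration.

```python
# Scaled dot-product attention, single head, no learned projections.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5        # similarity scores
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                  # attention weights
    return weights @ v                                   # weighted sum of values

q = k = v = torch.randn(2, 5, 16)                        # toy batch
causal = torch.tril(torch.ones(5, 5))                    # decoder-style causal mask
print(scaled_dot_product_attention(q, k, v, mask=causal).shape)  # torch.Size([2, 5, 16])
```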
Beyond transformers
Specialized Foundation Models Struggle to Beat Supervised Baselines
Program synthesis:
- Index of /asolar/SynthesisCourse
- 2405.06399v1.pdf
- [discussion] discrete search in program synthesis. Need information : r/MachineLearning
- YijiaWang.pdf
- links2 - Talk_LINKS_program_synthesis.pdf
- Program Synthesis using Inductive Logic Programming for the Abstraction and Reasoning Corpus
- 1558_synthetic_datasets_for_neural_.pdf
- Alford-salford-meng-eecs-2021-thesis.pdf
Training
A Recipe for Training Neural Networks
Stanford CS25: V4 I Behind the Scenes of LLM Pre-training: StarCoder Use Case - YouTube
[2410.17413] Scalable Influence and Fact Tracing for Large Language Model Pretraining
Scaling laws
- [2401.02954] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Limits on scaling models
Sources for training data
- web: CommonCrawl* and existing filtered web datasets, e.g. FineWeb (15T tokens) (FineWeb: decanting the web for the finest text data at scale - a Hugging Face Space by HuggingFaceFW)
- code: GitHub via the BigCode project (bigcode/the-stack-v2 · Datasets at Hugging Face)
- curated sources: Wikipedia, arXiv, StackExchange
- synthetic data: [2306.11644] Textbooks Are All You Need
*CommonCrawl process:
- CommonCrawl is a public repository of crawled web pages
- it needs to be filtered at scale (95 dumps, ~425 TiB in the latest dump)
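As a rough, assumption-laden illustration of what per-document filtering looks like (real pipelines such as FineWeb use many more heuristics plus deduplication, and run distributed across entire dumps), here is a toy sketch of the kind of quality rules involved.

```python
# Toy web-text quality filter; the thresholds are made up for illustration.
def keep_document(text: str) -> bool:
    """Return True if a crawled page passes some simple quality heuristics."""
    words = text.split()
    if len(words) < 50 or len(words) > 100_000:          # too short or too long
        return False
    mean_word_len = sum(len(w) for w in words) / len(words)
    if not (3 <= mean_word_len <= 10):                   # gibberish or markup soup
        return False
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if lines and sum(ln.endswith("...") for ln in lines) / len(lines) > 0.3:
        return False                                     # mostly truncated boilerplate
    return True

docs = [
    "A short snippet.",
    " ".join(["This is an ordinary English sentence about machine learning."] * 20),
]
print([keep_document(d) for d in docs])                  # [False, True]
```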
Diffusion models
[2402.04384] Denoising Diffusion Probabilistic Models in Six Simple Steps
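For orientation, here is a small sketch of the DDPM forward (noising) process that the linked paper steps through; the linear schedule and tensor shapes are toy choices, not the paper's settings.

```python
# DDPM forward process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # toy linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)     # cumulative products alpha_bar_t

def add_noise(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) at a single timestep t."""
    noise = torch.randn_like(x0)
    return alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * noise

x0 = torch.randn(4, 3, 32, 32)                     # a toy batch of "images"
print(add_noise(x0, t=T - 1).std())                # ~1.0: nearly pure noise by the last step
```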
Interrogating model behaviors
Understanding Image Classifiers at the Dataset Level with Diffusion Models
Model evaluation
On the Measure of Intelligence - Chollet
Evaluating a machine learning model.
Quantifying evals
[2411.00640] Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
A statistical approach to model evaluations \ Anthropic
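The core calculation those links argue for, treating per-question scores as independent draws and reporting a standard error around the mean, can be sketched in a few lines; the scores below are made up.

```python
# Accuracy with a normal-approximation 95% confidence interval over questions.
import math

scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 20   # hypothetical 0/1 grades on 200 questions
n = len(scores)
acc = sum(scores) / n
se = math.sqrt(acc * (1 - acc) / n)            # standard error of the mean for 0/1 scores
lo, hi = acc - 1.96 * se, acc + 1.96 * se
print(f"accuracy = {acc:.3f} ± {1.96 * se:.3f}  (95% CI: [{lo:.3f}, {hi:.3f}])")
```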
Fine tuning
Fine tuning refers to the process of starting with a generally trained model, such as Gemini or Llama, then continuing training on a domain-specific dataset so that the model produces better-quality answers to questions posed in that domain.
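One minimal way to picture this, setting aside the specifics of any real LLM stack: freeze most of a pretrained network and continue training a small task-specific piece on domain data (parameter-efficient methods such as LoRA are similar in spirit). The model and data below are stand-ins, not a recipe for any particular model.

```python
# Toy fine-tuning loop: frozen "pretrained" backbone, trainable task head.
import torch
import torch.nn as nn

pretrained = nn.Sequential(                  # stand-in for a pretrained backbone
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)
head = nn.Linear(256, 10)                    # new domain/task-specific head

for p in pretrained.parameters():            # freeze the generally trained weights
    p.requires_grad = False

model = nn.Sequential(pretrained, head)
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 128)                     # toy "domain-specific" batch
y = torch.randint(0, 10, (64,))

for step in range(50):                       # the fine-tuning loop itself
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.3f}")
```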
Fine tuning tools
Unsloth: more efficient fine tuning
Fine tuning considerations
Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining
GPUs/TPUs
Understanding GPU Memory 1: Visualizing All Allocations over Time | PyTorch
Robotics
How to Train Your Robot (Book) - Brandon Rohrer
ML playgrounds
Neural Network Concepts Animations
🔥 Kaggle's 5-Day Gen AI Intensive Course | Kaggle
LLM prompting
What's the Magic Word? A Control Theory of LLM Prompting
Neural operators
Neural operators for accelerating scientific simulations and design | Nature Reviews Physics
Machine unlearning
Announcing the first Machine Unlearning Challenge