Model development, training, and evaluation
Evaluating a machine learning model.
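The post above covers the basic held-out evaluation workflow; a minimal sketch of that workflow with scikit-learn (the dataset and classifier here are hypothetical placeholders, the point is the split-train-evaluate pattern):

```python
# Minimal held-out evaluation sketch with scikit-learn.
# Dataset and model are hypothetical stand-ins; the workflow is the point.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set so the metrics reflect unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"accuracy:  {accuracy_score(y_test, y_pred):.3f}")
print(f"precision: {precision_score(y_test, y_pred):.3f}")
print(f"recall:    {recall_score(y_test, y_pred):.3f}")
print(f"f1:        {f1_score(y_test, y_pred):.3f}")
```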
Eval frameworks
Project Eureka
Quantifying evals
[2411.00640] Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
A statistical approach to model evaluations \ Anthropic
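The core recommendation of the paper and post above is to treat an eval score as a sample mean and report a CLT-based standard error, and to compare two models via paired per-question differences rather than two independent intervals. A minimal sketch of both calculations (the per-question score arrays are hypothetical):

```python
# CLT-based error bars for eval scores, per the "Adding Error Bars to Evals"
# recipe: report mean +/- 1.96 * SE, and compare models via paired differences.
# The per-question 0/1 score arrays below are hypothetical stand-ins.
import numpy as np

def mean_and_ci(scores: np.ndarray, z: float = 1.96) -> tuple[float, float]:
    """Sample mean and half-width of a 95% CLT confidence interval."""
    se = scores.std(ddof=1) / np.sqrt(len(scores))
    return scores.mean(), z * se

rng = np.random.default_rng(0)
model_a = rng.binomial(1, 0.72, size=500).astype(float)  # model A, per question
model_b = rng.binomial(1, 0.68, size=500).astype(float)  # model B, same questions

mean_a, half_a = mean_and_ci(model_a)
print(f"model A: {mean_a:.3f} +/- {half_a:.3f}")

# Paired difference on the same questions: in real evals this is tighter than
# comparing independent intervals, because per-question difficulty cancels out.
diff = model_a - model_b
mean_d, half_d = mean_and_ci(diff)
print(f"A - B:   {mean_d:+.3f} +/- {half_d:.3f}")
```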
Eval tooling
Introducing Pytest and Vitest integrations for LangSmith Evaluations
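The LangSmith post above layers tracing and result logging on top of ordinary pytest tests. The sketch below shows only the plain-pytest shape of such an eval; LangSmith's own decorators are omitted, and `generate_sql` with its expected outputs is a hypothetical function under test:

```python
# Plain-pytest shape of an LLM eval case. LangSmith's integration decorates
# tests like these to log inputs, outputs, and scores; here we keep it vanilla.
# `generate_sql` is a hypothetical app function standing in for an LLM call.
import pytest

def generate_sql(question: str) -> str:
    # Stand-in for the real LLM-backed function being evaluated.
    return "SELECT COUNT(*) FROM users;"

CASES = [
    ("How many users are there?", "SELECT COUNT(*) FROM users;"),
    ("How many users signed up?", "SELECT COUNT(*) FROM users;"),
]

@pytest.mark.parametrize("question,expected", CASES)
def test_sql_generation(question: str, expected: str) -> None:
    got = generate_sql(question)
    # Exact-match scoring; real evals often use execution or LLM-as-judge.
    assert got.strip() == expected
```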
Building apps that use LLMs
PySpur-Dev/pyspur: AI Agent Builder in Python
Hugging Face Agents Course 2025
Building effective agents \ Anthropic (a minimal agent-loop sketch follows this list)
ZML - High-performance inference
Deploy vLLM One-Click App - Koyeb
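Anthropic's post describes an agent as a model calling tools in a loop, acting on feedback until it can answer. A minimal sketch of that loop, pointed at a local OpenAI-compatible endpoint such as a vLLM server like the Koyeb deployment above; the endpoint URL, model name, and `get_weather` tool are hypothetical placeholders:

```python
# Minimal tool-use agent loop against an OpenAI-compatible endpoint (e.g. a
# local vLLM server). URL, model name, and the get_weather tool are
# hypothetical placeholders for whatever your deployment exposes.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # assumed served model

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny and 20C in {city}"  # stub tool implementation

def run_agent(user_message: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = client.chat.completions.create(
            model=MODEL, messages=messages, tools=TOOLS
        ).choices[0].message
        if not reply.tool_calls:          # model answered directly: done
            return reply.content
        messages.append(reply)            # keep the assistant turn in context
        for call in reply.tool_calls:     # execute each requested tool
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": get_weather(**args),
            })
    return "Gave up after max_turns."

print(run_agent("What's the weather in Paris?"))
```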
Optimizing LLM-centric deployment pipelines (MLOps: infrastructure for model deployment)
Infrastructure for ML, AI, and Data Science | Outerbounds
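Outerbounds is built around Metaflow, which expresses an ML pipeline as a DAG of steps that the platform versions, schedules, and deploys. A minimal sketch of that structure; the step bodies are hypothetical placeholders:

```python
# Minimal Metaflow pipeline sketch (Outerbounds is built on Metaflow).
# Step bodies are hypothetical stand-ins; the DAG structure is the point.
from metaflow import FlowSpec, step

class TrainAndEvalFlow(FlowSpec):

    @step
    def start(self):
        self.data = list(range(100))  # stand-in for real data loading
        self.next(self.train)

    @step
    def train(self):
        # Stand-in for real training; attributes (self.*) are versioned artifacts.
        self.model = sum(self.data) / len(self.data)
        self.next(self.evaluate)

    @step
    def evaluate(self):
        self.score = abs(self.model - 49.5)  # stand-in eval metric
        self.next(self.end)

    @step
    def end(self):
        print(f"eval score: {self.score}")

if __name__ == "__main__":
    TrainAndEvalFlow()
```

Run locally with `python flow.py run`; the same flow definition can be scheduled on remote infrastructure without code changes.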