Model development, training, and evaluation
Evaluating a machine learning model.
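The post above covers the basic held-out evaluation workflow; a minimal sketch of that workflow with scikit-learn (the dataset and classifier here are hypothetical placeholders, the point is the split-train-evaluate pattern):

```python
# Minimal held-out evaluation sketch with scikit-learn.
# Dataset and model are hypothetical stand-ins; the workflow is the point.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set so the metrics reflect unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"accuracy:  {accuracy_score(y_test, y_pred):.3f}")
print(f"precision: {precision_score(y_test, y_pred):.3f}")
print(f"recall:    {recall_score(y_test, y_pred):.3f}")
print(f"f1:        {f1_score(y_test, y_pred):.3f}")
```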
Eval frameworks
Project Eureka
Quantifying evals
[2411.00640] Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
A statistical approach to model evaluations \ Anthropic
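The core recommendation of the paper and post above is to treat an eval score as a sample mean and report a CLT-based standard error, and to compare two models via paired per-question differences rather than two independent intervals. A minimal sketch of both calculations (the per-question score arrays are hypothetical):

```python
# CLT-based error bars for eval scores, per the "Adding Error Bars to Evals"
# recipe: report mean +/- 1.96 * SE, and compare models via paired differences.
# The per-question 0/1 score arrays below are hypothetical stand-ins.
import numpy as np

def mean_and_ci(scores: np.ndarray, z: float = 1.96) -> tuple[float, float]:
    """Sample mean and half-width of a 95% CLT confidence interval."""
    se = scores.std(ddof=1) / np.sqrt(len(scores))
    return scores.mean(), z * se

rng = np.random.default_rng(0)
model_a = rng.binomial(1, 0.72, size=500).astype(float)  # model A, per question
model_b = rng.binomial(1, 0.68, size=500).astype(float)  # model B, same questions

mean_a, half_a = mean_and_ci(model_a)
print(f"model A: {mean_a:.3f} +/- {half_a:.3f}")

# Paired difference on the same questions: in real evals this is tighter than
# comparing independent intervals, because per-question difficulty cancels out.
diff = model_a - model_b
mean_d, half_d = mean_and_ci(diff)
print(f"A - B:   {mean_d:+.3f} +/- {half_d:.3f}")
```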
Eval tooling
Introducing Pytest and Vitest integrations for LangSmith Evaluations
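The LangSmith post above layers tracing and result logging on top of ordinary pytest tests. The sketch below shows only the plain-pytest shape of such an eval; LangSmith's own decorators are omitted, and `generate_sql` with its expected outputs is a hypothetical function under test:

```python
# Plain-pytest shape of an LLM eval case. LangSmith's integration decorates
# tests like these to log inputs, outputs, and scores; here we keep it vanilla.
# `generate_sql` is a hypothetical app function standing in for an LLM call.
import pytest

def generate_sql(question: str) -> str:
    # Stand-in for the real LLM-backed function being evaluated.
    return "SELECT COUNT(*) FROM users;"

CASES = [
    ("How many users are there?", "SELECT COUNT(*) FROM users;"),
    ("How many users signed up?", "SELECT COUNT(*) FROM users;"),
]

@pytest.mark.parametrize("question,expected", CASES)
def test_sql_generation(question: str, expected: str) -> None:
    got = generate_sql(question)
    # Exact-match scoring; real evals often use execution or LLM-as-judge.
    assert got.strip() == expected
```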
Building apps that use LLMs
PySpur-Dev/pyspur: AI Agent Builder in Python
Hugging Face Agents Course 2025
Building effective agents \ Anthropic (a minimal agent-loop sketch follows this list)
ZML - High-performance inference
Deploy vLLM One-Click App - Koyeb
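Anthropic's post describes an agent as a model calling tools in a loop, acting on feedback until it can answer. A minimal sketch of that loop, pointed at a local OpenAI-compatible endpoint such as a vLLM server like the Koyeb deployment above; the endpoint URL, model name, and `get_weather` tool are hypothetical placeholders:

```python
# Minimal tool-use agent loop against an OpenAI-compatible endpoint (e.g. a
# local vLLM server). URL, model name, and the get_weather tool are
# hypothetical placeholders for whatever your deployment exposes.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # assumed served model

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny and 20C in {city}"  # stub tool implementation

def run_agent(user_message: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = client.chat.completions.create(
            model=MODEL, messages=messages, tools=TOOLS
        ).choices[0].message
        if not reply.tool_calls:          # model answered directly: done
            return reply.content
        messages.append(reply)            # keep the assistant turn in context
        for call in reply.tool_calls:     # execute each requested tool
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": get_weather(**args),
            })
    return "Gave up after max_turns."

print(run_agent("What's the weather in Paris?"))
```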
Optimizing LLM-centric deployment pipelines (MLOps: infrastructure for model deployment)
Infrastructure for ML, AI, and Data Science | Outerbounds
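Outerbounds is built around Metaflow, which expresses an ML pipeline as a DAG of steps that the platform versions, schedules, and deploys. A minimal sketch of that structure; the step bodies are hypothetical placeholders:

```python
# Minimal Metaflow pipeline sketch (Outerbounds is built on Metaflow).
# Step bodies are hypothetical stand-ins; the DAG structure is the point.
from metaflow import FlowSpec, step

class TrainAndEvalFlow(FlowSpec):

    @step
    def start(self):
        self.data = list(range(100))  # stand-in for real data loading
        self.next(self.train)

    @step
    def train(self):
        # Stand-in for real training; attributes (self.*) are versioned artifacts.
        self.model = sum(self.data) / len(self.data)
        self.next(self.evaluate)

    @step
    def evaluate(self):
        self.score = abs(self.model - 49.5)  # stand-in eval metric
        self.next(self.end)

    @step
    def end(self):
        print(f"eval score: {self.score}")

if __name__ == "__main__":
    TrainAndEvalFlow()
```

Run locally with `python flow.py run`; the same flow definition can be scheduled on remote infrastructure without code changes.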