The complete logbook

Archive

Every checkpoint, in reverse chronological order. 20 filed and counting.

2026

2026-06-28 | Essays | What a 4B Model Can Actually Do: Field Notes from 155 Experiments | 0014
2026-06-27 | Essays | The Harness Is the Product: Why Agent Evals Are the Real Moat | 0013
2026-06-26 | Signals | vLLM and the new default shape of LLM serving
2026-06-24 | Signals | FlashAttention-3: async, low-precision, Hopper-native
2026-06-23 | Essays | The Economics of Thinking: Test-Time Compute as a Scaling Axis | 0012
2026-06-20 | Signals | Mamba and the selective-state-space line
2026-06-19 | Explainers | Reading a Model Release Like an Engineer: Weights, Licenses, System Cards, and Evals | 0011
2026-06-17 | Signals | SGLang and RadixAttention for prefix reuse
2026-06-15 | Reproductions | Reproducing the nanoGPT Speedrun: What Actually Moves the Loss Curve | 0010
2026-06-13 | Signals | DeepSeek-R1: RL-trained reasoning with open weights
2026-06-11 | Signals | The modded-nanogpt speedrun and the Muon optimizer
2026-06-11 | Libraries | TRL in Anger: SFT, DPO, and GRPO Without Rewriting Your Training Loop | 0009
2026-06-08 | Explainers | Post-Training Quantization in Practice: GPTQ, AWQ, and FP8 | 0008
2026-06-04 | Explainers | GRPO, Demystified: Group-Relative Policy Optimization for Reasoning Models | 0007
2026-06-01 | Explainers | Routing Is the Hard Part: A Practitioner's Guide to Mixture-of-Experts | 0006
2026-05-28 | Recreations | Recreating FlashAttention: A Tiled, IO-Aware Attention Kernel from Scratch | 0005
2026-05-24 | Explainers | RoPE and the Long-Context Stack: Rotation, Interpolation, and What Breaks at 128k | 0004
2026-05-20 | Libraries | vLLM, Explained: PagedAttention, Continuous Batching, and the Serving Stack | 0003
2026-05-15 | Explainers | Sharding the Model: FSDP, ZeRO, and Tensor/Pipeline Parallelism | 0002
2026-05-12 | Essays | How We Separate Signal From Noise: Frontier Checkpoint's Verification Rubric | 0001