Reading paths

Series

Cross-cutting deep dives that build on each other. Pick a thread and follow it from first principles to the kernel.

Building a Training Stack from Scratch

From a single GPU to a sharded cluster: the kernels, parallelism, and reproductions behind a modern pretraining run — built up one load-bearing piece at a time.

ongoing · 2 / 4 parts

The Inference Stack

What actually happens between a request and a token: paged KV-cache, continuous batching, quantization, and the serving machinery that makes LLMs cheap enough to ship.

ongoing · 2 / 3 parts

RL for Reasoning Models

The policy-gradient lineage behind reasoning models — from PPO and RLHF to DPO and GRPO — with the math, the failure modes, and the libraries that implement it.

ongoing · 1 / 4 parts

The 4B Frontier

Field notes from pushing a single small model, Qwen 3.5 4B, at hard structured tasks: what reproduces, what doesn't, and what the failures teach about selection, structured execution, and posttraining.