Reading paths
Series
Cross-cutting deep dives that build on each other. Pick a thread and follow it from first principles to the kernel.
Building a Training Stack from Scratch
From a single GPU to a sharded cluster: the kernels, parallelism, and reproductions behind a modern pretraining run — built up one load-bearing piece at a time.
The Inference Stack
What actually happens between a request and a token: paged KV-cache, continuous batching, quantization, and the serving machinery that makes LLMs cheap enough to ship.
RL for Reasoning Models
The policy-gradient lineage behind reasoning models, from PPO and RLHF to DPO and GRPO, with the math, the failure modes, and the libraries that implement these methods.
The 4B Frontier
Field notes from pushing a single small model, Qwen 3.5 4B, at hard structured tasks: what reproduces, what doesn't, and what the failures teach about selection, structured execution, and post-training.