PagedAttention treats the KV-cache like virtual memory: the cache is split into fixed-size blocks that each sequence reaches through its own block table, so waste is limited to the last partially filled block of a sequence, and blocks can be shared across sequences. Combined with continuous (in-flight) batching, which admits and retires sequences at every decoding step, it keeps the GPU saturated instead of waiting on the slowest sequence in a batch. The reason to care is not novelty; it is that the technique has become the assumed baseline, and SGLang, TGI, and TensorRT-LLM all ship their own versions. Our explainer walks the internals.
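To make the virtual-memory analogy concrete, here is a minimal sketch of the bookkeeping side only: a pool of fixed-size blocks, a per-sequence block table, and reference counts so that forked sequences can share blocks until one of them writes. The names `BlockAllocator`, `Sequence`, and `BLOCK_SIZE` are hypothetical and chosen for illustration; they are not any engine's actual API, and no tensors or attention kernels are involved.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV block (assumed value, for illustration only)


class BlockAllocator:
    """Hypothetical pool of fixed-size physical blocks with reference counts."""

    def __init__(self, num_blocks: int) -> None:
        self.free = list(range(num_blocks))
        self.refcount = [0] * num_blocks

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("out of KV blocks")
        block = self.free.pop()
        self.refcount[block] = 1
        return block

    def share(self, block: int) -> int:
        # Another sequence now points at the same physical block.
        self.refcount[block] += 1
        return block

    def release(self, block: int) -> None:
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            self.free.append(block)


@dataclass
class Sequence:
    """Hypothetical sequence: a block table maps logical to physical blocks."""

    allocator: BlockAllocator
    block_table: list[int] = field(default_factory=list)
    num_tokens: int = 0

    def append_token(self) -> None:
        if self.num_tokens % BLOCK_SIZE == 0:
            # No block yet, or the last one is full: take a fresh block.
            self.block_table.append(self.allocator.allocate())
        else:
            last = self.block_table[-1]
            if self.allocator.refcount[last] > 1:
                # Copy-on-write: stop sharing the partially filled block.
                # A real engine would also copy the KV data across here.
                self.allocator.release(last)
                self.block_table[-1] = self.allocator.allocate()
        self.num_tokens += 1

    def fork(self) -> "Sequence":
        # The child shares every block with the parent; nothing is copied.
        shared = [self.allocator.share(b) for b in self.block_table]
        return Sequence(self.allocator, shared, self.num_tokens)

    def free(self) -> None:
        for block in self.block_table:
            self.allocator.release(block)
        self.block_table.clear()
        self.num_tokens = 0


if __name__ == "__main__":
    pool = BlockAllocator(num_blocks=8)
    parent = Sequence(pool)
    for _ in range(20):
        parent.append_token()
    child = parent.fork()
    child.append_token()  # triggers copy-on-write on the last block
    print("parent blocks:", parent.block_table)
    print("child blocks: ", child.block_table)
```

In this sketch a 20-token sequence occupies two blocks, so the unused slots all sit in its final block. After the fork, parent and child keep pointing at the same first block and diverge only in the last one, which is the sharing behaviour described above.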