RoPE and the Long-Context Stack: Rotation, Interpolation, and What Breaks at 128k
RoPE turns position into a per-dimension rotation, and that same rotation is why Position Interpolation (PI), NTK-aware scaling, and YaRN exist, and why a 128k window rarely means 128k of usable context. The math, the methods, and the serving bill.