Agent and structured-generation workloads send the same long prefix over and over. RadixAttention indexes KV-cache blocks in a radix tree so identical prefixes are computed once and shared, rather than recomputed per request. It is a concrete example of a broader theme we keep returning to: the harness and serving layer, not just the model, is where agent performance is won.