Mamba replaces attention’s pairwise mixing with an input-dependent state-space recurrence, computed with a parallel scan tuned to the GPU memory hierarchy. The practitioner-relevant consequence is a constant-size recurrent state at inference instead of a growing KV-cache — attractive for very long sequences — traded against weaker exact-recall behavior. Most production interest is in hybrid stacks that interleave SSM and attention blocks.