SIGNAL · SIGNALS

Mamba and the selective-state-space line

Worth understanding even if you ship transformers: SSMs change the asymptotics (linear in sequence length, constant state at inference) and the failure modes. The interesting deployments are hybrids, not pure-SSM.

2026-06-201 MINBY Frontier Checkpoint Editorial

Source: arXiv:2312.00752 ↗paper · notable

Mamba replaces attention’s pairwise mixing with an input-dependent state-space recurrence, computed with a parallel scan tuned to the GPU memory hierarchy. The practitioner-relevant consequence is a constant-size recurrent state at inference instead of a growing KV-cache — attractive for very long sequences — traded against weaker exact-recall behavior. Most production interest is in hybrid stacks that interleave SSM and attention blocks.