FlashAttention-3: async, low-precision, Hopper-native

The headline is hardware-specific: FA3 is a Hopper story (overlap of async copies with MMA, plus FP8 paths). The portable lesson from the FlashAttention line is still the one that matters: attention that materializes its score matrix is memory-bandwidth-bound, so the win comes from cutting HBM traffic, not from cutting FLOPs.

2026-06-24 · 1 min · By Frontier Checkpoint Editorial

Source: arXiv:2407.08608 · paper · notable

FlashAttention has always been an exercise in respecting the memory hierarchy: tile the computation, keep the softmax online, and never materialize the full attention matrix in HBM. FA3 adds Hopper-specific machinery (asynchronous memory movement overlapped with tensor-core math, and low-precision paths) to close more of the gap to peak hardware throughput. Our recreation rebuilds that core idea from scratch.
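To make the tile-plus-online-softmax idea concrete, here is a minimal sketch in plain Python for a single query vector. It is an illustration under stated assumptions, not the FA3 kernel and not the recreation's code: the function names and the `block_size` parameter are hypothetical, and none of the Hopper features (async copies, FP8) are modeled. The point is only that the tiled version visits keys and values one block at a time, carries a running maximum, a running denominator, and a running output, and never holds the full score row.

```python
import math


def naive_attention(q, keys, values):
    """Reference: compute every score for one query, then apply softmax."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def tiled_attention(q, keys, values, block_size=2):
    """Online-softmax attention for one query, one K/V block at a time.

    Illustrative only: block_size stands in for the tile that would fit
    in on-chip memory; only one block of scores exists at any moment.
    """
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = float("-inf")  # largest score seen so far
    running_sum = 0.0            # softmax denominator relative to running_max
    acc = [0.0] * dim            # unnormalized output relative to running_max

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]

        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new reference maximum.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]

        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max

    # Normalize once, after the last block.
    return [a / running_sum for a in acc]
```

Both functions should agree to floating-point tolerance for any block size. The tiled one simply trades a stored score row for a small amount of rescaling work per block, which is the trade that favors a bandwidth-limited device.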