Topic

Quantization

Low-bit weight and activation formats: GPTQ, AWQ, FP8.

1 checkpoint

CHECKPOINT 00082026-06-08EXPLAINERSintermediate

Post-Training Quantization in Practice: GPTQ, AWQ, and FP8

Post-training quantization is the cheapest inference lever and the easiest to pull wrong. The right method is set by your serving regime — bandwidth-bound decode wants weight-only INT4, compute-bound prefill wants FP8 — and the win is real only if a fast kernel accelerates your exact config.