Signals — Frontier Checkpoint

Short, high-value dispatches from the edge (papers, releases, kernels, benchmarks), each with a clear, friendly read on what's genuinely new and why it matters for the code you write. A quick way to stay current and actually understand what you're seeing.
https://frontiercheckpoint.com/

vLLM and the new default shape of LLM serving
https://frontiercheckpoint.com/signals/vllm-paged-attention-serving-standard/
If you are still serving with naive static batching, the gap is not marginal: paged KV-cache and continuous batching change the throughput-per-GPU math, and most other stacks have copied the idea.
Fri, 26 Jun 2026 00:00:00 GMT | Signals | serving, paged-attention, kv-cache | editors@frontiercheckpoint.com

FlashAttention-3: async, low-precision, Hopper-native
https://frontiercheckpoint.com/signals/flashattention-3-hopper/
The headline is hardware-specific: FA3 is a Hopper story (async copy/MMA overlap, FP8 paths). The portable lesson from the FA line is still the one that matters: attention is bandwidth-bound, and the win is in HBM traffic, not FLOPs.
Wed, 24 Jun 2026 00:00:00 GMT | Signals | flash-attention, kernels, gpu-memory | editors@frontiercheckpoint.com

Mamba and the selective-state-space line
https://frontiercheckpoint.com/signals/mamba-ssm-attention-alternative/
Worth understanding even if you ship transformers: SSMs change the asymptotics (linear in sequence length, constant state at inference) and the failure modes. The interesting deployments are hybrids, not pure-SSM.
Sat, 20 Jun 2026 00:00:00 GMT | Signals | state-space-models, attention, long-context | editors@frontiercheckpoint.com

SGLang and RadixAttention for prefix reuse
https://frontiercheckpoint.com/signals/sglang-radixattention-prefix-cache/
If your workload has heavy shared prefixes (system prompts, few-shot exemplars, agent scaffolds), automatic prefix caching is close to free latency. This is where serving for agents diverges from serving for chat.
Wed, 17 Jun 2026 00:00:00 GMT | Signals | serving, kv-cache, agents | editors@frontiercheckpoint.com

DeepSeek-R1: RL-trained reasoning with open weights
https://frontiercheckpoint.com/signals/deepseek-r1-rl-reasoning-open-weights/
The reproducible part is the method, not a leaderboard cell: group-relative RL on verifiable rewards, with open weights to probe. It is the cleanest public artifact for understanding the reasoning-model training loop.
Sat, 13 Jun 2026 00:00:00 GMT | Signals | reasoning, grpo, rlhf | editors@frontiercheckpoint.com

The modded-nanogpt speedrun and the Muon optimizer
https://frontiercheckpoint.com/signals/modded-nanogpt-muon-speedrun/
A rare fully-public optimization target with a reproducible harness, exactly the kind of artifact we like. The Muon optimizer it popularized is the most interesting practical idea to come out of it.
Thu, 11 Jun 2026 00:00:00 GMT | Signals | pretraining, optimization, reproducibility | editors@frontiercheckpoint.com
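
For readers who want the shape of the Muon idea before clicking through: as publicly described, Muon takes the momentum-averaged gradient of a 2-D weight matrix and replaces it with an approximately orthogonal matrix before applying it. The sketch below is only an illustration under stated assumptions. It uses the textbook cubic Newton-Schulz iteration rather than the tuned polynomial used in the speedrun code, plain Python lists instead of tensors, and hypothetical names (`newton_schulz_orthogonalize`, `muon_like_step`) and hyperparameters.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=10, eps=1e-12):
    """Approximate the orthogonal polar factor of g.

    Assumption: classic cubic iteration X <- 1.5*X - 0.5*X*X^T*X.
    Dividing by the Frobenius norm first puts every singular value
    in (0, 1], which is inside the iteration's convergence region.
    """
    norm = math.sqrt(sum(v * v for row in g for v in row)) + eps
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


def muon_like_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One illustrative update: momentum, orthogonalize, then step.

    lr and beta are placeholder values for this sketch, not a recipe.
    Returns the new weight matrix and the new momentum buffer.
    """
    rows, cols = len(grad), len(grad[0])
    buf = [[beta * momentum_buf[i][j] + grad[i][j] for j in range(cols)]
           for i in range(rows)]
    update = newton_schulz_orthogonalize(buf)
    new_weight = [[weight[i][j] - lr * update[i][j] for j in range(cols)]
                  for i in range(rows)]
    return new_weight, buf


# Usage: a symmetric positive-definite "gradient" has the identity as its
# polar factor, so the orthogonalized update should be close to I.
grad = [[3.0, 1.0], [1.0, 2.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
q = newton_schulz_orthogonalize(grad)
new_w, buf = muon_like_step(zeros, grad, zeros)
```

The point of the design is that the step size no longer depends on the scale or conditioning of the raw gradient: every direction in the update gets roughly unit magnitude.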