<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Signals — Frontier Checkpoint</title><description>Short, high-value dispatches from the edge — papers, releases, kernels, benchmarks — each with a clear, friendly read on what&apos;s genuinely new and why it matters for the code you write. A quick way to stay current and actually understand what you&apos;re seeing.</description><link>https://frontiercheckpoint.com/</link><item><title>vLLM and the new default shape of LLM serving</title><link>https://frontiercheckpoint.com/signals/vllm-paged-attention-serving-standard/</link><guid isPermaLink="true">https://frontiercheckpoint.com/signals/vllm-paged-attention-serving-standard/</guid><description>If you are still serving with naive static batching, the gap is not marginal — paged KV-cache and continuous batching change the throughput-per-GPU math, and most other stacks have copied the idea.</description><pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate><category>Signals</category><category>serving</category><category>paged-attention</category><category>kv-cache</category><author>editors@frontiercheckpoint.com</author></item><item><title>FlashAttention-3: async, low-precision, Hopper-native</title><link>https://frontiercheckpoint.com/signals/flashattention-3-hopper/</link><guid isPermaLink="true">https://frontiercheckpoint.com/signals/flashattention-3-hopper/</guid><description>The headline is hardware-specific: FA3 is a Hopper story (async copy/MMA overlap, FP8 paths). 
The portable lesson from the FA line is still the one that matters: attention is bandwidth-bound, and the win is in HBM traffic, not FLOPs.</description><pubDate>Wed, 24 Jun 2026 00:00:00 GMT</pubDate><category>Signals</category><category>flash-attention</category><category>kernels</category><category>gpu-memory</category><author>editors@frontiercheckpoint.com</author></item><item><title>Mamba and the selective-state-space line</title><link>https://frontiercheckpoint.com/signals/mamba-ssm-attention-alternative/</link><guid isPermaLink="true">https://frontiercheckpoint.com/signals/mamba-ssm-attention-alternative/</guid><description>Worth understanding even if you ship transformers: SSMs change the asymptotics (linear in sequence length, constant-size state at inference) and the failure modes. The interesting deployments are hybrids, not pure-SSM.</description><pubDate>Sat, 20 Jun 2026 00:00:00 GMT</pubDate><category>Signals</category><category>state-space-models</category><category>attention</category><category>long-context</category><author>editors@frontiercheckpoint.com</author></item><item><title>SGLang and RadixAttention for prefix reuse</title><link>https://frontiercheckpoint.com/signals/sglang-radixattention-prefix-cache/</link><guid isPermaLink="true">https://frontiercheckpoint.com/signals/sglang-radixattention-prefix-cache/</guid><description>If your workload has heavy shared prefixes (system prompts, few-shot exemplars, agent scaffolds), automatic prefix caching is a nearly free latency win. 
This is where serving for agents diverges from serving for chat.</description><pubDate>Wed, 17 Jun 2026 00:00:00 GMT</pubDate><category>Signals</category><category>serving</category><category>kv-cache</category><category>agents</category><author>editors@frontiercheckpoint.com</author></item><item><title>DeepSeek-R1: RL-trained reasoning with open weights</title><link>https://frontiercheckpoint.com/signals/deepseek-r1-rl-reasoning-open-weights/</link><guid isPermaLink="true">https://frontiercheckpoint.com/signals/deepseek-r1-rl-reasoning-open-weights/</guid><description>The reproducible part is the method, not a leaderboard cell: group-relative RL on verifiable rewards, with open weights to probe. It is the cleanest public artifact for understanding the reasoning-model training loop.</description><pubDate>Sat, 13 Jun 2026 00:00:00 GMT</pubDate><category>Signals</category><category>reasoning</category><category>grpo</category><category>rlhf</category><author>editors@frontiercheckpoint.com</author></item><item><title>The modded-nanogpt speedrun and the Muon optimizer</title><link>https://frontiercheckpoint.com/signals/modded-nanogpt-muon-speedrun/</link><guid isPermaLink="true">https://frontiercheckpoint.com/signals/modded-nanogpt-muon-speedrun/</guid><description>A rare fully-public optimization target with a reproducible harness — exactly the kind of artifact we like. The Muon optimizer it popularized is the most interesting practical idea to come out of it.</description><pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate><category>Signals</category><category>pretraining</category><category>optimization</category><category>reproducibility</category><author>editors@frontiercheckpoint.com</author></item></channel></rss>