<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Frontier Checkpoint</title><description>A practitioner-only technical publication for working ML and agent engineers. We verify, reproduce, and recreate the work — exploring the open frontier of what actually holds up.</description><link>https://frontiercheckpoint.com/</link><language>en-us</language><copyright>© 2026 Frontier Checkpoint</copyright><item><title>What a 4B Model Can Actually Do: Field Notes from 155 Experiments</title><link>https://frontiercheckpoint.com/essays/what-a-4b-model-can-do/</link><guid isPermaLink="true">https://frontiercheckpoint.com/essays/what-a-4b-model-can-do/</guid><description>Across 155 small-model experiments centered on Qwen 3.5 4B, the same thing kept working: give the model something executable it can check against the evidence it has, and it punches far above its benchmark weight. Here is the field guide — the levers that worked, how I know they&apos;re real, and the frontier they opened up.</description><pubDate>Sun, 28 Jun 2026 00:00:00 GMT</pubDate><category>Essays</category><category>reproducibility</category><category>evaluation</category><category>fine-tuning</category><category>methodology</category><category>llm</category><category>agents</category><author>editors@frontiercheckpoint.com</author></item><item><title>The Harness Is the Product: Why Agent Evals Are the Real Moat</title><link>https://frontiercheckpoint.com/essays/agent-harness-evals-moat/</link><guid isPermaLink="true">https://frontiercheckpoint.com/essays/agent-harness-evals-moat/</guid><description>Swapping the frontier model rarely moves your agent&apos;s success rate as much as fixing retries and context management — and the one thing competitors can&apos;t clone is your evaluation environment. 
A thesis on why agent evals, not weights, are where reproducible capability accrues.</description><pubDate>Sat, 27 Jun 2026 00:00:00 GMT</pubDate><category>Essays</category><category>agents</category><category>agent-harness</category><category>tool-use</category><category>evaluation</category><author>editors@frontiercheckpoint.com</author></item><item><title>vLLM and the new default shape of LLM serving</title><link>https://frontiercheckpoint.com/signals/vllm-paged-attention-serving-standard/</link><guid isPermaLink="true">https://frontiercheckpoint.com/signals/vllm-paged-attention-serving-standard/</guid><description>If you are still serving with naive static batching, the gap is not marginal — paged KV-cache and continuous batching change the throughput-per-GPU math, and most other stacks have copied the idea.</description><pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate><category>Signals</category><category>serving</category><category>paged-attention</category><category>kv-cache</category><author>editors@frontiercheckpoint.com</author></item><item><title>FlashAttention-3: async, low-precision, Hopper-native</title><link>https://frontiercheckpoint.com/signals/flashattention-3-hopper/</link><guid isPermaLink="true">https://frontiercheckpoint.com/signals/flashattention-3-hopper/</guid><description>The headline is hardware-specific: FA3 is a Hopper story (async copy/MMA overlap, FP8 paths). 
The portable lesson from the FA line is still the one that matters — attention is bandwidth-bound, and the win is in HBM traffic, not FLOPs.</description><pubDate>Wed, 24 Jun 2026 00:00:00 GMT</pubDate><category>Signals</category><category>flash-attention</category><category>kernels</category><category>gpu-memory</category><author>editors@frontiercheckpoint.com</author></item><item><title>The Economics of Thinking: Test-Time Compute as a Scaling Axis</title><link>https://frontiercheckpoint.com/essays/economics-of-test-time-compute/</link><guid isPermaLink="true">https://frontiercheckpoint.com/essays/economics-of-test-time-compute/</guid><description>Reasoning models turned inference into a per-request dial. This is an economic read on when spending FLOPs at test time actually buys accuracy, why it only pays where answers are cheap to verify, and what variable-cost inference does to latency budgets and capacity planning.</description><pubDate>Tue, 23 Jun 2026 00:00:00 GMT</pubDate><category>Essays</category><category>test-time-compute</category><category>reasoning</category><category>serving</category><category>industry</category><author>editors@frontiercheckpoint.com</author></item><item><title>Mamba and the selective-state-space line</title><link>https://frontiercheckpoint.com/signals/mamba-ssm-attention-alternative/</link><guid isPermaLink="true">https://frontiercheckpoint.com/signals/mamba-ssm-attention-alternative/</guid><description>Worth understanding even if you ship transformers: SSMs change the asymptotics (linear in sequence length, constant state at inference) and the failure modes. 
The interesting deployments are hybrids, not pure-SSM.</description><pubDate>Sat, 20 Jun 2026 00:00:00 GMT</pubDate><category>Signals</category><category>state-space-models</category><category>attention</category><category>long-context</category><author>editors@frontiercheckpoint.com</author></item><item><title>Reading a Model Release Like an Engineer: Weights, Licenses, System Cards, and Evals</title><link>https://frontiercheckpoint.com/explainers/reading-a-model-release/</link><guid isPermaLink="true">https://frontiercheckpoint.com/explainers/reading-a-model-release/</guid><description>The headline benchmark is the least durable thing in a model release. Here is how to read access, licenses, cards, eval protocols, and serving facts before you commit engineering to a number you cannot reproduce.</description><pubDate>Fri, 19 Jun 2026 00:00:00 GMT</pubDate><category>Explainers</category><category>evaluation</category><category>reproducibility</category><category>llm</category><category>industry</category><author>editors@frontiercheckpoint.com</author></item><item><title>SGLang and RadixAttention for prefix reuse</title><link>https://frontiercheckpoint.com/signals/sglang-radixattention-prefix-cache/</link><guid isPermaLink="true">https://frontiercheckpoint.com/signals/sglang-radixattention-prefix-cache/</guid><description>If your workload has heavy shared prefixes — system prompts, few-shot exemplars, agent scaffolds — automatic prefix caching is close to free latency. 
This is where serving for agents diverges from serving for chat.</description><pubDate>Wed, 17 Jun 2026 00:00:00 GMT</pubDate><category>Signals</category><category>serving</category><category>kv-cache</category><category>agents</category><author>editors@frontiercheckpoint.com</author></item><item><title>Reproducing the nanoGPT Speedrun: What Actually Moves the Loss Curve</title><link>https://frontiercheckpoint.com/reproductions/reproducing-nanogpt-speedrun/</link><guid isPermaLink="true">https://frontiercheckpoint.com/reproductions/reproducing-nanogpt-speedrun/</guid><description>The nanoGPT speedrun is a rare, fully open optimization target: hit 3.28 FineWeb validation loss on a GPT-2 (124M)-class model in minimum wall-clock on 8×H100. We reproduce the pipeline, isolate what the Muon optimizer and the architecture changes actually buy, and flag what will not transfer off the bench.</description><pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate><category>Reproductions</category><category>reproducibility</category><category>pretraining</category><category>optimization</category><category>distributed-training</category><author>editors@frontiercheckpoint.com</author></item><item><title>DeepSeek-R1: RL-trained reasoning with open weights</title><link>https://frontiercheckpoint.com/signals/deepseek-r1-rl-reasoning-open-weights/</link><guid isPermaLink="true">https://frontiercheckpoint.com/signals/deepseek-r1-rl-reasoning-open-weights/</guid><description>The reproducible part is the method, not a leaderboard cell: group-relative RL on verifiable rewards, with open weights to probe. 
It is the cleanest public artifact for understanding the reasoning-model training loop.</description><pubDate>Sat, 13 Jun 2026 00:00:00 GMT</pubDate><category>Signals</category><category>reasoning</category><category>grpo</category><category>rlhf</category><author>editors@frontiercheckpoint.com</author></item><item><title>The modded-nanogpt speedrun and the Muon optimizer</title><link>https://frontiercheckpoint.com/signals/modded-nanogpt-muon-speedrun/</link><guid isPermaLink="true">https://frontiercheckpoint.com/signals/modded-nanogpt-muon-speedrun/</guid><description>A rare fully-public optimization target with a reproducible harness — exactly the kind of artifact we like. The Muon optimizer it popularized is the most interesting practical idea to come out of it.</description><pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate><category>Signals</category><category>pretraining</category><category>optimization</category><category>reproducibility</category><author>editors@frontiercheckpoint.com</author></item><item><title>TRL in Anger: SFT, DPO, and GRPO Without Rewriting Your Training Loop</title><link>https://frontiercheckpoint.com/libraries/trl-sft-dpo-grpo-library/</link><guid isPermaLink="true">https://frontiercheckpoint.com/libraries/trl-sft-dpo-grpo-library/</guid><description>TRL turns SFT, DPO, and GRPO into Trainer subclasses that inherit the entire Hugging Face stack — accelerate, peft, DeepSpeed. 
The convenience is real; the cost is that you&apos;re debugging someone else&apos;s training loop the moment your problem stops looking like the quickstart.</description><pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate><category>Libraries</category><category>dpo</category><category>grpo</category><category>peft</category><category>fine-tuning</category><category>rlhf</category><author>editors@frontiercheckpoint.com</author></item><item><title>Post-Training Quantization in Practice: GPTQ, AWQ, and FP8</title><link>https://frontiercheckpoint.com/explainers/post-training-quantization-gptq-awq-fp8/</link><guid isPermaLink="true">https://frontiercheckpoint.com/explainers/post-training-quantization-gptq-awq-fp8/</guid><description>Post-training quantization is the cheapest inference lever and the easiest to pull wrong. The right method is set by your serving regime — bandwidth-bound decode wants weight-only INT4, compute-bound prefill wants FP8 — and the win is real only if a fast kernel accelerates your exact config.</description><pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate><category>Explainers</category><category>quantization</category><category>serving</category><category>kv-cache</category><author>editors@frontiercheckpoint.com</author></item><item><title>GRPO, Demystified: Group-Relative Policy Optimization for Reasoning Models</title><link>https://frontiercheckpoint.com/explainers/grpo-group-relative-policy-optimization/</link><guid isPermaLink="true">https://frontiercheckpoint.com/explainers/grpo-group-relative-policy-optimization/</guid><description>GRPO swaps PPO&apos;s learned critic for a Monte-Carlo baseline — the mean reward over a group of sampled completions — trading rollout compute and per-token credit assignment for a simpler, more stable RL loop on verifiable-reward tasks.</description><pubDate>Thu, 04 Jun 2026 00:00:00 GMT</pubDate><category>Explainers</category><category>grpo</category><category>rlhf</category><category>ppo</category><category>reasoning</category><category>fine-tuning</category><author>editors@frontiercheckpoint.com</author></item><item><title>Routing Is the Hard Part: A Practitioner&apos;s Guide to Mixture-of-Experts</title><link>https://frontiercheckpoint.com/explainers/moe-routing-practitioners-guide/</link><guid isPermaLink="true">https://frontiercheckpoint.com/explainers/moe-routing-practitioners-guide/</guid><description>MoE decouples parameter count from per-token FLOPs, but every hard problem — instability, dropped tokens, load imbalance, all-to-all traffic, a footprint set by total not active params — lives in the router. A structural tour from Switch/GShard to fine-grained and aux-loss-free designs, and the systems bill you actually pay.</description><pubDate>Mon, 01 Jun 2026 00:00:00 GMT</pubDate><category>Explainers</category><category>mixture-of-experts</category><category>transformers</category><category>tensor-parallelism</category><author>editors@frontiercheckpoint.com</author></item><item><title>Recreating FlashAttention: A Tiled, IO-Aware Attention Kernel from Scratch</title><link>https://frontiercheckpoint.com/recreations/recreating-flashattention-tiled-kernel/</link><guid isPermaLink="true">https://frontiercheckpoint.com/recreations/recreating-flashattention-tiled-kernel/</guid><description>FlashAttention is exact attention restructured for the memory hierarchy, not an approximation.
We implement the tiled forward and recompute backward in Triton, validate exactness against a reference, and separate what a tutorial actually reproduces from what needs CUTLASS-grade engineering.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>Recreations</category><category>flash-attention</category><category>kernels</category><category>attention</category><category>gpu-memory</category><author>editors@frontiercheckpoint.com</author></item><item><title>RoPE and the Long-Context Stack: Rotation, Interpolation, and What Breaks at 128k</title><link>https://frontiercheckpoint.com/explainers/rope-long-context-stack/</link><guid isPermaLink="true">https://frontiercheckpoint.com/explainers/rope-long-context-stack/</guid><description>RoPE turns position into a per-dimension rotation — and that same rotation is why PI, NTK-aware scaling, and YaRN exist, and why a 128k window rarely means 128k of usable context. The math, the methods, and the serving bill.</description><pubDate>Sun, 24 May 2026 00:00:00 GMT</pubDate><category>Explainers</category><category>rope</category><category>long-context</category><category>attention</category><category>transformers</category><author>editors@frontiercheckpoint.com</author></item><item><title>vLLM, Explained: PagedAttention, Continuous Batching, and the Serving Stack</title><link>https://frontiercheckpoint.com/libraries/vllm-paged-attention-serving/</link><guid isPermaLink="true">https://frontiercheckpoint.com/libraries/vllm-paged-attention-serving/</guid><description>vLLM treats the KV cache like OS virtual memory — non-contiguous paged blocks — and schedules work at the token, not the request. 
You get high aggregate throughput; the cost is that per-request latency becomes something you tune rather than something you get for free.</description><pubDate>Wed, 20 May 2026 00:00:00 GMT</pubDate><category>Libraries</category><category>paged-attention</category><category>kv-cache</category><category>serving</category><category>llm</category><author>editors@frontiercheckpoint.com</author></item><item><title>Sharding the Model: FSDP, ZeRO, and Tensor/Pipeline Parallelism</title><link>https://frontiercheckpoint.com/explainers/distributed-training-fsdp-zero-parallelism/</link><guid isPermaLink="true">https://frontiercheckpoint.com/explainers/distributed-training-fsdp-zero-parallelism/</guid><description>Past one GPU you stop training a model and start operating a distributed system. Here is what each parallelism axis actually shards, what it costs on the wire, and how practitioners stack them into 3D/4D layouts.</description><pubDate>Fri, 15 May 2026 00:00:00 GMT</pubDate><category>Explainers</category><category>distributed-training</category><category>fsdp</category><category>tensor-parallelism</category><category>gpu-memory</category><author>editors@frontiercheckpoint.com</author></item><item><title>How We Separate Signal From Noise: Frontier Checkpoint&apos;s Verification Rubric</title><link>https://frontiercheckpoint.com/essays/the-checkpoint-signal-vs-noise/</link><guid isPermaLink="true">https://frontiercheckpoint.com/essays/the-checkpoint-signal-vs-noise/</guid><description>The standard behind everything we publish: the filters that decide what earns your attention, the reproduced-to-unverified ladder we grade claims on, how we handle benchmarks and weightless releases, and why every correction is dated and logged rather than silently edited.</description><pubDate>Tue, 12 May 2026 00:00:00 GMT</pubDate><category>Essays</category><category>methodology</category><category>reproducibility</category><category>evaluation</category><category>industry</category><author>editors@frontiercheckpoint.com</author></item></channel></rss>