<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Explainers — Frontier Checkpoint</title><description>Evergreen, deeply taught technique walkthroughs. Mechanistic, code-aware, and generous with context — written to go deep enough that you could reimplement the idea yourself, not just nod along. The kind of piece you might bookmark and come back to when you want a thing to truly click.</description><link>https://frontiercheckpoint.com/</link><item><title>Reading a Model Release Like an Engineer: Weights, Licenses, System Cards, and Evals</title><link>https://frontiercheckpoint.com/explainers/reading-a-model-release/</link><guid isPermaLink="true">https://frontiercheckpoint.com/explainers/reading-a-model-release/</guid><description>The headline benchmark is the least durable thing in a model release. Here is how to read access, licenses, cards, eval protocols, and serving facts before you commit engineering to a number you cannot reproduce.</description><pubDate>Fri, 19 Jun 2026 00:00:00 GMT</pubDate><category>Explainers</category><category>evaluation</category><category>reproducibility</category><category>llm</category><category>industry</category><author>editors@frontiercheckpoint.com</author></item><item><title>Post-Training Quantization in Practice: GPTQ, AWQ, and FP8</title><link>https://frontiercheckpoint.com/explainers/post-training-quantization-gptq-awq-fp8/</link><guid isPermaLink="true">https://frontiercheckpoint.com/explainers/post-training-quantization-gptq-awq-fp8/</guid><description>Post-training quantization is the cheapest inference lever and the easiest to pull wrong. 
The right method is set by your serving regime — bandwidth-bound decode wants weight-only INT4, compute-bound prefill wants FP8 — and the win is real only if a fast kernel accelerates your exact config.</description><pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate><category>Explainers</category><category>quantization</category><category>serving</category><category>kv-cache</category><author>editors@frontiercheckpoint.com</author></item><item><title>GRPO, Demystified: Group-Relative Policy Optimization for Reasoning Models</title><link>https://frontiercheckpoint.com/explainers/grpo-group-relative-policy-optimization/</link><guid isPermaLink="true">https://frontiercheckpoint.com/explainers/grpo-group-relative-policy-optimization/</guid><description>GRPO swaps PPO&apos;s learned critic for a Monte-Carlo baseline — the mean reward over a group of sampled completions — trading rollout compute and per-token credit assignment for a simpler, more stable RL loop on verifiable-reward tasks.</description><pubDate>Thu, 04 Jun 2026 00:00:00 GMT</pubDate><category>Explainers</category><category>grpo</category><category>rlhf</category><category>ppo</category><category>reasoning</category><category>fine-tuning</category><author>editors@frontiercheckpoint.com</author></item><item><title>Routing Is the Hard Part: A Practitioner&apos;s Guide to Mixture-of-Experts</title><link>https://frontiercheckpoint.com/explainers/moe-routing-practitioners-guide/</link><guid isPermaLink="true">https://frontiercheckpoint.com/explainers/moe-routing-practitioners-guide/</guid><description>MoE decouples parameter count from per-token FLOPs, but every hard problem — instability, dropped tokens, load imbalance, all-to-all traffic, a memory footprint set by total, not active, params — lives in the router. 
A structural tour from Switch/GShard to fine-grained and aux-loss-free designs, and the systems bill you actually pay.</description><pubDate>Mon, 01 Jun 2026 00:00:00 GMT</pubDate><category>Explainers</category><category>mixture-of-experts</category><category>transformers</category><category>tensor-parallelism</category><author>editors@frontiercheckpoint.com</author></item><item><title>RoPE and the Long-Context Stack: Rotation, Interpolation, and What Breaks at 128k</title><link>https://frontiercheckpoint.com/explainers/rope-long-context-stack/</link><guid isPermaLink="true">https://frontiercheckpoint.com/explainers/rope-long-context-stack/</guid><description>RoPE turns position into a rotation of each dimension pair — and that same rotation is why PI, NTK-aware scaling, and YaRN exist, and why a 128k window rarely means 128k of usable context. The math, the methods, and the serving bill.</description><pubDate>Sun, 24 May 2026 00:00:00 GMT</pubDate><category>Explainers</category><category>rope</category><category>long-context</category><category>attention</category><category>transformers</category><author>editors@frontiercheckpoint.com</author></item><item><title>Sharding the Model: FSDP, ZeRO, and Tensor/Pipeline Parallelism</title><link>https://frontiercheckpoint.com/explainers/distributed-training-fsdp-zero-parallelism/</link><guid isPermaLink="true">https://frontiercheckpoint.com/explainers/distributed-training-fsdp-zero-parallelism/</guid><description>Past one GPU you stop training a model and start operating a distributed system. Here is what each parallelism axis actually shards, what it costs on the wire, and how practitioners stack them into 3D/4D layouts.</description><pubDate>Fri, 15 May 2026 00:00:00 GMT</pubDate><category>Explainers</category><category>distributed-training</category><category>fsdp</category><category>tensor-parallelism</category><category>gpu-memory</category><author>editors@frontiercheckpoint.com</author></item></channel></rss>