R1 is a useful anchor because it pairs a strong claim with an unusual amount of openness: weights plus a recipe centered on GRPO and rule-verifiable rewards. That makes the claim checkable in a way that the claims of most frontier reasoning systems are not. Start with our GRPO explainer for the algorithm, then read the report for the training details.