Topic

Tool Use

Function calling and tool-augmented models.

1 checkpoint

CHECKPOINT 00132026-06-27ESSAYS

The Harness Is the Product: Why Agent Evals Are the Real Moat

Swapping the frontier model rarely moves your agent's success rate as much as fixing retries and context management — and the one thing competitors can't clone is your evaluation environment. A thesis on why agent evals, not weights, are where reproducible capability accrues.