Agent Evaluation Checklist
A free, printable checklist for testing agents before they ship.
About this product
A one-page checklist that walks you through building a golden set, writing a grading rubric, and catching regressions before they reach production. It is a fast way to move from "it seemed fine" to evidence.
What's inside — free
Print this. Run it before any agent does real work. The goal is to move from "it seemed fine" to evidence.
Build a golden set
- [ ] Collected 15-20 real inputs (not synthetic).
- [ ] Wrote the output you'd be happy with for each.
- [ ] Stored them somewhere you'll actually re-run them from.
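
One possible way to store a golden set so that it is easy to re-run is a JSON Lines file kept in version control. The sketch below is only an illustration: the `GoldenCase` fields, the function names, and the file layout are assumptions, not part of the checklist.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class GoldenCase:
    # Hypothetical field names, chosen for this example only.
    case_id: str
    input_text: str       # a real input copied from actual usage
    expected_output: str  # the output you would be happy with


def save_golden_set(cases, path):
    """Write one JSON object per line so diffs stay readable in version control."""
    with Path(path).open("w", encoding="utf-8") as f:
        for case in cases:
            f.write(json.dumps(asdict(case), ensure_ascii=False) + "\n")


def load_golden_set(path):
    """Read the cases back in the order they were saved, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [GoldenCase(**json.loads(line)) for line in f if line.strip()]
```

Keeping the file next to the agent's prompts means a prompt change and the cases it was tested against can be reviewed together.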
Write a rubric
- [ ] One paragraph per task describing a "good" answer.
- [ ] Defined what counts as a hard fail.
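
A rubric can live as plain data next to the golden set. The sketch below assumes a hard fail can be approximated by forbidden phrases, which is a simplification; many hard fails need a human or a more careful check. `Rubric` and `is_hard_fail` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    task: str
    good_answer: str  # the one-paragraph description of a "good" answer
    # Assumption: hard fails are expressed as phrases that must never appear.
    hard_fail_phrases: list = field(default_factory=list)


def is_hard_fail(rubric, output):
    """Return True if the output contains any forbidden phrase (case-insensitive)."""
    lowered = output.lower()
    return any(phrase.lower() in lowered for phrase in rubric.hard_fail_phrases)
```

A phrase check like this only catches the obvious cases; it is meant to flag outputs for review, not to replace the written rubric.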
Measure three things
- [ ] Task success — did it accomplish the goal? (human yes/no is fine to start)
- [ ] Tool correctness — right tool, sane arguments?
- [ ] Cost & latency — tokens and seconds per run, tracked over time.
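
The three measurements above can be recorded per run and averaged per batch. This is a minimal sketch: the `RunRecord` fields and the `summarize` function are assumptions made for illustration, and the yes/no values are expected to come from a human reviewer to start.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    case_id: str
    task_success: bool  # human yes/no: did it accomplish the goal?
    tool_correct: bool  # right tool, sane arguments?
    tokens: int         # tokens used in this run
    seconds: float      # wall-clock time for this run


def summarize(records):
    """Average the three measurements over one batch of runs."""
    if not records:
        raise ValueError("no records to summarize")
    n = len(records)
    return {
        "task_success_rate": sum(r.task_success for r in records) / n,
        "tool_correct_rate": sum(r.tool_correct for r in records) / n,
        "avg_tokens": sum(r.tokens for r in records) / n,
        "avg_seconds": sum(r.seconds for r in records) / n,
    }
```

Saving each summary with a date or commit hash is one way to get the "tracked over time" part.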
Run on every change
- [ ] Baseline the current agent on the golden set.
- [ ] Re-run before AND after every prompt or model change.
- [ ] Reject changes that fix one case but break others.
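
The before/after comparison can be sketched as a diff of pass/fail results per golden case. The function name `compare_runs` and the strict "any newly broken case rejects the change" policy are assumptions that mirror the last item above; a team may prefer a softer threshold.

```python
def compare_runs(baseline, candidate):
    """Compare pass/fail results keyed by case id.

    baseline, candidate: dict mapping case_id -> bool (True means the case passed).
    """
    if set(baseline) != set(candidate):
        raise ValueError("baseline and candidate must cover the same cases")
    fixed = sorted(c for c in baseline if not baseline[c] and candidate[c])
    broken = sorted(c for c in baseline if baseline[c] and not candidate[c])
    # Assumed policy: a change is rejected if it breaks any previously passing case.
    return {"fixed": fixed, "broken": broken, "accept": not broken}


if __name__ == "__main__":
    before = {"c1": True, "c2": False, "c3": True}
    after = {"c1": True, "c2": True, "c3": False}
    print(compare_runs(before, after))
```

In the example, the change fixes `c2` but breaks `c3`, so it is rejected even though the overall pass count is unchanged.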
Before go-live
- [ ] Guardrails tested (it refuses what it must).
- [ ] A human approval gate on anything irreversible.
- [ ] A kill switch you can hit without a deploy.
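
The approval gate and kill switch can be sketched as a single wrapper around action execution. Everything here is hypothetical: the action names, the `agent.disabled` file, and the `AGENT_KILL_SWITCH_FILE` environment variable are assumptions chosen for illustration. The idea is that the switch is read at run time, so flipping it needs no deploy.

```python
import os
from pathlib import Path

# Assumption: the kill switch is a file whose presence disables the agent.
KILL_SWITCH_FILE = Path(os.environ.get("AGENT_KILL_SWITCH_FILE", "agent.disabled"))

# Assumption: example action names that cannot be undone.
IRREVERSIBLE_ACTIONS = {"send_email", "delete_record", "issue_refund"}


class AgentDisabled(RuntimeError):
    """Raised when the kill switch is on."""


class ApprovalRequired(RuntimeError):
    """Raised when an irreversible action lacks human approval."""


def guarded_execute(action, run, approved=False, kill_switch=KILL_SWITCH_FILE):
    """Run `run()` only if the kill switch is off and any needed approval is given."""
    if Path(kill_switch).exists():
        raise AgentDisabled(f"kill switch present at {kill_switch}")
    if action in IRREVERSIBLE_ACTIONS and not approved:
        raise ApprovalRequired(f"{action!r} needs human approval")
    return run()
```

With this layout, creating the switch file (for example with `touch agent.disabled`) stops every later action, and deleting it re-enables the agent.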