Understanding Reports
Each eval report is a comprehensive analysis of how an AI agent experienced your developer tool.
Report sections
Grade & Score
Every report gets a letter grade (A+ to F) and a numeric score (0-100). The score is a weighted average across multiple dimensions.
Dimensions
Reports are scored across several dimensions:
- Documentation — quality, completeness, and findability of docs
- API Design — intuitiveness, consistency, and error messages
- Setup Flow — time and friction to go from zero to working
- Tool Integration — how well the product works with AI agents
- Error Recovery — how helpful error messages are for debugging
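
To make the weighted average concrete, here is a minimal sketch of how per-dimension scores could roll up into one number and a letter grade. The weights and grade cutoffs are assumptions chosen for illustration; the actual values behind a report are not documented here, and the helper names `overall_score` and `letter_grade` are hypothetical.

```python
# Hypothetical integer weights (they sum to 100). The real weights are not published here.
WEIGHTS = {
    "Documentation": 25,
    "API Design": 20,
    "Setup Flow": 20,
    "Tool Integration": 20,
    "Error Recovery": 15,
}

# Hypothetical, simplified cutoffs, checked from highest to lowest.
GRADE_CUTOFFS = [(97, "A+"), (90, "A"), (80, "B"), (70, "C"), (60, "D")]


def overall_score(dimension_scores: dict[str, float]) -> float:
    """Return the weighted average of per-dimension scores (each 0-100)."""
    total_weight = sum(WEIGHTS[name] for name in dimension_scores)
    weighted_sum = sum(WEIGHTS[name] * score for name, score in dimension_scores.items())
    return weighted_sum / total_weight


def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter using the assumed cutoffs; below all cutoffs is F."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


scores = {
    "Documentation": 90,
    "API Design": 80,
    "Setup Flow": 70,
    "Tool Integration": 60,
    "Error Recovery": 100,
}
total = overall_score(scores)
print(total, letter_grade(total))
```

With these made-up weights the example scores combine to 79.5, which falls under the assumed cutoff for a B and so maps to a C. A heavily weighted dimension moves the overall score more than a lightly weighted one, which is why a weak result in one area can outweigh strong results in another.
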
Friction Points
Specific moments where the agent struggled. Each friction point includes:
- Severity — critical, major, or minor
- Description — what happened
- Context — what the agent was trying to do
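
If you work with friction points outside the report view, the three fields above map naturally onto a small record type. The sketch below is illustrative only: the `FrictionPoint` class, its field names, and the `by_severity` helper are assumptions, not an exported schema.

```python
from dataclasses import dataclass

# Lower rank means more urgent; the three levels come from the list above.
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}


@dataclass
class FrictionPoint:
    """Hypothetical record mirroring the fields a friction point includes."""
    severity: str     # "critical", "major", or "minor"
    description: str  # what happened
    context: str      # what the agent was trying to do

    def __post_init__(self) -> None:
        if self.severity not in SEVERITY_ORDER:
            raise ValueError(f"unknown severity: {self.severity!r}")


def by_severity(points: list[FrictionPoint]) -> list[FrictionPoint]:
    """Return the points ordered from critical to minor, keeping ties in input order."""
    return sorted(points, key=lambda p: SEVERITY_ORDER[p.severity])
```

Sorting this way puts critical items first, which is usually the order in which you would triage them.
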
Session Trace
The full conversation log of the agent’s eval run. See exactly what the agent tried, what tools it called, and where it got stuck.
Recommendations
Prioritized list of improvements, ranked by effort (quick win, medium, or major effort).
Report statuses
| Status | Meaning |
|---|---|
| Draft | Eval completed, visible to your org, not yet approved |
| Published | Approved and visible on the leaderboard |
| Archived | Hidden from default views, admin only |
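
As a rough illustration of the table, the statuses can be modeled as an enumeration, with the leaderboard showing only published reports. The `ReportStatus` enum and the `leaderboard_reports` helper are hypothetical names used for this sketch; they do not describe a real API.

```python
from enum import Enum


class ReportStatus(Enum):
    """The three statuses from the table above."""
    DRAFT = "Draft"
    PUBLISHED = "Published"
    ARCHIVED = "Archived"


def leaderboard_reports(reports: list[tuple[str, ReportStatus]]) -> list[str]:
    """Return the names of reports that would appear on the leaderboard.

    Assumes, per the table, that only Published reports are shown there.
    """
    return [name for name, status in reports if status is ReportStatus.PUBLISHED]
```
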