Understanding Reports

Each eval report is a comprehensive analysis of how an AI agent experienced your developer tool.

Report sections

Every report gets a letter grade (A+ to F) and a numeric score (0-100). The numeric score is a weighted average of the per-dimension scores.

Reports are scored across several dimensions:

Each dimension has its own score, weight, and grade.
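To show how per-dimension scores could combine into the overall 0-100 score, here is a minimal weighted-average sketch. The `Dimension` type, the dimension names, and the weights are hypothetical placeholders, not the dimensions or weights that reports actually use. The mapping from score to letter grade is left out because its thresholds are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    # Hypothetical record mirroring "score, weight, and grade" (grade omitted).
    name: str
    score: float   # 0-100
    weight: float  # relative importance; weights need not sum to 1


def overall_score(dimensions: list[Dimension]) -> float:
    """Weighted average of dimension scores, normalized by the total weight."""
    total_weight = sum(d.weight for d in dimensions)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(d.score * d.weight for d in dimensions) / total_weight


# Hypothetical dimensions and weights, for illustration only.
dims = [
    Dimension("docs_clarity", 80, 2.0),
    Dimension("setup", 60, 1.0),
    Dimension("error_messages", 90, 1.0),
]
print(overall_score(dims))  # 77.5
```

Normalizing by the total weight means the weights only need to express relative importance. A dimension with weight 2.0 counts twice as much as one with weight 1.0.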

Friction points are the specific moments where the agent struggled. Each friction point includes:

Each report includes the full conversation log of the agent’s eval run, so you can see exactly what the agent tried, which tools it called, and where it got stuck.

Each report also includes a prioritized list of recommended improvements, ranked by effort (quick win, medium, or major effort).
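A small sketch of the effort ranking follows. It assumes quick wins are listed first and that items with the same effort keep their original order. The `Recommendation` type and the example titles are hypothetical; only the three effort labels come from this page.

```python
from dataclasses import dataclass

# Effort tiers named in the report, ordered from least to most effort
# (the ordering direction is an assumption).
EFFORT_ORDER = {"quick win": 0, "medium": 1, "major effort": 2}


@dataclass
class Recommendation:
    # Hypothetical record shape; real reports may carry more fields.
    title: str
    effort: str  # one of the keys in EFFORT_ORDER


def rank_by_effort(recs: list[Recommendation]) -> list[Recommendation]:
    """Quick wins first; sorted() is stable, so ties keep their input order."""
    return sorted(recs, key=lambda r: EFFORT_ORDER[r.effort])


recs = [
    Recommendation("Rewrite the auth guide", "major effort"),
    Recommendation("Fix a broken install link", "quick win"),
    Recommendation("Add a runnable quickstart", "medium"),
]
for rec in rank_by_effort(recs):
    print(f"{rec.effort}: {rec.title}")
```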

Each report has one of three statuses:

Status	Meaning
Draft	Eval completed; visible to your org but not yet approved
Published	Approved and visible on the leaderboard
Archived	Hidden from default views; visible to admins only

You can publish your own draft reports from the dashboard.
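The statuses can be modeled as a small state type. The sketch below encodes the status table and the publish step. The `Status` enum and the `publish` and `on_leaderboard` helpers are hypothetical, not part of any documented API. It also assumes that only a draft can be published. Transitions into `Archived` are omitted because this page does not describe them.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"          # eval completed, visible to your org, not yet approved
    PUBLISHED = "published"  # approved and visible on the leaderboard
    ARCHIVED = "archived"    # hidden from default views, admin only


def publish(status: Status) -> Status:
    """Hypothetical helper: assumes only a draft report can be published."""
    if status is not Status.DRAFT:
        raise ValueError(f"cannot publish a report in status {status.value!r}")
    return Status.PUBLISHED


def on_leaderboard(status: Status) -> bool:
    """Only published reports appear on the leaderboard."""
    return status is Status.PUBLISHED


print(on_leaderboard(publish(Status.DRAFT)))  # True
```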