Understanding Reports

Each eval report is a comprehensive analysis of how an AI agent experienced your developer tool.

Report sections

Grade & Score

Every report gets a letter grade (A+ to F) and a numeric score (0-100). The score is a weighted average across multiple dimensions.

Dimensions

Reports are scored across several dimensions:
  • Documentation — quality, completeness, and findability of docs
  • API Design — intuitiveness, consistency, and error messages
  • Setup Flow — time and friction to go from zero to working
  • Tool Integration — how well the product works with AI agents
  • Error Recovery — how helpful error messages are for debugging
Each dimension has its own score, weight, and grade.
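
As a rough illustration of how a weighted average combines per-dimension scores into one number, here is a minimal Python sketch. The scores and weights below are hypothetical values chosen for the example, and the `overall_score` helper is an assumption for illustration; the actual weights and formula used in reports may differ.

```python
def overall_score(dimensions):
    """Return the weighted average of per-dimension scores (each 0-100)."""
    total_weight = sum(d["weight"] for d in dimensions)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Dividing by the total weight keeps the result on the 0-100 scale
    # even if the weights do not sum to exactly 1.
    return sum(d["score"] * d["weight"] for d in dimensions) / total_weight


# Hypothetical scores and weights, for illustration only.
dimensions = [
    {"name": "Documentation", "score": 80, "weight": 0.25},
    {"name": "API Design", "score": 90, "weight": 0.25},
    {"name": "Setup Flow", "score": 70, "weight": 0.20},
    {"name": "Tool Integration", "score": 60, "weight": 0.20},
    {"name": "Error Recovery", "score": 50, "weight": 0.10},
]

print(round(overall_score(dimensions), 1))
```

In this sketch a low score in a heavily weighted dimension pulls the overall score down more than the same low score in a lightly weighted one.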

Friction Points

Specific moments where the agent struggled. Each friction point includes:
  • Severity — critical, major, or minor
  • Description — what happened
  • Context — what the agent was trying to do
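
One way to picture a friction point is as a small record with the three fields listed above. The following Python sketch is an assumption for illustration, not the report's actual schema: the `FrictionPoint` class, the `by_severity` helper, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass

# Lower number = more severe, so sorting ascending puts critical items first.
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}


@dataclass
class FrictionPoint:
    severity: str      # "critical", "major", or "minor"
    description: str   # what happened
    context: str       # what the agent was trying to do


def by_severity(points):
    """Return friction points ordered from most to least severe."""
    return sorted(points, key=lambda p: SEVERITY_ORDER[p.severity])


# Hypothetical friction points, for illustration only.
points = [
    FrictionPoint("minor", "Example in docs used an old flag name", "Running the quickstart"),
    FrictionPoint("critical", "Auth request failed with no error body", "Creating an API key"),
    FrictionPoint("major", "Install step required an undocumented dependency", "Installing the CLI"),
]

for p in by_severity(points):
    print(f"[{p.severity}] {p.description} (while: {p.context})")
```

Sorting by severity first is a simple way to triage a report: critical items surface at the top, with the context field showing where in the agent's workflow each one occurred.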

Session Trace

The full conversation log of the agent’s eval run. See exactly what the agent tried, what tools it called, and where it got stuck.

Recommendations

Prioritized list of improvements, ranked by effort (quick win, medium, or major effort).

Report statuses

Status      Meaning
Draft       Eval completed, visible to your org, not yet approved
Published   Approved and visible on the leaderboard
Archived    Hidden from default views, admin only

You can publish your own draft reports from the dashboard.