
Add agent trajectory evaluation against declarative rubrics#244

Merged
JE-Chen merged 3 commits into dev from feat/trajectory-eval-batch
Jun 19, 2026

@JE-Chen JE-Chen commented Jun 19, 2026

Member

Agent-QA batch: deterministic scoring of an agent run. Ships the full set of layers, tests, EN/Zh v36 docs, and README.

Feature (utils/trajectory_eval, pure-stdlib)

  • evaluate_trajectory(trajectory, rubric): a trajectory is the ordered list of {action, args, observation} steps a run took. A rubric is plain data with the keys required_actions (plus ordered, to also enforce their relative order), forbidden_actions, max_steps, and success_contains. The call returns {passed, score, steps, checks}: score is the fraction of applicable checks that passed, passed is true only if all of them passed, and each check is {name, passed, detail}, so a failure pinpoints the violated expectation. An empty rubric trivially passes.
  • The rubric is JSON-friendly, so it can live in JSON action files and travel over MCP. Wiring: executor AC_evaluate_trajectory (coerces JSON-string args from the visual builder), MCP ac_evaluate_trajectory, and a Builder entry under Agent.
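
To make the rubric semantics above concrete, here is a minimal pure-stdlib sketch of such an evaluator. It is not the code merged in this PR: the function name evaluate_trajectory_sketch, the detail strings, the score of 1.0 for an empty rubric, and the choice to match success_contains against the final step's observation are all assumptions made for illustration.

```python
def _in_relative_order(wanted, actions):
    """True if `wanted` occurs in `actions` as a subsequence (same relative order)."""
    remaining = iter(actions)
    # `name in remaining` consumes the iterator up to the first match,
    # so each later name must be found after the previous one.
    return all(name in remaining for name in wanted)


def evaluate_trajectory_sketch(trajectory, rubric):
    """Score a list of {action, args, observation} steps against a plain-data rubric."""
    actions = [step.get("action") for step in trajectory]
    checks = []

    def add(name, passed, detail):
        checks.append({"name": name, "passed": bool(passed), "detail": detail})

    required = rubric.get("required_actions")
    if required:
        missing = [a for a in required if a not in actions]
        add("required_actions", not missing,
            f"missing: {missing}" if missing else "all present")
        if rubric.get("ordered"):
            add("ordered", _in_relative_order(required, actions),
                f"expected relative order: {required}")

    forbidden = rubric.get("forbidden_actions")
    if forbidden:
        used = [a for a in actions if a in forbidden]
        add("forbidden_actions", not used,
            f"used: {used}" if used else "none used")

    max_steps = rubric.get("max_steps")
    if max_steps is not None:
        add("max_steps", len(trajectory) <= max_steps,
            f"{len(trajectory)} steps, limit {max_steps}")

    needle = rubric.get("success_contains")
    if needle:
        # Assumption: success is judged on the last observation only.
        final = str(trajectory[-1].get("observation", "")) if trajectory else ""
        add("success_contains", needle in final, f"looked for {needle!r}")

    passed_count = sum(c["passed"] for c in checks)
    # Assumption: an empty rubric (no applicable checks) scores 1.0.
    score = passed_count / len(checks) if checks else 1.0
    return {
        "passed": passed_count == len(checks),
        "score": score,
        "steps": len(trajectory),
        "checks": checks,
    }


run = [
    {"action": "open_app", "args": {"name": "editor"}, "observation": "window opened"},
    {"action": "type_text", "args": {"text": "hi"}, "observation": "saved OK"},
]
result = evaluate_trajectory_sketch(run, {"required_actions": ["open_app"], "max_steps": 1})
print(result["score"], result["passed"])  # 0.5 False
```

Only rubric keys that are actually set contribute a check, which is one way to read "fraction of applicable checks": here one of two checks passes, giving the partial score of 0.5.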

Verification

  • 11 tests pass (empty rubric, required present/missing, ordered ok/bad, forbidden, max_steps, success_contains, partial score 0.5, executor round-trip with JSON strings, wiring, facade); ruff clean; radon no CC≥C; bandit clean; PySide6-free.
  • Rebased onto dev, which now carries the approval-testing registrations (Add approval testing: verify artifacts against approved baselines #243); kept both sets of registrations.
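
The "executor round-trip with JSON strings" case can be pictured with a small coercion helper. This is a sketch under assumptions, not the executor's actual code: coerce_json_arg is a hypothetical name, and how the real executor handles invalid JSON is not stated in this PR (here json.loads simply raises).

```python
import json


def coerce_json_arg(value):
    """Return parsed JSON if `value` is a string, otherwise `value` unchanged.

    Hypothetical helper: a visual builder may hand over arguments as JSON
    text, while programmatic callers pass lists and dicts directly.
    """
    if isinstance(value, str):
        return json.loads(value)
    return value


# Both forms end up as the same Python data before evaluation.
trajectory = coerce_json_arg('[{"action": "click", "args": {}, "observation": "ok"}]')
rubric = coerce_json_arg({"required_actions": ["click"]})
```

Coercing at the executor boundary would keep the evaluator itself working on plain lists and dicts, regardless of where the arguments came from.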


codacy-production Bot commented Jun 19, 2026


Up to standards ✅

🟢 Issues: 0 new issues
🟢 Metrics: complexity 45 · duplication -39


@JE-Chen JE-Chen merged commit dd602de into dev Jun 19, 2026
16 checks passed
@JE-Chen JE-Chen deleted the feat/trajectory-eval-batch branch June 19, 2026 15:16