Add layered plugin/skill validation tooling across agent ecosystems#219
Conversation
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 17b565b78b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Vendor Cursor's official plugin/marketplace JSON schemas (cursor/plugins @4a91a6e) plus doc-derived mcp.json/hooks.json schemas, and validate both source bundles offline via a jsonschema dev-dependency. Drops the schema-invalid author.url key the validation immediately caught.
Prove rendered Cursor/Codex installs are structurally sound: absolute quoted hook commands, version-stamped manifests, no surviving template placeholders except the intentional mcp.json workspaceFolder pin, and no source-bundle file silently dropped.
Port the useful closed-rule subset of skillmark/skilldoctor/skillkit (hygiene, headings, reference integrity, description quality) and the Claude Code / Agent Skills spec portability rules so every bundled skill is proven valid for each ecosystem it targets, offline in cargo.
Declarative bundle-count-agnostic sync policy: every top-level bundle entry and every skill is byte-synced across cursor-plugin/ and codex-plugin/ or covered by a documented, self-cleaning exception.
ajv schema job mirroring cursor/plugins' official validate workflow (pinned deps), plus a hermetic MCP Inspector CLI smoke script driving tracedecay serve through the official TypeScript SDK client.
New docs/PLUGIN-VALIDATION.md mapping each validation layer to its tests/schemas, plus a CONTRIBUTING section on validating plugins.
d3055e3 to
00961eb
Compare
…tion-tooling # Conflicts: # tests/agent_suite/update_plugin_test.rs
Summary
Adopts the strongest available validation for the plugin bundles we generate (
cursor-plugin/,codex-plugin/) and the skills they ship, layered from cheapest to most end-to-end. Full architecture in the newdocs/PLUGIN-VALIDATION.md.cargo test): vendors Cursor's published JSON Schemas (plugin.schema.json,marketplace.schema.jsonfrom cursor/plugins@4a91a6e / 920a87f) plus doc-derivedmcp.schema.json/hooks.schema.json(no official standalone schemas exist; derived from cursor.com/docs/context/mcp and cursor.com/docs/hooks, provenance recorded in each schema). Validated via ajsonschema(0.46.8,default-features = false) dev-dependency intests/agent_suite/plugin_manifest_schema_test.rsandplugin_config_schema_test.rs, with negative cases guarding the derived schemas.author.url, which the official schema rejects (authorallows onlyname/email). Fixed; the URL still ships viahomepage/repository.skill_lint_cursor_test.rsports the useful closed-rule subset of skillmark, skilldoctor, and skillkit into Rust: file hygiene, heading conventions, description quality, and reference integrity (cross-skilltracedecay:<slug>refs,/slashrefs, and everytracedecay_*tool mention resolved against the liveget_tool_definitions()list).plugin_bundle_sync_test.rsenforces disk-level parity across all bundles through declarative, self-cleaning policy tables (undeclared divergence fails, and so does a stale exception). Bundle-count agnostic: a futureclaude-plugin/joins by adding one row.update_plugin_test.rsnow installs into temp homes and validates the rendered bundles: full draft-07 schema validation of the rendered manifest, absolute shell-quoted hook commands, version stamps, source⊆rendered file completeness, and a placeholder sweep where the only${...}survivor allowed is the intentionalmcp.json${workspaceFolder}arg (pin shared with fix: tolerate literal ${workspaceFolder} in serve --path #206's serve-side fallback).skill_lint_claude_test.rsvalidates all 65 skills against Claude Code / Agent Skills spec rules (code.claude.com/docs/en/skills, agentskills.io/specification, anthropics/skills quick_validate.py, and the.claude-plugin/layouts Anthropic ships). Two documented conflict skips (disable-model-invocation,paths— Claude Code supports both; only the strict packaging spec rejects them) with a stale-allowlist guard. A futureclaude-plugin/bundle is a re-packaging exercise..github/workflows/plugin-validation.ymlmirrors the official cursor/plugins ajv workflow (pinnedajv-cli@5.0.0+ajv-formats@2.1.1) and wiresscripts/mcp-conformance-smoke.sh— a hermetic smoke drivingtracedecay servethrough the pinned MCP Inspector CLI (@modelcontextprotocol/inspector@0.22.0), adding protocol-version negotiation and SDK-side Zod validation the Rust MCP tests can't cover.All five new test modules were folded into the consolidated
agent_suitebinary (matching the repo's link-time convention, cf. #211), with shared helpers (SkillDocloader, schema compile/validate,repo_path, kebab-case rule, tree walk) deduped intotests/common/mod.rs.Adopted vs rejected
cargo test); Claude/agentskills.io spec rules.@modelcontextprotocol/conformance(server mode is streamable-HTTP-only;tracedecay serveis stdio-only — revisit if an HTTP transport lands); running skill-tools/skillmark as npx CI steps (network/toolchain dependency for rules we can enforce natively); skillmark's script-security AST rules (bundles ship zero scripts today), NLP-ish description heuristics, and scoring-only rules (conflict with the deliberately lean skill style).Manifest-path verdict
.cursor-plugin/plugin.jsonis the documented location and this repo already conformed (docs: cursor.com/docs/reference/plugins; every official plugin and the working local install use it — the earlier "root plugin.json" claim was anlsmissing the dot-directory). No layout change; the layout was already pinned by three existing assertions.Test plan
cargo test --test agent_suite— 382 passed, 0 failed (repeated runs; includes the 5 new modules + rendered-output tests)cargo test --lib agents::(135) /--lib hooks::(23) /--test hooks_lsp_suite(104) — all passcargo check --all-targets,cargo clippy(0 warnings in repo code),cargo fmt --check— cleanactionlint+ YAML parse on the workflow — clean;bash -non the smoke script — cleanscripts/mcp-conformance-smoke.shagainst a debug build — 7/7 checks passtest_cursor_healthcheck_warns_on_literal_workspace_folder_transcript_path(from fix: tolerate literal ${workspaceFolder} in serve --path #206) read the user-data-dir env without the suite's env lock; it now pins and serializes like the otherTraceDecay::inittests (0 failures across 10+ full-suite runs after the fix).mcp_suitefull-parallel runs showed 10 environmental flakes on this shared dev box (dashboard port collisions with concurrently running agents); all pass in isolation and none touch this PR's surface.Follow-ups (deliberately not in this PR)
CODEX_SKILL_*_DIVERGENCESinsrc/agents/codex.rstopub(crate)consts consumed by both the unit parity test and the sync test (single source of truth for the divergence allowlists).memorize-subjectvsmemorizing-subject: near-duplicate explicit-invoke skill names; both are referenced so neither is stale, but the naming deserves a deliberate look.cursor-plugin/as canonical source (cargo xtask sync-bundles); the sync test's policy tables are the generator's spec, and today's 2-bundle/2-divergence reality doesn't justify it yet.Merge interplay
#206 and #210 are merged and this branch is up to date with master (the #206 rendered-args pin was reconciled into the shared rendered-bundle validator). #212 (
host-integration-parity) is still open and touchestests/agent_suite/main.rs+ agent tests — expect a trivialmod-list merge conflict inmain.rsfor whichever lands second; the schema tests will re-validate any bundle content it changes (including re-flaggingauthor.urlif it gets re-added).