Add layered plugin/skill validation tooling across agent ecosystems by ScriptedAlchemy · Pull Request #219 · ScriptedAlchemy/tracedecay

ScriptedAlchemy · 2026-07-02T08:35:06Z

Summary

Adopts the strongest available validation for the plugin bundles we generate (cursor-plugin/, codex-plugin/) and the skills they ship, layered from cheapest to most end-to-end. Full architecture in the new docs/PLUGIN-VALIDATION.md.

Layer 1 — official schema validation (offline, cargo test): vendors Cursor's published JSON Schemas (plugin.schema.json, marketplace.schema.json from cursor/plugins@4a91a6e / 920a87f) plus doc-derived mcp.schema.json / hooks.schema.json (no official standalone schemas exist; derived from cursor.com/docs/context/mcp and cursor.com/docs/hooks, provenance recorded in each schema). Validated via a jsonschema (0.46.8, default-features = false) dev-dependency in tests/agent_suite/plugin_manifest_schema_test.rs and plugin_config_schema_test.rs, with negative cases guarding the derived schemas.
- Real bug caught immediately: both bundle manifests declared author.url, which the official schema rejects (author allows only name/email). Fixed; the URL still ships via homepage/repository.
Layer 2 — skill lint (Cursor): skill_lint_cursor_test.rs ports the useful closed-rule subset of skillmark, skilldoctor, and skillkit into Rust: file hygiene, heading conventions, description quality, and reference integrity (cross-skill tracedecay:<slug> refs, /slash refs, and every tracedecay_* tool mention resolved against the live get_tool_definitions() list).
Layer 3 — cross-bundle sync: plugin_bundle_sync_test.rs enforces disk-level parity across all bundles through declarative, self-cleaning policy tables (undeclared divergence fails, and so does a stale exception). Bundle-count agnostic: a future claude-plugin/ joins by adding one row.
Layer 4 — rendered-output validation: update_plugin_test.rs now installs into temp homes and validates the rendered bundles: full draft-07 schema validation of the rendered manifest, absolute shell-quoted hook commands, version stamps, source⊆rendered file completeness, and a placeholder sweep where the only ${...} survivor allowed is the intentional mcp.json ${workspaceFolder} arg (a pin shared with the serve-side fallback from #206, "fix: tolerate literal ${workspaceFolder} in serve --path").
Layer 5 — Claude Code portability: skill_lint_claude_test.rs validates all 65 skills against Claude Code / Agent Skills spec rules (code.claude.com/docs/en/skills, agentskills.io/specification, anthropics/skills quick_validate.py, and the .claude-plugin/ layouts Anthropic ships). Two documented conflict skips (disable-model-invocation, paths — Claude Code supports both; only the strict packaging spec rejects them) with a stale-allowlist guard. A future claude-plugin/ bundle is a re-packaging exercise.
Layer 6 — CI: .github/workflows/plugin-validation.yml mirrors the official cursor/plugins ajv workflow (pinned ajv-cli@5.0.0 + ajv-formats@2.1.1) and wires scripts/mcp-conformance-smoke.sh — a hermetic smoke driving tracedecay serve through the pinned MCP Inspector CLI (@modelcontextprotocol/inspector@0.22.0), adding protocol-version negotiation and SDK-side Zod validation the Rust MCP tests can't cover.
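
The "self-cleaning" policy idea behind Layer 3 (and the stale-allowlist guard in Layer 5) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in plugin_bundle_sync_test.rs: the `Bundle` alias, the `check_sync` function, and the in-memory maps standing in for bundle directories are all hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a bundle on disk: relative path -> file contents.
type Bundle = BTreeMap<&'static str, &'static str>;

/// Compare two bundles against a table of declared divergences.
/// Returns one message per violation; an empty list means the policy holds.
fn check_sync(a: &Bundle, b: &Bundle, declared: &BTreeSet<&'static str>) -> Vec<String> {
    let mut errors = Vec::new();
    let all_paths: BTreeSet<&str> = a.keys().chain(b.keys()).copied().collect();

    for path in &all_paths {
        // A file missing from one bundle counts as a divergence too.
        let diverges = a.get(*path) != b.get(*path);
        let is_declared = declared.contains(*path);
        match (diverges, is_declared) {
            (true, false) => errors.push(format!("undeclared divergence: {path}")),
            (false, true) => errors.push(format!("stale exception: {path}")),
            _ => {}
        }
    }
    // An exception naming a path that neither bundle ships is stale as well.
    for path in declared {
        if !all_paths.contains(*path) {
            errors.push(format!("stale exception: {path}"));
        }
    }
    errors
}
```

Failing on stale exceptions is what keeps the table honest: once two files converge again, the test forces the now-unneeded row to be deleted instead of letting it linger.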

All five new test modules were folded into the consolidated agent_suite binary (matching the repo's link-time convention, cf. #211), with shared helpers (SkillDoc loader, schema compile/validate, repo_path, kebab-case rule, tree walk) deduped into tests/common/mod.rs.
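
For readers unfamiliar with the Layer 4 placeholder sweep, the core check might look something like the following. It is a sketch only: the function name `surviving_placeholders` is hypothetical, and unlike the rule described above it does not scope the allowlist to mcp.json.

```rust
/// Collect every `${...}` placeholder in `text` that is not on the allowlist.
/// An unterminated `${` is reported as-is so it cannot slip through.
fn surviving_placeholders<'a>(text: &'a str, allowed: &[&str]) -> Vec<&'a str> {
    let mut found = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("${") {
        let tail = &rest[start..];
        // Include the closing brace when present; otherwise take the remainder.
        let end = tail.find('}').map_or(tail.len(), |i| i + 1);
        let placeholder = &tail[..end];
        if !allowed.contains(&placeholder) {
            found.push(placeholder);
        }
        // `end` is at least 2, so the scan always advances.
        rest = &tail[end..];
    }
    found
}
```

A rendered-output test would then assert that the returned list is empty for every rendered file, passing `&["${workspaceFolder}"]` as the allowlist where that literal is intentional.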

Adopted vs rejected

Adopted: official Cursor schemas + ajv workflow; MCP Inspector CLI smoke; skillmark/skilldoctor/skillkit rules ported to Rust (offline, no node toolchain in cargo test); Claude/agentskills.io spec rules.
Rejected: @modelcontextprotocol/conformance (server mode is streamable-HTTP-only; tracedecay serve is stdio-only — revisit if an HTTP transport lands); running skill-tools/skillmark as npx CI steps (network/toolchain dependency for rules we can enforce natively); skillmark's script-security AST rules (bundles ship zero scripts today), NLP-ish description heuristics, and scoring-only rules (conflict with the deliberately lean skill style).

Manifest-path verdict

.cursor-plugin/plugin.json is the documented location and this repo already conformed (docs: cursor.com/docs/reference/plugins; every official plugin and the working local install use it — the earlier "root plugin.json" claim was an ls missing the dot-directory). No layout change; the layout was already pinned by three existing assertions.

Test plan

cargo test --test agent_suite — 382 passed, 0 failed (repeated runs; includes the 5 new modules + rendered-output tests)
cargo test --lib agents:: (135) / --lib hooks:: (23) / --test hooks_lsp_suite (104) — all pass
cargo check --all-targets, cargo clippy (0 warnings in repo code), cargo fmt --check — clean
actionlint + YAML parse on the workflow — clean; bash -n on the smoke script — clean
scripts/mcp-conformance-smoke.sh against a debug build — 7/7 checks pass
Fixed a pre-existing flake surfaced by the suite consolidation: test_cursor_healthcheck_warns_on_literal_workspace_folder_transcript_path (added in #206, "fix: tolerate literal ${workspaceFolder} in serve --path") read the user-data-dir env without the suite's env lock; it now pins and serializes like the other TraceDecay::init tests (0 failures across 10+ full-suite runs after the fix).
mcp_suite full-parallel runs showed 10 environmental flakes on this shared dev box (dashboard port collisions with concurrently running agents); all pass in isolation and none touch this PR's surface.
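
The "pin and serialize" pattern behind the env-lock fix above can be illustrated with a small standard-library sketch. The helper names `env_lock` and `with_env_var` are hypothetical and are not the suite's actual helpers; the `unsafe` blocks are there because `set_var` and `remove_var` are unsafe in the 2024 edition, and on earlier editions they only produce an unused-unsafe warning.

```rust
use std::env;
use std::sync::{Mutex, MutexGuard};

/// Process-wide lock shared by every test that reads or writes the environment.
fn env_lock() -> MutexGuard<'static, ()> {
    static LOCK: Mutex<()> = Mutex::new(());
    // A panicking test poisons the mutex; the guard itself is still usable.
    LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Run `body` with `key` pinned to `value`, then restore the previous state.
/// Note: the restore step is skipped if `body` panics.
fn with_env_var<T>(key: &str, value: &str, body: impl FnOnce() -> T) -> T {
    let _guard = env_lock(); // held until the function returns
    let previous = env::var_os(key);
    unsafe { env::set_var(key, value) };
    let result = body();
    match previous {
        Some(old) => unsafe { env::set_var(key, old) },
        None => unsafe { env::remove_var(key) },
    }
    result
}
```

Because cargo runs tests in one binary on parallel threads, any test that reads the variable without taking the same lock can observe another test's value, which is the kind of flake described above.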

Follow-ups (deliberately not in this PR)

Promote CODEX_SKILL_*_DIVERGENCES in src/agents/codex.rs to pub(crate) consts consumed by both the unit parity test and the sync test (single source of truth for the divergence allowlists).
memorize-subject vs memorizing-subject: near-duplicate explicit-invoke skill names; both are referenced so neither is stale, but the naming deserves a deliberate look.
If a third ecosystem bundle lands, consider generating bundles from cursor-plugin/ as canonical source (cargo xtask sync-bundles); the sync test's policy tables are the generator's spec, and today's 2-bundle/2-divergence reality doesn't justify it yet.
The plugin-validation workflow is path-filtered, so it reports as skipped on unrelated PRs — account for that before adding it to required checks.

Merge interplay

#206 and #210 are merged and this branch is up to date with master (the #206 rendered-args pin was reconciled into the shared rendered-bundle validator). #212 (host-integration-parity) is still open and touches tests/agent_suite/main.rs + agent tests — expect a trivial mod-list merge conflict in main.rs for whichever lands second; the schema tests will re-validate any bundle content it changes (including re-flagging author.url if it gets re-added).

changeset-bot · 2026-07-02T08:35:10Z

⚠️ No Changeset found

Latest commit: fbfd9ec

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17b565b78b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Vendor Cursor's official plugin/marketplace JSON schemas (cursor/plugins @4a91a6e) plus doc-derived mcp.json/hooks.json schemas, and validate both source bundles offline via a jsonschema dev-dependency. Drops the schema-invalid author.url key the validation immediately caught.

Prove rendered Cursor/Codex installs are structurally sound: absolute quoted hook commands, version-stamped manifests, no surviving template placeholders except the intentional mcp.json workspaceFolder pin, and no source-bundle file silently dropped.

Port the useful closed-rule subset of skillmark/skilldoctor/skillkit (hygiene, headings, reference integrity, description quality) and the Claude Code / Agent Skills spec portability rules so every bundled skill is proven valid for each ecosystem it targets, offline in cargo.


ScriptedAlchemy changed the title from "[codex] Add plugin validation tooling" to "Add layered plugin/skill validation tooling across agent ecosystems" Jul 2, 2026

ScriptedAlchemy marked this pull request as ready for review July 2, 2026 09:08

chatgpt-codex-connector Bot reviewed Jul 2, 2026


Comment thread .github/workflows/plugin-validation.yml

ScriptedAlchemy added 14 commits July 2, 2026 10:27

test: enforce cross-bundle plugin sync manifest

0e64565

Declarative bundle-count-agnostic sync policy: every top-level bundle entry and every skill is byte-synced across cursor-plugin/ and codex-plugin/ or covered by a documented, self-cleaning exception.

ci: add plugin validation workflow and MCP smoke script

09903f9

ajv schema job mirroring cursor/plugins' official validate workflow (pinned deps), plus a hermetic MCP Inspector CLI smoke script driving tracedecay serve through the official TypeScript SDK client.

docs: document the plugin validation layers

9de7f39

New docs/PLUGIN-VALIDATION.md mapping each validation layer to its tests/schemas, plus a CONTRIBUTING section on validating plugins.

test: integrate plugin validation checks into agent suite

615196a

test: drop pass-through wrappers left by suite consolidation

f5ddb33

ci: run plugin smoke on mcp source changes

563f977

test(agent): stabilize plugin validation CI

fcfca7c

fix: canonicalize profile data dir roots

be1570a

test: align daemon paths with canonical profile roots

30016ec

test: canonicalize storage suite profile roots

5338605

test(storage): normalize profile shard path assertions

00961eb

ScriptedAlchemy force-pushed the codex/plugin-validation-tooling branch from d3055e3 to 00961eb July 2, 2026 10:30

ScriptedAlchemy and others added 2 commits July 2, 2026 16:37

test(plugin): remove duplicated file listing helper

e1ecdda

Merge remote-tracking branch 'origin/master' into codex/plugin-valida…

fbfd9ec

…tion-tooling # Conflicts: # tests/agent_suite/update_plugin_test.rs

ScriptedAlchemy merged commit 2706da2 into master Jul 3, 2026
16 checks passed

ScriptedAlchemy deleted the codex/plugin-validation-tooling branch July 3, 2026 01:02

ScriptedAlchemy mentioned this pull request Jul 3, 2026

fix(sessions): advance Claude parse cursor for filtered transcripts + sanity-check follow-ups #241

Merged

ScriptedAlchemy merged 16 commits into master from codex/plugin-validation-tooling
