Version: 1.0.0
Status: Draft
Latest Version: https://github.com/github/gh-aw-threat-detection/blob/main/specs/threat-detection-spec.md
This specification defines the requirements for the threat detection component of GitHub Agentic Workflows. The threat detection layer analyzes AI agent output for security threats before safe output jobs execute.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
This specification covers:
- Threat detection analysis categories
- Input/output contract for the detection CLI
- AI engine integration requirements
- Configuration interface
- Version compatibility
TD-01: A conforming implementation MUST provide automated threat detection.
TD-02: Threat detection MUST be automatically enabled when safe-outputs is configured.
TD-03: The implementation MUST support disabling threat detection via threat-detection: false.
TD-04: The implementation MUST detect the following threat categories:
- Prompt Injection: Malicious instructions manipulating AI behavior
- Secret Leaks: Exposed API keys, tokens, passwords, credentials
- Malicious Patches: Code changes introducing vulnerabilities or backdoors
TD-05: The implementation MAY support additional threat categories as extensions.
TD-06: The implementation MUST support AI-powered threat detection using configured AI engines.
TD-06a: The implementation MUST run threat detection as a single agentic
engine pass using the configured CLI engine. The engine MUST be given the
artifact content and the threat_detection_result reporting tool, and MUST
report its verdict in-session by invoking that tool, which writes a schema-valid
result to an out-of-band result sink. The implementation MUST read the verdict
exclusively from that sink; it MUST NOT parse the engine transcript for the
result. When no sink result is produced, the attempt MAY be retried with a
bounded self-correction prompt; retry exhaustion MUST be treated as an
infrastructure error.
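The TD-06a flow can be sketched as below. The sketch assumes two things: the engine invocation is wrapped in a caller-supplied function, and the out-of-band sink is a JSON file at a known path. `detect`, `read_sink_verdict`, `InfrastructureError`, the retry bound of 3, and the correction-prompt wording are all hypothetical. The point is that the verdict is read only from the sink and never from the transcript.

```python
import json
from pathlib import Path
from typing import Callable, Optional

# Hypothetical retry bound; TD-06a only requires that retries be bounded.
MAX_ATTEMPTS = 3


class InfrastructureError(Exception):
    """Raised when the engine never records a verdict (retry exhaustion)."""


def read_sink_verdict(sink: Path) -> Optional[dict]:
    """Return the verdict written by the reporting tool, or None if absent."""
    if not sink.exists():
        return None
    return json.loads(sink.read_text(encoding="utf-8"))


def detect(run_engine_pass: Callable[[str], object], base_prompt: str, sink: Path) -> dict:
    """Run bounded engine passes until the reporting tool writes a verdict."""
    # Remove any stale verdict so only this run's tool call can count.
    sink.unlink(missing_ok=True)
    prompt = base_prompt
    for _ in range(MAX_ATTEMPTS):
        # Whatever the engine prints (its transcript) is ignored on purpose.
        run_engine_pass(prompt)
        verdict = read_sink_verdict(sink)
        if verdict is not None:
            return verdict
        # Bounded self-correction prompt (wording is illustrative only).
        prompt = (base_prompt + "\n\nNo verdict was recorded. "
                  "Call the threat_detection_result tool now.")
    raise InfrastructureError(f"no verdict recorded after {MAX_ATTEMPTS} attempts")
```

Deleting the sink before the first pass is a design choice: a stale file from an earlier run cannot then be mistaken for this run's verdict. Schema validation of the sink contents (TD-10a) is left out here.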
TD-07: The implementation SHOULD support custom detection steps for specialized scanning:
```yaml
threat-detection:
  enabled: true
  steps:
    - name: Run TruffleHog
      uses: trufflesecurity/trufflehog@main
```
TD-08: Threat detection MUST produce structured JSON output:
```json
{
  "prompt_injection": false,
  "secret_leak": false,
  "malicious_patch": false,
  "reasons": []
}
```
TD-09: If any threat flag is true, the workflow MUST fail and safe outputs MUST NOT execute.
TD-10: The reasons array SHOULD contain human-readable explanations for detected threats.
TD-10a: The result reported through the threat_detection_result tool MUST
use the same JSON object shape as TD-08 with required boolean prompt_injection,
secret_leak, and malicious_patch fields and a required string-array reasons
field. The implementation MUST reject results that add unexpected fields, omit a
required field, or use the wrong type for any field.
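A minimal hand-written check for the TD-10a shape, combined with the TD-09 fail-closed rule, might look like this. `validate_result` and `threat_detected` are hypothetical helper names. An implementation could equally use a JSON Schema with `required` and `additionalProperties: false`.

```python
REQUIRED_BOOL_FIELDS = ("prompt_injection", "secret_leak", "malicious_patch")


def validate_result(result: object) -> dict:
    """Strictly validate a detection result against the TD-08 / TD-10a shape."""
    if not isinstance(result, dict):
        raise ValueError("result must be a JSON object")
    expected = set(REQUIRED_BOOL_FIELDS) | {"reasons"}
    extra = set(result) - expected
    missing = expected - set(result)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for field in REQUIRED_BOOL_FIELDS:
        # isinstance(x, bool) rejects 0/1 and strings such as "false".
        if not isinstance(result[field], bool):
            raise ValueError(f"{field} must be a boolean")
    reasons = result["reasons"]
    if not isinstance(reasons, list) or not all(isinstance(r, str) for r in reasons):
        raise ValueError("reasons must be an array of strings")
    return result


def threat_detected(result: dict) -> bool:
    """TD-09: any true flag means the workflow fails and safe outputs are skipped."""
    return any(result[field] for field in REQUIRED_BOOL_FIELDS)
```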
TD-11: The implementation MUST support custom detection prompts:
```yaml
threat-detection:
  prompt: "Focus on SQL injection vulnerabilities"
```
TD-12: Custom prompts MUST be appended to the default detection instructions rather than replacing them.
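One reading of TD-12 is to treat the configured prompt as a suffix to the built-in instructions. In this sketch, `DEFAULT_INSTRUCTIONS` is placeholder text and the `Additional instructions:` separator is an assumption. The requirement only fixes that the defaults are kept and that the custom text comes after them.

```python
from typing import Optional

# Placeholder text; the real default instructions are not specified here.
DEFAULT_INSTRUCTIONS = (
    "Analyze the artifacts for prompt injection, secret leaks, and malicious patches."
)


def build_prompt(custom_prompt: Optional[str] = None) -> str:
    """Append custom instructions after the defaults (TD-12), never replacing them."""
    prompt = DEFAULT_INSTRUCTIONS
    if custom_prompt and custom_prompt.strip():
        prompt += "\n\nAdditional instructions:\n" + custom_prompt.strip()
    return prompt
```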
TD-13: The implementation MUST support overriding the AI engine for threat detection:
```yaml
threat-detection:
  engine: "copilot"
```
TD-14: The implementation MUST support full engine configuration objects:
```yaml
threat-detection:
  engine:
    id: copilot
    model: gpt-4
    max-turns: 5
```
TD-15: The implementation MUST support disabling AI-powered detection:
```yaml
threat-detection:
  engine: false
  steps:
    - name: Static Analysis
      run: ./scan.sh
```
TD-16: The detector MUST accept an artifacts directory as its primary input argument.
TD-17: The artifacts directory MUST support the following structure:
```
<artifacts-dir>/
├── aw-prompts/
│   └── prompt.txt        # Workflow prompt file
├── agent_output.json     # Agent structured output
├── aw-*.patch            # Git format-patch files (optional)
├── aw-*.bundle           # Git bundle files (optional)
└── comment-memory/       # Agent comment memory (optional)
    └── *.md
```
TD-18: The detector MUST NOT require all artifact files to be present. Missing optional files MUST be handled gracefully.
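Loading the TD-17 layout while tolerating missing files (TD-18) could look like the sketch below. `Artifacts` and `load_artifacts` are hypothetical names. For simplicity, every entry is treated as optional, and the caller decides which missing files should still count as an error.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class Artifacts:
    prompt: Optional[str] = None
    agent_output: Optional[str] = None
    patches: List[Path] = field(default_factory=list)
    bundles: List[Path] = field(default_factory=list)
    comment_memory: List[Path] = field(default_factory=list)


def _read_if_present(path: Path) -> Optional[str]:
    return path.read_text(encoding="utf-8") if path.is_file() else None


def load_artifacts(artifacts_dir: Path) -> Artifacts:
    """Collect whatever artifacts exist; missing files are left empty (TD-18)."""
    if not artifacts_dir.is_dir():
        # The directory itself is the primary input (TD-16), so its absence is an error.
        raise FileNotFoundError(f"artifacts directory not found: {artifacts_dir}")
    memory_dir = artifacts_dir / "comment-memory"
    return Artifacts(
        prompt=_read_if_present(artifacts_dir / "aw-prompts" / "prompt.txt"),
        agent_output=_read_if_present(artifacts_dir / "agent_output.json"),
        patches=sorted(artifacts_dir.glob("aw-*.patch")),
        bundles=sorted(artifacts_dir.glob("aw-*.bundle")),
        comment_memory=sorted(memory_dir.glob("*.md")) if memory_dir.is_dir() else [],
    )
```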
TD-19: The detector MUST output the structured JSON result (per TD-08) to stdout.
TD-20: The detector MUST support writing the result to a file via the --output flag.
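A command-line surface covering TD-16, TD-19, and TD-20 could be built with Python's `argparse`, as sketched here. The program name `detector` and both helper functions are hypothetical. Writing the result to stdout *and* to the `--output` file is an assumption about how the two requirements combine.

```python
import argparse
import json
import sys
from pathlib import Path
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="detector")  # hypothetical program name
    parser.add_argument("artifacts_dir", type=Path, help="artifacts directory (TD-16)")
    parser.add_argument("--output", type=Path, help="also write the JSON result here (TD-20)")
    return parser.parse_args(argv)


def emit_result(result: dict, output: Optional[Path] = None) -> str:
    """Write the TD-08 result to stdout (TD-19) and, if requested, to a file."""
    text = json.dumps(result, indent=2)
    sys.stdout.write(text + "\n")
    if output is not None:
        output.write_text(text + "\n", encoding="utf-8")
    return text
```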
TD-20b: The detector MUST provide a conclude subcommand that reads a structured
result file written by a prior detection run and emits the host-side job-output
contract (conclusion, reason, success) consumed by the parent orchestrator.
The verdict crosses the AWF sandbox boundary as a file (written to a read-write
mount), not via log scraping. When the result file is missing the subcommand MUST
report a clear agent_failure (e.g. "Detection result file not found at: ");
a malformed file MUST report parse_error; detected threats MUST report
threat_detected. In warn mode (GH_AW_DETECTION_CONTINUE_ON_ERROR != "false")
non-mandatory failures MUST surface as warnings without failing the job, except
that agent_failure and parse_error MUST hard-fail when the detection execution
step itself failed.
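The decision table in TD-20b can be sketched as a pure function. The sketch relies on several assumptions that the specification does not fix:

- the job-output values (`conclusion` of `success`, `warning`, or `failure`, and `success` as a boolean);
- the extra `detail` field;
- passing the step outcome in as `detection_step_failed`;
- treating `threat_detected` as always mandatory because of TD-09.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

THREAT_FIELDS = ("prompt_injection", "secret_leak", "malicious_patch")


def conclude(result_path: Path, detection_step_failed: bool,
             env: Optional[Mapping[str, str]] = None) -> dict:
    """Map a detection result file to hypothetical host-side job outputs."""
    env = os.environ if env is None else env
    # Warn mode is on unless the variable is exactly "false" (TD-20b).
    warn_mode = env.get("GH_AW_DETECTION_CONTINUE_ON_ERROR") != "false"

    if not result_path.is_file():
        reason, detail = "agent_failure", f"Detection result file not found at: {result_path}"
    else:
        try:
            data = json.loads(result_path.read_text(encoding="utf-8"))
            if not isinstance(data, dict) or not all(
                    isinstance(data.get(f), bool) for f in THREAT_FIELDS):
                raise ValueError("result does not match the expected shape")
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            reason, detail = "parse_error", str(exc)
        else:
            if any(data[f] for f in THREAT_FIELDS):
                # Assumption: a detected threat always fails the job (TD-09).
                reasons = "; ".join(str(r) for r in data.get("reasons", []))
                return {"conclusion": "failure", "reason": "threat_detected",
                        "success": False, "detail": reasons}
            return {"conclusion": "success", "reason": "", "success": True, "detail": ""}

    # agent_failure / parse_error: downgrade to a warning only in warn mode
    # and only when the detection execution step itself did not fail.
    if warn_mode and not detection_step_failed:
        return {"conclusion": "warning", "reason": reason, "success": True, "detail": detail}
    return {"conclusion": "failure", "reason": reason, "success": False, "detail": detail}
```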
TD-21: The detector MUST use the following exit codes:
| Code | Meaning |
|---|---|
| 0 | Safe — no threats detected |
| 1 | Threat detected |
| 2 | Infrastructure/configuration error |
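The TD-21 exit codes map directly onto an integer enum. `ExitCode` and `exit_code_for` are hypothetical names. Passing `None` stands for any infrastructure or configuration failure in which no verdict exists.

```python
from enum import IntEnum
from typing import Optional


class ExitCode(IntEnum):
    SAFE = 0          # no threats detected
    THREAT = 1        # at least one threat flag is true
    INFRA_ERROR = 2   # infrastructure or configuration error


def exit_code_for(result: Optional[dict]) -> ExitCode:
    """Map a validated result, or None when no verdict exists, to a TD-21 code."""
    if result is None:
        return ExitCode.INFRA_ERROR
    if any(result[f] for f in ("prompt_injection", "secret_leak", "malicious_patch")):
        return ExitCode.THREAT
    return ExitCode.SAFE
```

A caller would end with `sys.exit(exit_code_for(result))`. `IntEnum` members are accepted there because they are integers.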
TD-21a: The exit code is an out-of-band signal for direct callers. In the
integrated detection job the verdict is conveyed to the host via the structured
detection_result.json file (TD-20b) and concluded by the conclude subcommand,
not by the detector exit code. The integration wrapper that maps the detector
exit code to the detection step's success/failure outcome MUST NOT be stricter
than gh-aw's native engine step: a recorded verdict (exit 0 or 1) and an
"engine ran but recorded no verdict" outcome (exit 2 with status reason
invalid_report_exhausted) MUST NOT mark the detection step as failed. Only a
genuine engine or configuration failure (e.g. status reason engine_error,
config_error, or cancelled) MAY surface as a step failure. This prevents the
common flaky-output case from blocking safe outputs in warn mode, where gh-aw
treats a missing verdict as a recoverable parse_error and proceeds.
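The TD-21a wrapper rule reduces to a small predicate over the exit code and the status reason. This section does not specify how the wrapper learns the status reason, so the hypothetical `step_failed` function simply takes it as a parameter. Treating unlisted exit-2 reasons and unexpected exit codes as failures is also an assumption.

```python
from typing import Optional

# Status reason for "engine ran but recorded no verdict" (TD-21a).
NO_VERDICT_REASON = "invalid_report_exhausted"


def step_failed(exit_code: int, status_reason: Optional[str]) -> bool:
    """Decide whether the detection step should be marked failed."""
    if exit_code in (0, 1):
        # A verdict was recorded; `conclude` handles it from detection_result.json.
        return False
    if exit_code == 2 and status_reason == NO_VERDICT_REASON:
        # Recoverable: the host treats the missing verdict as a parse_error.
        return False
    # engine_error, config_error, cancelled, unknown reasons, or unexpected codes.
    return True
```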
TD-22: The detector MUST support the following environment variables:
| Variable | Purpose |
|---|---|
| WORKFLOW_NAME | Name of the workflow being analyzed |
| WORKFLOW_DESCRIPTION | Description of the workflow |
| CUSTOM_PROMPT | Additional detection instructions |
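Reading the TD-22 variables into one immutable value keeps the rest of the detector free of direct environment access. `DetectionContext` and `context_from_env` are hypothetical names, and defaulting unset variables to empty strings is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DetectionContext:
    workflow_name: str
    workflow_description: str
    custom_prompt: str


def context_from_env(env: Optional[Mapping[str, str]] = None) -> DetectionContext:
    """Read the TD-22 variables; unset variables become empty strings (assumption)."""
    env = os.environ if env is None else env
    return DetectionContext(
        workflow_name=env.get("WORKFLOW_NAME", ""),
        workflow_description=env.get("WORKFLOW_DESCRIPTION", ""),
        custom_prompt=env.get("CUSTOM_PROMPT", ""),
    )
```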
TD-23: AI engine authentication variables MUST be treated as runtime-only configuration. They MUST NOT be required for parser, prompt building, unit test, or binary smoke test execution.
The implementation MAY pass through engine-specific authentication variables required by the selected CLI, including:
| Variable | Engine |
|---|---|
| COPILOT_GITHUB_TOKEN | Copilot |
| ANTHROPIC_API_KEY | Claude |
| OPENAI_API_KEY | Codex |
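To keep authentication runtime-only (TD-23), the lookup can be deferred until the engine process is about to be launched. Parsing, prompt building, and tests then never touch these variables. The engine ids `claude` and `codex` and the function `engine_env` are assumptions; only `copilot` appears as an engine id in this document.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical engine ids mapped to the authentication variables in the table above.
AUTH_VARS = {
    "copilot": "COPILOT_GITHUB_TOKEN",
    "claude": "ANTHROPIC_API_KEY",
    "codex": "OPENAI_API_KEY",
}


def engine_env(engine_id: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the auth environment for the engine process; called only at run time."""
    env = os.environ if env is None else env
    var = AUTH_VARS.get(engine_id)
    if var is None:
        raise ValueError(f"unknown engine: {engine_id}")
    token = env.get(var)
    if not token:
        raise RuntimeError(f"{var} is required to run the {engine_id} engine")
    return {var: token}
```

Returning only the selected engine's variable, rather than the whole environment, is a deliberate choice. It matches the least-privilege intent of TD-37.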
TD-24: The release-asset binary MUST be published under semantic version tags.
TD-25: The parent orchestrator (gh-aw) MUST pin to a specific detector version.
TD-26: Breaking changes to the input/output contract MUST increment the major version.
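TD-24 through TD-26 let an orchestrator reject floating refs and treat a shared major version as a compatible contract. This sketch uses a simplified `vMAJOR.MINOR.PATCH` pattern that does not handle pre-release or build metadata. `parse_version` and `contract_compatible` are hypothetical names.

```python
import re
from typing import Tuple

# Simplified pattern: optional "v", then MAJOR.MINOR.PATCH without leading zeros.
_SEMVER = re.compile(r"v?(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse an exact version tag; floating refs such as "main" are rejected."""
    m = _SEMVER.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a pinned semantic version tag: {tag!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def contract_compatible(pinned: str, candidate: str) -> bool:
    """Under TD-26, an unchanged major version implies an unchanged I/O contract."""
    return parse_version(pinned)[0] == parse_version(candidate)[0]
```

Semantic Versioning makes no compatibility promise for 0.y.z releases. A stricter check might therefore require exact equality when the major version is 0.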
TD-27: Private repository status MUST NOT block detector publication or consumption. When the source repository is private, approved consuming repositories MUST be able to download pinned release assets with contents: read.
TD-36: The detection run SHOULD have no network access (fully blocked egress).
TD-37: The detection run MUST NOT have access to repository secrets beyond what is required for AI engine authentication.
TD-38: Detection results MUST NOT be modifiable by the agent being analyzed.