Executive Summary
Telemetry is flowing for gh-aw (Sentry org github, project gh-aw): the spans dataset has fresh data through 2026-06-18T23:25Z. Over the last 24h I confirmed 16 distinct workflow runs with gh-aw.run.status:failure against a floor of ≥19 successful runs (the success-span query hit the 100-span cap, so true success count is higher). No timed_out or cancelled run-status values were present — failing runs are recorded as failure, not timeouts.
The single highest-signal recurring problem is Smoke Copilot (4 failed runs in 24h). The failure localizes to the agent phase, not export/infra: trace continuity is intact and the OTLP exporter is healthy. The errors and logs datasets are empty for the window, and three core attributes the standard review playbook expects (gen_ai.response.finish_reasons, native span.status, span-level github.run_id) are not searchable in Sentry's spans dataset despite being emitted — a confirmed instrumentation/mapping gap that blocks truncation analysis and native failure filtering.
Top Reliability Findings
| Priority | Workflow | Problem | Evidence | Next Action |
| --- | --- | --- | --- | --- |
| 1 | Smoke Copilot | Recurring agent-phase failure (4 runs/24h) | gh-aw.run.status:failure on runs 27784259295, 27738605733, 27737664591, 27726924811; rep. trace 0c0034be... agent.conclusion=630.2s status=failure | Inspect copilot agent step logs on run 27784259295 |
| 2 | Matt Pocock Skills Reviewer | Repeated failure (2 runs/24h) | failure on runs 27755558806, 27737630181 | Compare against Smoke Copilot agent-phase cause |
| 3 | 8 other workflows | One-off failures each | failure runs incl. PR Sous Chef 27780036106, Smoke Claude 27737636941, Daily Regulatory Report Generator 27791592821, Test Quality Sentinel 27756657067 (+3 [Filtered] names) | Triage individually; likely not systemic |
| 4 | (fleet) | Truncation / runaway-token outcome unverifiable | has:gen_ai.response.finish_reasons → 0 spans/24h; max output_tokens=92,805 (model small, trace 8b49cd03...); inconclusive | Restore finish-reason searchability (see Rec. 3) |
| 5 | (fleet) | Native span.status not populated in Sentry | span.status:ok and span.status:internal_error both → 0 results/24h | Triage by gh-aw.run.status; see Rec. 1 & 4 |
Not a failure (separated): the slowest gh-aw.agent.conclusion spans (up to 3,011,599 ms ≈ 50 min, e.g. Daily AW Cross-Repo Compile Check 27751740756) are all status:success — expected long-running daily jobs, not timeouts.
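The recurring-versus-one-off split in the table can be reproduced by counting distinct failed runs per workflow. The sketch below is illustrative only: the record shape (`workflow`, `runId`, `status`) and the function name `countFailedRunsByWorkflow` are assumptions, not part of the gh-aw tooling, and the sample rows are a subset of the runs cited above.

```javascript
// Illustrative records: workflow names and run IDs are taken from the findings
// above; the record shape ({ workflow, runId, status }) is an assumption.
const spans = [
  { workflow: "Smoke Copilot", runId: "27784259295", status: "failure" },
  { workflow: "Smoke Copilot", runId: "27738605733", status: "failure" },
  { workflow: "Smoke Copilot", runId: "27737664591", status: "failure" },
  { workflow: "Smoke Copilot", runId: "27726924811", status: "failure" },
  { workflow: "Matt Pocock Skills Reviewer", runId: "27755558806", status: "failure" },
  { workflow: "Matt Pocock Skills Reviewer", runId: "27737630181", status: "failure" },
  { workflow: "PR Sous Chef", runId: "27780036106", status: "failure" },
  { workflow: "Daily AW Cross-Repo Compile Check", runId: "27751740756", status: "success" },
];

// Hypothetical helper: counts distinct failed runs per workflow, most failures first.
function countFailedRunsByWorkflow(records) {
  const runsByWorkflow = new Map();
  for (const { workflow, runId, status } of records) {
    if (status !== "failure") continue; // long-running successes are not failures
    if (!runsByWorkflow.has(workflow)) runsByWorkflow.set(workflow, new Set());
    runsByWorkflow.get(workflow).add(runId); // a Set dedupes multiple spans per run
  }
  return [...runsByWorkflow.entries()]
    .map(([workflow, runs]) => ({ workflow, failedRuns: runs.size }))
    .sort((a, b) => b.failedRuns - a.failedRuns);
}
```

Counting run IDs rather than spans matters here because a single run emits several spans (pre_activation, activation, agent, and so on), so a raw span count would overstate the failure rate.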
Representative Traces
Confirmed failure — Smoke Copilot, run 27784259295, trace 0c0034be80c6343dff5c8b5e5734fd26
- Continuity intact across gh-aw.pre_activation → activation → agent → push_experiments_state (all share the trace).
- Failure localizes to gh-aw.agent.conclusion = 630,186 ms (~10.5 min) with gh-aw.run.status:failure, and gh-aw.agent.agent = 436,491 ms.
- Inside the agent phase: a gateway.backend.execute / mcp.tool_call ran 76,156 ms (~76 s), a candidate contributing factor.
- Exporter healthy; this is an agent-execution failure, not an export or auth failure.
- Trace: https://github.sentry.io/explore/traces/trace/0c0034be80c6343dff5c8b5e5734fd26
Latency outlier (healthy) — Daily AW Cross-Repo Compile Check, run 27751740756, trace 83630186ba94b071ed242dbdf7776ca6
gh-aw.agent.conclusion = 3,011,599 ms with gh-aw.run.status:success. Long but expected; not a reliability defect.
Token outlier (inconclusive) — trace 8b49cd03b942b6a0c5dce166a460f6a0
gen_ai.usage.output_tokens = 92,805, gen_ai.request.model:small. No finish_reasons present, so cannot confirm truncation vs. legitimate large output.
Recommendations
- Triage by gh-aw.run.status, not span.status (no code change). Update the Sentry saved query/playbook to the emitted keys (gh-aw.workflow.name, gh-aw.run.status, gh-aw.run.id), since the playbook's gh_aw.workflow_name / span.status / github.run_id return false negatives in Sentry's spans dataset.
- Investigate Smoke Copilot's recurring agent-phase failure (4/24h). Start from run 27784259295 agent-step logs and the 76 s MCP tool call in trace 0c0034be....
- Restore finish-reason searchability. send_otlp_span.cjs:2143-2146 emits gen_ai.response.finish_reasons as an array attribute, but Sentry's spans dataset returns 0 spans for has:gen_ai.response.finish_reasons over 24h. Emit an additional scalar gen_ai.response.finish_reason alongside the array so truncation/length-stop is queryable.
- Surface failures via native span status, or document the canonical field. Failures already set OTLP status.code=2 (send_otlp_span.cjs:2024,2060), yet Sentry's span.status shows neither ok nor internal_error. Verify the OTLP→Sentry status mapping, or document gh-aw.run.status as the canonical outcome field for dashboards/alerts.
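The finish-reason recommendation could look roughly like the sketch below, which builds both the array attribute and an extra scalar in OTLP/JSON attribute encoding. This is not the code in send_otlp_span.cjs: the helper name `buildFinishReasonAttributes` and the choice to join multiple reasons with a comma are assumptions, and whether Sentry indexes the scalar key would still need to be verified after deployment.

```javascript
// Hypothetical helper, not taken from send_otlp_span.cjs.
// Returns OTLP/JSON attributes for finish reasons: the array form that is
// already emitted, plus a scalar string intended to be easier to filter on.
function buildFinishReasonAttributes(finishReasons) {
  const reasons = (finishReasons || []).filter(
    (r) => typeof r === "string" && r.length > 0
  );
  if (reasons.length === 0) return []; // nothing to emit

  return [
    {
      key: "gen_ai.response.finish_reasons",
      value: { arrayValue: { values: reasons.map((r) => ({ stringValue: r })) } },
    },
    {
      // Scalar key proposed in the recommendation; the comma-joined format
      // for multiple reasons is an assumption.
      key: "gen_ai.response.finish_reason",
      value: { stringValue: reasons.join(",") },
    },
  ];
}
```

For example, `buildFinishReasonAttributes(["length"])` yields one array attribute and one scalar attribute with `stringValue: "length"`, which is the value a truncation query would filter on.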
Notes
- MCP build limitation: this Sentry MCP exposes only list_events/list_issue_events, with no search_events or get_trace_details. Trace continuity was verified via list_events filtered by trace:<id>. list_events caps limit at 100 and renders only a fixed field set plus explicitly-requested attributes.
- Datasets: the errors and logs (and ourlogs) datasets returned 0 events/24h, so no error-event or log correlation is available; failures are observable only as span attributes. This is an observability finding, not a clean bill of health.
- Emit vs. Sentry mapping (cross-checked in actions/setup/js/send_otlp_span.cjs):
  - github.run_id, github.run_attempt, service.version are emitted as resource attributes (:360, :423-424) and are not exposed as searchable span fields in Sentry. Their query "absence" is a backend mapping artifact, not an emit bug. Sentry's release (its mapping of service.version) is present.
  - OTLP status.code defaults to OK=1 (:329) and is set to ERROR=2 on agent-non-OK/failure (:2024, :2060); Sentry does not surface this as span.status.
- PII scrubbing: three failing runs render with [Filtered] workflow names (Sentry data scrubbing), which reduces attribution; run IDs 27738606387, 27737665241, 27726925449.
- Inconclusive vs. confirmed: failures are confirmed via the gh-aw.run.status attribute plus verified trace continuity. Truncation/runaway-token outcomes are inconclusive (no finish_reasons in Sentry). No timeouts were claimed because no timed_out status exists in the data.
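Because list_events caps results at 100, any count derived from a full page is only a floor, which is why the success count is reported as ≥19. A minimal sketch of that reasoning follows; the helper name `countDistinctRuns` and the span shape (`runId`) are assumptions for illustration, not part of the Sentry MCP.

```javascript
// Hypothetical helper; the span shape ({ runId }) is an assumption.
// Counts distinct runs in a result page and flags the count as a lower
// bound when the page is full, since more spans may exist beyond the cap.
function countDistinctRuns(spans, limit = 100) {
  const runIds = new Set(spans.map((s) => s.runId));
  const isLowerBound = spans.length >= limit;
  return {
    distinctRuns: runIds.size,
    isLowerBound,
    label: `${isLowerBound ? "≥" : ""}${runIds.size}`,
  };
}
```

A page of 100 spans covering 19 run IDs would be labelled "≥19", while a page below the cap gives an exact count.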
References:
- §27784259295 — Smoke Copilot (recurring failure, rep. trace)
- §27791592821 — Daily Regulatory Report Generator (failure)
- §27755558806 — Matt Pocock Skills Reviewer (repeated failure)
Generated by 🚨 Daily Reliability Review · 175.5 AIC · ⌖ 12.6 AIC · ⊞ 5.4K · ◷