[reliability] Daily Reliability Review - 2026-06-18 #40168

@github-actions

Executive Summary

Telemetry is flowing for gh-aw (Sentry org github, project gh-aw): the spans dataset has fresh data through 2026-06-18T23:25Z. Over the last 24h I confirmed 16 distinct workflow runs with gh-aw.run.status:failure against a floor of ≥19 successful runs (the success-span query hit the 100-span cap, so true success count is higher). No timed_out or cancelled run-status values were present — failing runs are recorded as failure, not timeouts.

The single highest-signal recurring problem is Smoke Copilot (4 failed runs in 24h). The failure localizes to the agent phase, not export/infra: trace continuity is intact and the OTLP exporter is healthy. The errors and logs datasets are empty for the window, and three core attributes the standard review playbook expects (gen_ai.response.finish_reasons, native span.status, span-level github.run_id) are not searchable in Sentry's spans dataset despite being emitted — a confirmed instrumentation/mapping gap that blocks truncation analysis and native failure filtering.
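The run counts above come from span-level rows, so several spans can belong to one run, and a query that returns exactly the page cap only gives a lower bound. A minimal sketch of that counting logic follows, assuming each row is a plain object keyed by the emitted attribute names; `summarizeRuns` and the row shape are hypothetical, not part of the Sentry MCP or gh-aw.

```javascript
// Cap on rows per query, as noted for list_events in this report.
const QUERY_LIMIT = 100;

// Count distinct runs per gh-aw.run.status from span-level rows.
// Hypothetical row shape: { "gh-aw.run.id": string, "gh-aw.run.status": string }.
function summarizeRuns(rows, limit = QUERY_LIMIT) {
  const runsByStatus = new Map();
  for (const row of rows) {
    const status = row["gh-aw.run.status"];
    const runId = row["gh-aw.run.id"];
    if (!status || !runId) continue; // skip spans missing either attribute
    if (!runsByStatus.has(status)) runsByStatus.set(status, new Set());
    runsByStatus.get(status).add(String(runId));
  }
  const counts = {};
  for (const [status, ids] of runsByStatus) counts[status] = ids.size;
  // If the query filled the page, the true count may be higher.
  return { counts, isLowerBound: rows.length >= limit };
}

const sampleRows = [
  { "gh-aw.run.id": "27784259295", "gh-aw.run.status": "failure" },
  { "gh-aw.run.id": "27784259295", "gh-aw.run.status": "failure" }, // second span, same run
  { "gh-aw.run.id": "27751740756", "gh-aw.run.status": "success" },
];
console.log(summarizeRuns(sampleRows));
```

Reporting `isLowerBound` next to the count keeps a capped success query from being read as an exact total.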

Top Reliability Findings

| Priority | Workflow | Problem | Evidence | Next Action |
|---|---|---|---|---|
| 1 | Smoke Copilot | Recurring agent-phase failure (4 runs/24h) | gh-aw.run.status:failure on runs 27784259295, 27738605733, 27737664591, 27726924811; rep. trace 0c0034be... agent.conclusion=630.2s status=failure | Inspect copilot agent step logs on run 27784259295 |
| 2 | Matt Pocock Skills Reviewer | Repeated failure (2 runs/24h) | failure on runs 27755558806, 27737630181 | Compare against Smoke Copilot agent-phase cause |
| 3 | 8 other workflows | One-off failures each | failure runs incl. PR Sous Chef 27780036106, Smoke Claude 27737636941, Daily Regulatory Report Generator 27791592821, Test Quality Sentinel 27756657067 (+3 [Filtered] names) | Triage individually; likely not systemic |
| 4 | (fleet) | Truncation / runaway-token outcome unverifiable | has:gen_ai.response.finish_reasons → 0 spans/24h; max output_tokens=92,805 (model small, trace 8b49cd03...); inconclusive | Restore finish-reason searchability (see Rec. 3) |
| 5 | (fleet) | Native span.status not populated in Sentry | span.status:ok and span.status:internal_error both → 0 results/24h | Triage by gh-aw.run.status; see Rec. 1 & 4 |

Not a failure (separated): the slowest gh-aw.agent.conclusion spans (up to 3,011,599 ms ≈ 50 min, e.g. Daily AW Cross-Repo Compile Check 27751740756) are all status:success — expected long-running daily jobs, not timeouts.

Representative Traces

Confirmed failure — Smoke Copilot, run 27784259295, trace 0c0034be80c6343dff5c8b5e5734fd26

  • Continuity intact across gh-aw.pre_activation → activation → agent → push_experiments_state (all share the trace).
  • Failure localizes to the agent phase: gh-aw.agent.conclusion = 630,186 ms (~10.5 min) with gh-aw.run.status:failure, and gh-aw.agent.agent = 436,491 ms.
  • Inside the agent phase: a gateway.backend.execute / mcp.tool_call ran 76,156 ms (~76 s) — a candidate contributing factor.
  • Exporter healthy; this is an agent-execution failure, not an export or auth failure.
  • Trace: https://github.sentry.io/explore/traces/trace/0c0034be80c6343dff5c8b5e5734fd26

Latency outlier (healthy) — Daily AW Cross-Repo Compile Check, run 27751740756, trace 83630186ba94b071ed242dbdf7776ca6

  • gh-aw.agent.conclusion = 3,011,599 ms with gh-aw.run.status:success. Long but expected; not a reliability defect.

Token outlier (inconclusive) — trace 8b49cd03b942b6a0c5dce166a460f6a0

  • gen_ai.usage.output_tokens = 92,805, gen_ai.request.model:small. No finish_reasons present, so cannot confirm truncation vs. legitimate large output.
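The reasoning behind the "inconclusive" label can be sketched as a small classifier: a large output is only called truncated when a finish reason says so, and a missing finish reason yields no verdict. This is an illustration under assumptions: the threshold is arbitrary, `classifyTokenOutlier` is a hypothetical helper, and the `"length"` value is assumed to mark a length stop, as in the OpenTelemetry GenAI convention examples.

```javascript
// Illustrative threshold only; the report does not define one.
const LARGE_OUTPUT_TOKENS = 50000;

// Classify a generation span by output size and finish reasons.
// Assumption: a finish reason of "length" indicates truncation.
function classifyTokenOutlier(span) {
  const tokens = Number(span["gen_ai.usage.output_tokens"] ?? 0);
  const reasons = span["gen_ai.response.finish_reasons"];
  if (tokens < LARGE_OUTPUT_TOKENS) return "not_an_outlier";
  // Without finish reasons there is no evidence either way.
  if (!Array.isArray(reasons) || reasons.length === 0) return "inconclusive";
  return reasons.includes("length") ? "truncated" : "complete";
}

// The span from trace 8b49cd03..., as it appears in Sentry (no finish reasons).
console.log(classifyTokenOutlier({ "gen_ai.usage.output_tokens": 92805 }));
// prints "inconclusive"
```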

Recommendations

  1. Triage by gh-aw.run.status, not span.status (no code change). Update the Sentry saved query/playbook to the emitted keys — gh-aw.workflow.name, gh-aw.run.status, gh-aw.run.id — since the playbook's gh_aw.workflow_name / span.status / github.run_id return false negatives in Sentry's spans dataset.
  2. Investigate Smoke Copilot's recurring agent-phase failure (4/24h). Start from run 27784259295 agent-step logs and the 76 s MCP tool call in trace 0c0034be....
  3. Restore finish-reason searchability. send_otlp_span.cjs:2143-2146 emits gen_ai.response.finish_reasons as an array attribute, but Sentry's spans dataset returns 0 spans for has:gen_ai.response.finish_reasons over 24h. Emit an additional scalar gen_ai.response.finish_reason alongside the array so truncation/length-stop is queryable.
  4. Surface failures via native span status, or document the canonical field. Failures already set OTLP status.code=2 (send_otlp_span.cjs:2024,2060), yet Sentry's span.status shows neither ok nor internal_error. Verify the OTLP→Sentry status mapping, or document gh-aw.run.status as the canonical outcome field for dashboards/alerts.
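Recommendation 3 could look roughly like the sketch below, which builds both the existing array attribute and a scalar companion in OTLP/JSON attribute form. It is not the code in send_otlp_span.cjs: `buildFinishReasonAttributes` is a hypothetical helper, and preferring `"length"` for the scalar when several reasons are present is an assumption made so that truncation stays queryable.

```javascript
// Build OTLP/JSON attributes for finish reasons: array form plus scalar copy.
// Hypothetical helper; not a function from send_otlp_span.cjs.
function buildFinishReasonAttributes(finishReasons) {
  const reasons = (finishReasons ?? []).filter(
    (r) => typeof r === "string" && r !== ""
  );
  if (reasons.length === 0) return []; // emit nothing rather than empty values

  // Assumption: surface "length" first so a length stop is never hidden
  // behind another reason; otherwise use the first reason.
  const scalar = reasons.includes("length") ? "length" : reasons[0];

  return [
    {
      key: "gen_ai.response.finish_reasons",
      value: { arrayValue: { values: reasons.map((r) => ({ stringValue: r })) } },
    },
    { key: "gen_ai.response.finish_reason", value: { stringValue: scalar } },
  ];
}

console.log(JSON.stringify(buildFinishReasonAttributes(["stop", "length"]), null, 2));
```

Whether Sentry indexes the scalar key would still need to be confirmed with a `has:gen_ai.response.finish_reason` query after deployment.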

Notes

  • MCP build limitation: this Sentry MCP exposes only list_events/list_issue_events — no search_events or get_trace_details. Trace continuity was verified via list_events filtered by trace:<id>. list_events caps limit at 100 and renders only a fixed field set plus explicitly-requested attributes.
  • Datasets: the errors and logs (and ourlogs) datasets returned 0 events/24h, so no error-event or log correlation is available and failures are observable only as span attributes. This is an observability finding, not a clean bill of health.
  • Emit vs. Sentry mapping (cross-checked in actions/setup/js/send_otlp_span.cjs):
    • github.run_id, github.run_attempt, service.version are emitted as resource attributes (:360, :423-424) and are not exposed as searchable span fields in Sentry — their query "absence" is a backend mapping artifact, not an emit bug. Sentry's release (its mapping of service.version) is present.
    • OTLP status.code defaults to OK=1 (:329) and is set to ERROR=2 on agent-non-OK/failure (:2024, :2060); Sentry does not surface this as span.status.
  • PII scrubbing: three failing runs render with [Filtered] workflow names (Sentry data scrubbing), which reduces attribution. Affected run IDs: 27738606387, 27737665241, 27726925449.
  • Inconclusive vs. confirmed: failures are confirmed via the gh-aw.run.status attribute + verified trace continuity. Truncation/runaway-token outcomes are inconclusive (no finish_reasons in Sentry). No timeouts were claimed — no timed_out status exists in the data.
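The key mismatch from Recommendation 1 can be handled mechanically when migrating saved queries. The sketch below rewrites playbook keys to the emitted keys named in this report. The key pairing follows the order given in Recommendation 1, while the value translation for span.status (ok to success, internal_error to failure) is an assumption; `rewriteQuery` is a hypothetical helper that handles only simple space-separated `key:value` terms.

```javascript
// Playbook key -> emitted key, as listed in Recommendation 1.
const KEY_MAP = new Map([
  ["gh_aw.workflow_name", "gh-aw.workflow.name"],
  ["span.status", "gh-aw.run.status"],
  ["github.run_id", "gh-aw.run.id"],
]);

// Assumed value translation for span.status terms.
const STATUS_VALUE_MAP = new Map([
  ["ok", "success"],
  ["internal_error", "failure"],
]);

// Rewrite simple "key:value" terms; quoted values with spaces are not handled.
function rewriteQuery(query) {
  return query
    .split(/\s+/)
    .filter(Boolean)
    .map((term) => {
      const idx = term.indexOf(":");
      if (idx === -1) return term; // free-text term, leave as-is
      const key = term.slice(0, idx);
      const value = term.slice(idx + 1);
      const newKey = KEY_MAP.get(key);
      if (!newKey) return term; // unknown key (including has:), leave as-is
      const newValue =
        key === "span.status" ? (STATUS_VALUE_MAP.get(value) ?? value) : value;
      return `${newKey}:${newValue}`;
    })
    .join(" ");
}

console.log(rewriteQuery("span.status:internal_error github.run_id:27784259295"));
// prints "gh-aw.run.status:failure gh-aw.run.id:27784259295"
```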

Generated by 🚨 Daily Reliability Review · 175.5 AIC · ⌖ 12.6 AIC · ⊞ 5.4K ·

  • expires on Jun 20, 2026, 3:33 PM UTC-08:00
