Bedrock streaming: message_start emits input_tokens=0, breaking downstream OTel metrics #1132

@riyas-rawther

Summary

When using Headroom as an Anthropic API proxy with a Bedrock backend in streaming mode, the message_start SSE event always contains "usage": {"input_tokens": 0, "output_tokens": 0}. This causes downstream clients (e.g. Claude Code) that read input_tokens from message_start to report 0 for all input token metrics.

Environment

  • Headroom version: 0.26.0
  • Backend: litellm-bedrock (Bedrock via LiteLLM)
  • Mode: streaming (stream: true)
  • Client: Claude Code 2.1.181

Observed Behavior

Claude Code emits OTel metrics based on the usage fields in the Anthropic SSE stream:

  • message_start → reads usage.input_tokens, usage.cache_read_input_tokens, usage.cache_creation_input_tokens
  • message_delta → reads usage.output_tokens
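To make the mapping above concrete, here is a minimal sketch of how a client could derive the four metrics from already-parsed stream events. It illustrates the behavior described in this issue and is not Claude Code's actual implementation; `collect_usage` and the sample events are hypothetical, with numbers borrowed from the logs in the Evidence section.

```python
from typing import Iterable

# Fields taken from message_start, per the list above.
INPUT_FIELDS = (
    "input_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
)


def collect_usage(events: Iterable[dict]) -> dict:
    """Hypothetical helper: fold stream events into one usage record."""
    totals = {name: 0 for name in INPUT_FIELDS}
    totals["output_tokens"] = 0
    for event in events:
        if event.get("type") == "message_start":
            # In the Anthropic stream format, usage is nested under "message".
            usage = event.get("message", {}).get("usage", {})
            for name in INPUT_FIELDS:
                totals[name] = usage.get(name) or 0
        elif event.get("type") == "message_delta":
            # message_delta carries usage at the top level of the event.
            usage = event.get("usage", {})
            totals["output_tokens"] = usage.get("output_tokens") or 0
    return totals


# Hand-written sample events (not captured traffic).
direct_events = [
    {"type": "message_start", "message": {"usage": {
        "input_tokens": 1,
        "cache_read_input_tokens": 61711,
        "cache_creation_input_tokens": 407,
        "output_tokens": 0,
    }}},
    {"type": "message_delta", "usage": {"output_tokens": 291}},
]
proxied_events = [
    {"type": "message_start", "message": {"usage": {
        "input_tokens": 0, "output_tokens": 0,
    }}},
    {"type": "message_delta", "usage": {"output_tokens": 57}},
]

direct_usage = collect_usage(direct_events)
proxied_usage = collect_usage(proxied_events)
```

With a reader like this, a zeroed `message_start` cannot be corrected later in the stream: only `output_tokens` is refreshed by `message_delta`, which matches the pattern in the logs.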

When connecting through Headroom to Bedrock:

  • input_tokens = always 0
  • cache_read_input_tokens = always 0
  • cache_creation_input_tokens = always 0
  • output_tokens = correct (non-zero)

When connecting directly to Bedrock (without Headroom):

  • Claude Code reports all fields correctly (e.g., input_tokens=1, cache_read_input_tokens=61711, etc.)

Evidence

Raw CloudWatch Logs comparison (same OTel pipeline, same day):

Through Headroom (my account):

04:14:30  type=cacheRead      tokens=0
04:14:30  type=cacheCreation  tokens=0
04:14:30  type=input          tokens=0
04:14:30  type=output         tokens=57

Direct to Bedrock (colleague, no Headroom):

10:00:46  type=cacheRead      tokens=61711
10:00:46  type=cacheCreation  tokens=407
10:00:46  type=input          tokens=1
10:00:46  type=output         tokens=291

Impact

  • Token usage dashboards (Athena/CloudWatch) show near-zero usage for Headroom users
  • Cost attribution queries produce incorrect results
  • The "Top Users by Token Usage" report underreports Headroom users by ~99%
  • Only output_tokens is tracked correctly
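As a rough sanity check on the ~99% figure, the arithmetic below uses the single direct-to-Bedrock sample from the Evidence section and asks what share of its tokens would vanish if only the output count survived. It is based on one request, so treat it as an illustration and not as a measurement across users.

```python
# Token counts from the direct-to-Bedrock log sample in the Evidence section.
direct = {"cacheRead": 61711, "cacheCreation": 407, "input": 1, "output": 291}

total = sum(direct.values())       # all tokens the request actually used
visible = direct["output"]         # the only count that survives via the proxy
missing_share = 1 - visible / total

print(f"total={total} visible={visible} missing={missing_share:.1%}")
# prints: total=62410 visible=291 missing=99.5%
```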

Reproduction

  1. Configure Headroom with --backend bedrock
  2. Send a streaming request through the proxy
  3. Inspect the message_start SSE event received by the client
  4. Observe usage.input_tokens is always 0
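Steps 3 and 4 can be checked with a small script along these lines. It is a sketch that assumes the raw SSE stream has already been saved as text lines by whatever client was used; `first_message_start_usage` is a hypothetical helper, and the sample lines are hand-written to mirror the payload quoted in the Summary.

```python
import json
from typing import Iterable, Optional


def first_message_start_usage(lines: Iterable[str]) -> Optional[dict]:
    """Return the usage block of the first message_start event, if any."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore non-JSON data lines
        if isinstance(event, dict) and event.get("type") == "message_start":
            return event.get("message", {}).get("usage")
    return None


# Hand-written sample of what the client receives through the proxy.
sample_lines = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"id": "msg_example", '
    '"usage": {"input_tokens": 0, "output_tokens": 0}}}',
    "",
    "event: message_delta",
    'data: {"type": "message_delta", "usage": {"output_tokens": 57}}',
]

usage = first_message_start_usage(sample_lines)
all_zero = usage is not None and all(value == 0 for value in usage.values())
print(usage, "all zero:", all_zero)
```

Running the same check against a stream captured directly from Bedrock should show non-zero input fields if the behavior described above holds.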

The non-streaming path does not have this issue: the full usage block from Bedrock's response is passed through unchanged via JSONResponse(content=backend_response.body).
