Summary
When using Headroom as an Anthropic API proxy with a Bedrock backend in streaming mode, the message_start SSE event always contains "usage": {"input_tokens": 0, "output_tokens": 0}. This causes downstream clients (e.g. Claude Code) that read input_tokens from message_start to report 0 for all input token metrics.
Environment
- Headroom version: 0.26.0
- Backend: litellm-bedrock (Bedrock via LiteLLM)
- Mode: streaming (stream: true)
- Client: Claude Code 2.1.181
Observed Behavior
Claude Code emits OTel metrics based on the usage fields in the Anthropic SSE stream:
- message_start → reads usage.input_tokens, usage.cache_read_input_tokens, usage.cache_creation_input_tokens
- message_delta → reads usage.output_tokens
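The split between the two events can be sketched as follows. This is a minimal illustration, assuming the standard Anthropic streaming layout in which message_start nests usage under message and message_delta carries usage at the top level. The function name collect_usage and the sample payload are hypothetical and are not taken from Claude Code; the numbers mirror the direct-to-Bedrock figures reported in the Evidence section.

```python
import json


def collect_usage(sse_text: str) -> dict:
    """Merge usage fields from message_start and message_delta SSE events."""
    usage: dict = {}
    for line in sse_text.splitlines():
        # Only "data:" lines carry the JSON payload; skip "event:" and blank lines.
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        if event.get("type") == "message_start":
            # Input-side counters arrive here, nested under "message".
            usage.update(event.get("message", {}).get("usage", {}))
        elif event.get("type") == "message_delta":
            # The final output_tokens arrives here and overrides the initial value.
            usage.update(event.get("usage", {}))
    return usage


# Hypothetical stream for illustration only.
sample = "\n".join([
    "event: message_start",
    'data: {"type": "message_start", "message": {"usage": {"input_tokens": 1, '
    '"cache_read_input_tokens": 61711, "cache_creation_input_tokens": 407, '
    '"output_tokens": 0}}}',
    "",
    "event: message_delta",
    'data: {"type": "message_delta", "delta": {}, "usage": {"output_tokens": 291}}',
])

totals = collect_usage(sample)
print(totals)
```

If message_start carries zeros, a client built this way can only ever report the output side correctly, which matches the behavior described here.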
When connecting through Headroom to Bedrock:
- input_tokens = always 0
- cache_read_input_tokens = always 0
- cache_creation_input_tokens = always 0
- output_tokens = correct (non-zero)
When connecting directly to Bedrock (without Headroom):
- All fields are correctly populated by Claude Code (e.g., input_tokens=1, cache_read_input_tokens=61711, etc.)
Evidence
Raw CloudWatch Logs comparison (same OTel pipeline, same day):
Through Headroom (my account):
04:14:30 type=cacheRead tokens=0
04:14:30 type=cacheCreation tokens=0
04:14:30 type=input tokens=0
04:14:30 type=output tokens=57
Direct to Bedrock (colleague, no Headroom):
10:00:46 type=cacheRead tokens=61711
10:00:46 type=cacheCreation tokens=407
10:00:46 type=input tokens=1
10:00:46 type=output tokens=291
Impact
- Token usage dashboards (Athena/CloudWatch) show near-zero usage for Headroom users
- Cost attribution queries produce incorrect results
- The "Top Users by Token Usage" report underreports Headroom users by ~99%
- Only output_tokens is tracked correctly
Reproduction
- Configure Headroom with --backend bedrock
- Send a streaming request through the proxy
- Inspect the message_start SSE event received by the client
- Observe usage.input_tokens is always 0
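The inspection step can be automated against a captured stream. The sketch below is an illustration only: the helper names message_start_usage and input_usage_is_zeroed are hypothetical, and the captured lines are a hand-written stand-in for whatever the proxy actually returns (for example, lines saved from the streaming response body).

```python
import json
from typing import Iterable, Optional

# Input-side counters that a client reads from message_start.
INPUT_FIELDS = (
    "input_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
)


def message_start_usage(lines: Iterable[str]) -> Optional[dict]:
    """Return the usage dict of the first message_start event, or None if absent."""
    for line in lines:
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and event.get("type") == "message_start":
            return event.get("message", {}).get("usage", {})
    return None


def input_usage_is_zeroed(usage: dict) -> bool:
    """True when every input-side counter is missing or zero."""
    return all(usage.get(field, 0) == 0 for field in INPUT_FIELDS)


# Hypothetical capture reproducing the payload described in the Summary.
captured = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"usage": '
    '{"input_tokens": 0, "output_tokens": 0}}}',
]

usage = message_start_usage(captured)
print("message_start usage:", usage)
print("input usage zeroed:", input_usage_is_zeroed(usage))
```

Running the same check on a stream captured directly from Bedrock should report the input usage as not zeroed, which gives a quick pass/fail signal for the bug.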
The non-streaming path does not have this issue — the full usage block from Bedrock's response is passed through correctly in JSONResponse(content=backend_response.body).