Skip to content

Buffer per-request StreamableHTTP streams to avoid serial-router head-of-line block#2934

Merged
maxisbey merged 6 commits into
mainfrom
fix/streamable-http-hol-buffer
Jun 22, 2026
Merged

Buffer per-request StreamableHTTP streams to avoid serial-router head-of-line block#2934
maxisbey merged 6 commits into
mainfrom
fix/streamable-http-hol-buffer

Conversation

@maxisbey

@maxisbey maxisbey commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Two related fixes to the stateful StreamableHTTPServerTransport so concurrent requests can't wedge each other and event-store ordering doesn't depend on scheduler timing.

Buffer the per-request _request_streams (head-of-line block). Each POST registers a buffer-0 _request_streams[id] channel, start_soons the EventSourceResponse, and pushes the request to the server. The single serial message_router then forwards each response with await _request_streams[id][0].send(...) — which on a buffer-0 stream parks until that request's sse_writer (started lazily, two start_soon hops deep) has reached its first receive(). While the router is parked on one request, every other in-flight response on the session is head-of-line blocked behind it. This gives the three _request_streams[EventMessage] sites a small bounded buffer (REQUEST_STREAM_BUFFER_SIZE = 16) so the router can deposit and move on. The downstream sse_stream dict streams stay at 0 — they're not on the router's send target.

Store the priming event before request dispatch (event-store ordering). With an event_store configured, message_router calls store_event(msg) before its send(), while the priming event was stored inside the lazily-started sse_writer — so its event-store position raced the first routed message. Splitting the helper into _mint_priming_event (store + return wire dict) and _run_sse_writer (forward onto the wire) lets the POST handler await _mint_priming_event(...) before writer.send(session_message). The server can't emit anything for a request it hasn't received, so the priming row now precedes every router store for that stream by data dependency rather than scheduler timing — the same shape the TypeScript SDK uses.

Motivation and Context

Fixes #1764. The natural race is hard to reproduce standalone (multiple parties report 0 hits across thousands of loopback iterations) but is observed consistently in production by several reporters on the issue. The unit-level head-of-line block is straightforward to demonstrate though: register two request streams, give B a consumer, leave A consumer-less, write responses A then B — on main B never arrives until A is closed; with this change B arrives immediately and A's response is buffered.

The priming-order fix addresses a review finding that the buffer change widened a pre-existing window where a routed message could be stored before the priming event.

How Has This Been Tested?

  • New tests/server/test_streamable_http_router.py drives the routing layer directly: one test for the head-of-line block, one asserting [priming, m₁, …, m₅] event-store order with no sse_writer ever scheduled.
  • tests/shared/test_streamable_http.py priming-helper unit tests updated for _mint_priming_event.
  • Existing reconnection / sse-polling / replay tests pass unchanged.

Breaking Changes

None. EventStore ABC is unchanged.

Intentional behaviour changes:

  • The standalone GET stream now buffers up to 16 server-initiated messages before backpressure engages on a slow GET client (was 0).
  • A request whose consumer stalls past 16 messages (e.g. a long-running tool emitting many progress notifications to a stalled SSE client) can still wedge the router — this moves the threshold from 0 to 16, it doesn't remove the serial router. The routerless design in _streamable_http_modern.py is the structural fix on main.
  • _maybe_send_priming_event is replaced by _mint_priming_event (returns the wire dict instead of sending) and _run_sse_writer. Both are private; no public surface change.
  • The replay path's priming/live-tail ordering window is pre-existing and unchanged here; tracked separately.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

The buffer-size approach matches what reporters on #1764 have field-validated (sed-patching to 10). 16 covers the reported single-response list-request case by construction (one message per id) plus headroom for small notification workloads, while staying bounded for the session-lifetime GET stream.

A separate change is needed for the stateless task-leak half of #1764 (#2145's request-scoped task-group approach); that's not bundled here.

AI Disclaimer

@maxisbey maxisbey marked this pull request as ready for review June 20, 2026 20:56
Comment thread src/mcp/server/streamable_http.py Outdated
maxisbey added 3 commits June 22, 2026 10:48
…-of-line block

The serial message_router forwards each response with a blocking send into
a per-request buffer-0 stream whose only consumer (sse_writer) is started
lazily via nested start_soon. Under concurrent requests one not-yet-receiving
consumer parks the router and head-of-line blocks every other in-flight
response on the session.

Give the three _request_streams[EventMessage] sites a small bounded buffer
so the router can deposit and move on. The sse_stream dict streams stay at
0 (downstream of the router; buffering them would relax per-client
backpressure without helping the race).

Fixes #1764.
…order

Splits the old _maybe_send_priming_event into _mint_priming_event (store +
return wire dict) and _run_sse_writer (forward request_stream onto the wire).
The POST handler now awaits _mint_priming_event before writer.send(), so the
priming row is in the event store before the server can produce any message
for that request id — ordering by data dependency, not scheduler timing.

The replay path keeps its priming event (test_streamable_http_multiple_reconnections
relies on it as a stream-re-registered signal); its replay→live-tail ordering
window is pre-existing and orthogonal.

Also extracts the inline sse_writer closure to a method (drops _handle_post_request
below the C901 threshold) and widens the SSE-dict stream type to SSEEvent
(dict[str, Any]) — the previous dict[str, str] was a lie masked by the old
helper's Any parameter, since priming events carry retry: int.
Hoists _mint_priming_event to the top of the SSE arm so a user EventStore
raising on the priming row returns a 500 with no per-request state allocated
(previously _request_streams[id] and _sse_stream_writers[id] leaked for the
session). The shared _request_streams registration is pushed into each branch.

Adds an old-pv-reconnect test in test_hosting_resume.py covering the
priming_event-is-None replay arm; drops the no-branch pragma. The new
priming-failure test covers the outer except handler, so its pragmas and the
dead 'if writer:' check are removed.
@maxisbey maxisbey force-pushed the fix/streamable-http-hol-buffer branch from 0f0234a to 127b209 Compare June 22, 2026 10:49
Comment thread tests/shared/test_streamable_http.py Outdated
finally:
await write_stream.aclose()
await read_stream.aclose()
"""`_mint_priming_event` skips for old protocol versions (backwards compat)."""

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do a more high level end to end test here? if it's too hard to drive the client via only high level interfaces, then whatever you want for client is fine. but really I think we should have a high level test showing it works in memory with the transport and a real server.

…nd-to-end

Adds a finally to replay_sender (mirroring _run_sse_writer) so resumed
connections clean up _sse_stream_writers[stream_id] and _request_streams[stream_id]
on disconnect. Nested inside the stream_id-is-set block so no edge-case
None-handling is needed.

Drops the four unit tests that poked _mint_priming_event directly and adds an
end-to-end test in test_hosting_resume.py asserting the event store records
[(S, priming), (S, msg1), ..., (S, response)] for a real POST through a real
MCPServer + transport. The dropped tests' branches remain covered by the
existing high-level tests in the same file.
Comment on lines +603 to 614
priming_event = await self._mint_priming_event(request_id, protocol_version)

# Store writer reference so close_sse_stream() can close it
sse_stream_writer, sse_stream_reader = anyio.create_memory_object_stream[SSEEvent](0)
self._sse_stream_writers[request_id] = sse_stream_writer
self._request_streams[request_id] = anyio.create_memory_object_stream[EventMessage](
REQUEST_STREAM_BUFFER_SIZE
)
request_stream_reader = self._request_streams[request_id][1]

async def sse_writer():
# Get the request ID from the incoming request message
try:
async with sse_stream_writer, request_stream_reader:
# Send priming event for SSE resumability
await self._maybe_send_priming_event(request_id, sse_stream_writer, protocol_version)

# Process messages from the request-specific stream
async for event_message in request_stream_reader:
# Build the event data
event_data = self._create_event_data(event_message)
await sse_stream_writer.send(event_data)

# If response, remove from pending streams and close
if isinstance(event_message.message, JSONRPCResponse | JSONRPCError):
break
except anyio.ClosedResourceError: # pragma: lax no cover
# Expected when close_sse_stream() is called
logger.debug("SSE stream closed by close_sse_stream()")
except Exception: # pragma: lax no cover
logger.exception("Error in SSE writer")
finally:
logger.debug("Closing SSE writer")
self._sse_stream_writers.pop(request_id, None)
await self._clean_up_memory_streams(request_id)

# Create and start EventSourceResponse
# SSE stream mode (original behavior)
# Set up headers
headers = {
"Cache-Control": "no-cache, no-transform",
"Connection": "keep-alive",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The SSE branch of _handle_post_request registers self._sse_stream_writers[request_id] before starting the EventSourceResponse, but the except Exception: "SSE response error" block only closes the writer and calls _clean_up_memory_streams — it never pops _sse_stream_writers[request_id], so a startup failure that happens before _run_sse_writer is ever scheduled leaves a closed-writer entry in the dict for the transport's lifetime. This is the same stale-entry class the PR's new finally in replay_sender fixes; a one-line self._sse_stream_writers.pop(request_id, None) in that except block closes the gap (the gap itself is pre-existing in shape, so non-blocking).

Extended reasoning...

What the bug is. In the SSE (non-JSON) branch of _handle_post_request, the handler registers self._sse_stream_writers[request_id] = sse_stream_writer before constructing and starting the EventSourceResponse. The only places this entry is ever removed are close_sse_stream() (user-invoked), _run_sse_writer's finally, and replay_sender's new finally. The error path right below the registration — except Exception: logger.exception("SSE response error") — closes sse_stream_writer and calls _clean_up_memory_streams(request_id) (which only touches _request_streams), but never pops _sse_stream_writers[request_id].

The code path that triggers it. _run_sse_writer is the data_sender_callable of the EventSourceResponse; it is started lazily, two start_soon hops after the response is scheduled in the task group. If the task group fails before that callable ever runs — e.g. writer.send(session_message) raising ClosedResourceError because the session is being terminated concurrently, or the ASGI send failing during response startup — _run_sse_writer's finally never fires, so nothing removes the dict entry.

Why existing code doesn't prevent it. Cleanup of _sse_stream_writers relies entirely on _run_sse_writer (or explicit close_sse_stream() calls from user code). Neither terminate() nor connect()'s teardown clears _sse_stream_writers — they only clear _request_streams. So once the except block runs without the pop, the closed-writer entry persists for the transport's lifetime.

Step-by-step proof.

  1. Client POSTs a request with id "7"; the handler reaches the SSE branch and sets self._sse_stream_writers["7"] = sse_stream_writer (line ~606).
  2. The handler enters async with anyio.create_task_group() as tg, calls tg.start_soon(response, scope, receive, send), and then await writer.send(session_message).
  3. Suppose the session is being torn down concurrently and writer.send() raises ClosedResourceError (or the ASGI send fails while the response is starting). The task group unwinds before EventSourceResponse ever invokes its data_sender_callable, so _run_sse_writer("7", ...) never runs.
  4. The except Exception block at lines ~634-637 executes: it closes sse_stream_writer and calls _clean_up_memory_streams("7"), removing _request_streams["7"] — but _sse_stream_writers["7"] is left in place, holding a closed stream.
  5. Nothing else ever pops it: terminate() and connect() teardown only clear _request_streams, and close_sse_stream("7") is only called by user/tool code for an active stream. The entry lives until the transport object is garbage-collected.

Impact. Small and bounded: one closed-writer dict entry per failed SSE response startup, an exceptional path (pragma: lax no cover). It is not a correctness wedge — but it is exactly the stale-entry class this PR just fixed in replay_sender by adding a finally that pops the writer dict, so the POST error path is now the one writer path left without symmetric cleanup. The registration-before-try and the except-without-pop existed in the same shape before this PR; the PR rewrote this region without closing the gap.

How to fix. Add self._sse_stream_writers.pop(request_id, None) to the except Exception: "SSE response error" block, alongside the existing await sse_stream_writer.aclose() and _clean_up_memory_streams(request_id). That makes the POST error path consistent with _run_sse_writer's finally and replay_sender's new finally.

…bearing

The outer except now returns a fixed message instead of embedding the
exception text in the response body, matching the convention in runner.py
and jsonrpc_dispatcher.py (detail goes to logger.exception).

Moves the _request_streams/_sse_stream_writers emptiness assertions inside
the connect() context so connect()'s own teardown doesn't make them vacuous,
and asserts the store's exception message is absent from the 500 body.
Comment on lines +641 to +650
except Exception as err:
logger.exception("Error handling POST request")
response = self._create_error_response(
f"Error handling POST request: {err}",
"Error handling POST request",
HTTPStatus.INTERNAL_SERVER_ERROR,
INTERNAL_ERROR,
)
await response(scope, receive, send)
if writer:
await writer.send(Exception(err))
return # pragma: no cover
await writer.send(Exception(err))
return

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟣 Pre-existing issue (not introduced by this PR, but the PR rewrites exactly these lines and removes the coverage pragma): the outer except Exception in _handle_post_request unconditionally sends a fresh 500 and then await writer.send(Exception(err)), but it also wraps the notifications/responses branch which has already sent a 202 Accepted before its writer.send(session_message). If that send raises (e.g. ClosedResourceError from a concurrent DELETE/terminate()), the second await response(scope, receive, send) attempts another http.response.start on an already-responded connection and the resulting RuntimeError escapes handle_request as an unhandled ASGI error. Since these lines are being rewritten anyway, tracking whether a response has started (or catching Closed/BrokenResourceError around the 202-path writer.send) would close the gap.

Extended reasoning...

What the bug is. The rewritten outer error handler at the end of _handle_post_request (src/mcp/server/streamable_http.py:641-650) unconditionally builds a fresh 500 Response, calls await response(scope, receive, send), and then await writer.send(Exception(err)). However, this except wraps code paths where a response has already been sent on this connection — most concretely the notifications/responses branch, which sends a 202 Accepted (await response(scope, receive, send) for non-JSONRPCRequest messages) before doing await writer.send(session_message).

The code path that triggers it. writer here is _read_stream_writer. A concurrent DELETEterminate() (or session-shutdown teardown in connect()) closes that stream. So if a notification/response POST races termination, the 202 has already gone out on the wire, and then writer.send(session_message) raises anyio.ClosedResourceError (or BrokenResourceError if the session loop died). Control falls into the outer except Exception as err, which calls await response(scope, receive, send) a second time. On a real ASGI server (uvicorn/Starlette), starting a second http.response.start on a connection whose response is already complete raises RuntimeError. Because that RuntimeError is raised inside the except handler, it is not caught: it escapes _handle_post_request / handle_request to the ASGI server as an unhandled application error, and the trailing writer.send(Exception(err)) is never reached (it would also fail on the closed writer).

Step-by-step proof. (1) Client A POSTs a JSON-RPC notification; _handle_post_request reaches the if not isinstance(message, JSONRPCRequest) branch and sends 202 Accepted — http.response.start + body have gone out. (2) Concurrently, client B (or an idle-timeout path) issues DELETE /mcp; _handle_delete_requestterminate() closes _read_stream_writer. (3) Back in A's request task, await writer.send(session_message) raises ClosedResourceError. (4) The outer except logs, builds a 500, and calls await response(scope, receive, send) — the ASGI server raises RuntimeError("Response already started"). (5) That RuntimeError propagates out of handle_request; the manager awaits transport.handle_request directly in the per-request ASGI coroutine, so nothing upstream catches it — uvicorn logs "Exception in ASGI application" for a request that had already completed from the client's perspective.

Why existing code doesn't prevent it. The SSE branch and the JSON branch each have their own inner except blocks, so post-response failures there are swallowed before reaching the outer handler — the 202 path is the one already-responded path that funnels into this except. Nothing in the handler tracks whether a response has started.

Why this is pre-existing, and addressing the dissenting verifier. One verifier argued this is not actionable because the behavior is byte-for-byte identical to main: the old block also called await response(scope, receive, send) unconditionally (the removed if writer: guard only wrapped the writer.send, and writer is checked non-None at function entry anyway), and the trigger is a narrow teardown race whose only consequence is a server-side traceback for a request the client already got its 202 for — the session task group is unaffected and the skipped writer.send is moot since the writer being closed is what triggered the path. All of that is correct, which is why this is filed as pre-existing and non-blocking, not as a regression. The reason it is still worth a note is that this PR actively rewrites these exact lines, drops the # pragma: lax no cover, and asserts (via the new test_priming_store_failure_leaves_no_per_request_state test) that this is now a real, intentional 500 path — without guarding the case where a response has already started. Touching it now is essentially free.

How to fix. Either (a) track a response_started flag (set after each successful await response(...)) and only attempt the 500 when it is False, or (b) scope the outer except to the pre-response section / catch anyio.ClosedResourceError/BrokenResourceError separately around the 202-path writer.send and treat it as a no-op (the client already has its 202 and the session is being torn down). Either way, also skip the final writer.send(Exception(err)) when the writer is the thing that failed.

…arc via e2e

The synthetic test called _mint_priming_event itself then asserted it was
stored first, which proves the test's own statement order rather than that
_handle_post_request mints before dispatch. The e2e test in test_hosting_resume.py
already pins the real contract through the production handler; dropping
retry_interval= from that test (it doesn't reconnect) also covers the
retry_interval-is-None branch via the real path.
Comment on lines +599 to 614
# Mint the priming event before any per-request state exists:
# `EventStore.store_event` is user code and may raise, in which
# case the outer handler returns a 500 with nothing to clean up.
# Still strictly precedes dispatch, so storage order == wire order.
priming_event = await self._mint_priming_event(request_id, protocol_version)

# Store writer reference so close_sse_stream() can close it
sse_stream_writer, sse_stream_reader = anyio.create_memory_object_stream[SSEEvent](0)
self._sse_stream_writers[request_id] = sse_stream_writer
self._request_streams[request_id] = anyio.create_memory_object_stream[EventMessage](
REQUEST_STREAM_BUFFER_SIZE
)
request_stream_reader = self._request_streams[request_id][1]

async def sse_writer():
# Get the request ID from the incoming request message
try:
async with sse_stream_writer, request_stream_reader:
# Send priming event for SSE resumability
await self._maybe_send_priming_event(request_id, sse_stream_writer, protocol_version)

# Process messages from the request-specific stream
async for event_message in request_stream_reader:
# Build the event data
event_data = self._create_event_data(event_message)
await sse_stream_writer.send(event_data)

# If response, remove from pending streams and close
if isinstance(event_message.message, JSONRPCResponse | JSONRPCError):
break
except anyio.ClosedResourceError: # pragma: lax no cover
# Expected when close_sse_stream() is called
logger.debug("SSE stream closed by close_sse_stream()")
except Exception: # pragma: lax no cover
logger.exception("Error in SSE writer")
finally:
logger.debug("Closing SSE writer")
self._sse_stream_writers.pop(request_id, None)
await self._clean_up_memory_streams(request_id)

# Create and start EventSourceResponse
# SSE stream mode (original behavior)
# Set up headers
headers = {
"Cache-Control": "no-cache, no-transform",
"Connection": "keep-alive",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 With the per-request stream now buffered (REQUEST_STREAM_BUFFER_SIZE = 16) and the priming event only put on the wire by the lazily-started _run_sse_writer, a tool can emit up to 16 notifications and call ctx.close_sse_stream() before the writer ever runs — close_sse_stream() discards the writer and request stream, the priming send hits ClosedResourceError, the SSE response carries zero events, and the client (which only auto-reconnects when it saw at least one event id, src/mcp/client/streamable_http.py:429-432) never replays the stored response, so call_tool() hangs. The close-before-writer-starts race is pre-existing for tools that close after 0–1 messages, but on main the buffer-0 stream parked the router so any tool emitting ≥2 messages before close was structurally guaranteed the client held a priming cursor — this PR widens that window from ≤1 to ≤16 messages on exactly the resumability path it hardens. Consider draining/flushing on close instead of discarding, delivering the priming event eagerly from the POST handler, or having close_sse_stream() defer until the writer has started.

Extended reasoning...

What the bug is. In the SSE branch of _handle_post_request the per-request stream is now created with REQUEST_STREAM_BUFFER_SIZE = 16 and the priming event is no longer sent eagerly: it is minted in the handler (_mint_priming_event, before dispatch) but only put on the wire by _run_sse_writer — the EventSourceResponse data_sender_callable, started lazily two start_soon hops deep (the very lag #1764's reporters observe in production). During that lag, a tool using the documented polling pattern can run all the way to ctx.close_sse_stream(): close_sse_stream(request_id) (lines 219–227) pops and synchronously closes the sse_stream_writer and both halves of _request_streams[request_id]. When _run_sse_writer finally starts, its first await — sending the priming event — raises ClosedResourceError (caught and debug-logged), so the SSE response completes having carried zero events, and any notifications sitting in the 16-slot buffer are dropped from the wire.\n\nWhy the event store does not save you. close_sse_stream is only offered when an event store is configured, precisely so the client can reconnect with Last-Event-ID and replay what it missed. But the client transport only reconnects if it received at least one event id on the stream: src/mcp/client/streamable_http.py:429-432 ("Stream ended without response - reconnect if we received an event with ID", if last_event_id is not None). With zero events on the wire, last_event_id is None, no replay GET is ever issued, and the tool's eventual JSONRPCResponse — duly stored in the event store — is never delivered. The caller's call_tool() / send_request() hangs until its own timeout: a wedged request on the resumability path this PR is hardening.\n\nWhy this PR widens the window (and what part is pre-existing). On main the per-request stream had buffer 0, so the serial message_router parked on its first deposit for this request until sse_writer had started and reached its first receive() — and the old sse_writer sent the priming event before that first receive(). The session write_stream is also buffer-0, so the tool's second related send could not complete until the priming event was already handed into the response's content stream. In other words, on main any tool that emitted two or more related messages before closing had a structural guarantee that the client held a resumption cursor by close time. With the 16-slot buffer the router deposits and returns and the tool never blocks, so the unsafe window widens from "close after ≤1 message" to "close after ≤16 messages". The 0/1-message close pattern (the exact shape of tool_with_stream_close) was already exposed on main under the same writer-start lag — that part is pre-existing — but multi-notification polling tools that were previously safe by construction become unsafe with this change.\n\nStep-by-step proof. (1) Client POSTs tools/call id=7 (protocol ≥ 2025-11-25, event store configured). The handler stores the priming row, registers the 16-slot stream, start_soons the EventSourceResponse, and dispatches the request. (2) Under load, the data_sender_callable has not yet run — the production-observed lag from #1764. (3) The tool emits "progress 1" and "progress 2" (the router stores ids 4, 5 and deposits both into the 16-slot buffer without blocking), then calls await ctx.close_sse_stream() to switch the client to polling. (4) close_sse_stream closes the writer and both request-stream halves; the two buffered notifications are dropped. (5) _run_sse_writer starts, its priming send hits ClosedResourceError, and the SSE response completes with zero events. (6) The client saw no event id, does not reconnect (streamable_http.py:430), and the tool's response (stored later as id 6) is never delivered — the request hangs. On main, step (3)'s second send could not complete before the priming event was in the response pipeline, so step (6) could not occur for this sequence.\n\nOn the objection that the new window is unrealistic. It is true that on an idle in-process loop the data-sender task is ready within a couple of scheduler passes and essentially always wins the race — which is exactly why the existing tests (tool_with_stream_close, tool_with_multiple_notifications_and_close) never see this: they emit at most one notification before closing and the writer always starts first. But the PR's own motivation is that the lazily-started sse_writer lag is not reproducible on loopback yet is observed consistently in production by multiple #1764 reporters — the same premise that justifies the buffer fix is the condition under which this close-before-priming window now spans a tool's entire notification burst instead of at most one message. And while it is fair to say that on main the same stalled-writer regime was "worse" (the tool wedged at its second send and head-of-line blocked the whole session — the bug being fixed), the new failure mode is arguably harder to diagnose: the session stays healthy, the event store has everything, and exactly one request silently hangs with no resumption cursor and no error anywhere.\n\nHow to fix. Any of: (a) make close_sse_stream() drain-or-flush rather than discard — e.g. close only the send side of _request_streams[request_id] and let _run_sse_writer send the priming event plus the buffered messages before tearing down; (b) deliver the priming event eagerly from the POST handler (e.g. push it into the wire stream before dispatching the request) so a tool-initiated close can never strand the client without a cursor; or (c) have close_sse_stream() no-op / defer until the writer has actually started and sent the priming event. Option (b) also restores the invariant the PR description claims ("storage order == wire order" — and wire delivery before any tool-initiated close).

@maxisbey maxisbey merged commit a527142 into main Jun 22, 2026
34 checks passed
@maxisbey maxisbey deleted the fix/streamable-http-hol-buffer branch June 22, 2026 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Race condition in StreamableHTTP: zero-buffer memory streams cause deadlock with concurrent SSE responses

2 participants