Skip to content

Recover JSON-RPC messages with malformed unicode escapes#1722

Draft
ellismg wants to merge 1 commit into
mainfrom
ellismg/harden-jsonrpc-surrogate-decode
Draft

Recover JSON-RPC messages with malformed unicode escapes#1722
ellismg wants to merge 1 commit into
mainfrom
ellismg/harden-jsonrpc-surrogate-decode

Conversation

@ellismg

@ellismg ellismg commented Jun 18, 2026

Copy link
Copy Markdown

Summary

The Rust SDK's JSON-RPC read loop parsed each framed body with a strict serde_json::from_slice. A backend/CLI response containing a malformed \u escape — specifically a lone (unpaired) UTF-16 surrogate — fails with unexpected end of hex escape. That single parse error aborts the entire read loop (JsonRpcClient::read_loop), drains all pending requests, and tears down the transport. The list-client then reconnects and re-breaks on the same deterministic bytes, so every RPC on that channel (list_models, get_account_quota, list_global_agents, list_global_skills) fails together — on the github-app desktop app this manifests as a permanently empty model dropdown, quota, and agent/skill lists.

This is the consumer-side defensive fix for github/app#1055. Sibling producer-side fixes already landed in copilot-agent-runtime (#10300, #10231) to stop the CLI from emitting lone surrogates; this SDK change hardens the reader so a bad payload from any source (including an enterprise backend) can't brick the channel.

What changed

  • New rust/src/surrogate_safe.rssanitize_json_escapes(&[u8]) -> (Cow<[u8]>, usize), a string-context-aware byte scanner that rewrites lone high/low UTF-16 surrogates and otherwise-malformed \u escapes to \ufffd (U+FFFD). It preserves valid surrogate pairs, valid BMP escapes, and never misreads an escaped \\u. Returns Cow::Borrowed (zero allocation) when the input is already strict-JSON safe.
  • rust/src/jsonrpc.rsJsonRpcClient::read_message now recovers on the parse-error path only: on failure it runs the sanitizer; if something was actually repaired (Cow::Owned) it warn!s with the replacement count + original error and re-parses; if nothing needed repair (Cow::Borrowed) the original error propagates unchanged. The happy path stays zero-cost.
  • rust/tests/jsonrpc_test.rs — adds malformed_unicode_escape_is_recovered_and_loop_survives, a regression test that frames a notification carrying a lone high surrogate \ud83d, asserts it's delivered with the surrogate replaced by U+FFFD, and asserts a following well-formed message still arrives (proving the read loop survived).

No public API change and no consumer call-site change. read_message is the single chokepoint for every message from the CLI subprocess, so one guard here covers list-client RPCs, session-stream notifications, and resume frames. This mirrors the runtime's approach of substituting U+FFFD rather than dropping data.

Validation

Run from rust/ (matching .github/workflows/rust-sdk-tests.yml):

  • cargo +nightly-2026-04-14 fmt --all -- --config-path .rustfmt.nightly.toml --check
  • cargo clippy --all-targets --features test-support -- --no-deps -D warnings -D clippy::unwrap_used -D clippy::disallowed_macros -D clippy::await_holding_invalid_type
  • cargo test --no-default-features --features test-support — 13 surrogate_safe unit tests + the new integration test + the existing jsonrpc_test suite all pass ✅ (the e2e suite requires a locally installed CLI and is unrelated to this change.)

Fixes the SDK side of github/app#1055.

The Rust SDK's JSON-RPC read loop parsed each framed body with a strict
serde_json::from_slice. A response containing a malformed `\u` escape —
notably a lone (unpaired) UTF-16 surrogate — failed with "unexpected end
of hex escape", which aborted JsonRpcClient::read_loop, drained all
pending requests, and tore down the transport. The channel then
reconnected and re-broke on the same deterministic bytes, so every RPC on
it (list_models, get_account_quota, list_global_agents, list_global_skills)
failed together (github/app#1055).

Add a string-context-aware sanitizer (surrogate_safe::sanitize_json_escapes)
that rewrites lone surrogates and otherwise-malformed `\u` escapes to
U+FFFD while preserving valid surrogate pairs and escaped backslashes. It
runs only on the parse-error path, so well-formed traffic is unaffected
and zero-allocation (Cow::Borrowed) when nothing needs repair. On
recovery the reader warns with the replacement count and re-parses;
unrelated syntax errors still propagate unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant