Skip to content

feat(qwp): connect timeout, ingest callbacks, and lazy_connect (tolerant startup) on the QuestDB facade#60

Open
bluestreak01 wants to merge 53 commits into
mainfrom
feat/connect-timeout
Open

feat(qwp): connect timeout, ingest callbacks, and lazy_connect (tolerant startup) on the QuestDB facade#60
bluestreak01 wants to merge 53 commits into
mainfrom
feat/connect-timeout

Conversation

@bluestreak01

@bluestreak01 bluestreak01 commented Jun 28, 2026

Copy link
Copy Markdown
Member

tandem OSS PR: questdb/questdb#7341

Summary

Related ergonomics/resilience improvements for the QWP (WebSocket) client:

  1. Application-level TCP connect timeout (transport-wide) — bound the connect itself instead of riding the OS-level timeout.
  2. Expose the ingest callbacks on the QuestDB facadeerrorHandler / connectionListener, previously unreachable from the pooled facade.
  3. Tolerant startup via lazy_connect=true — start the handle even when the server is down, buffer writes meanwhile, and read once it's up. Reads stay fully enabled.
  4. Single cluster config on the facade — one ws/wss string (a single addr server list) configures the whole cluster, driving both the ingest and query pools.
  5. Pooled-lease refactor + correctness fixes — reads and writes share one pooled-lease model (borrowQuery() mirrors borrowSender()), and a per-borrow generation closes a family of pool-corruption bugs (double-close, use-after-close, lost dispatch, stale cancel). This is internal hardening — see §5.

Items 1–3 are independently usable and off by default.


1. Configurable TCP connect timeout

A connect to a black-holed/firewalled host blocks on the OS-level TCP connect timeout (60–120s): the socket is created blocking, connect() runs, then it's switched to non-blocking. The code calls this out:

// SenderPool.java
// connect to a black-holed/firewalled host blocks on the OS connect timeout
// (the transport exposes no application-level connect timeout to clamp it).

Approach (native, cross-platform): non-blocking connect() (EINPROGRESS) → poll/select for writability bounded by the caller's budget → confirm via getsockopt(SO_ERROR). A sentinel (CONNECT_TIMEOUT = -3) lets Java raise a timeout-flagged exception. Generalises the existing handleEintrInConnect helper.

Sender.builder("https::addr=host:9000;connect_timeout=5000;")...   // or .connectTimeoutMillis(5000)
QwpQueryClient.fromConfig("ws::addr=host:9000;connect_timeout=5000;");

Touches: native share/net.c + windows/net.c + net.h; Net / NetworkFacade(Impl); HttpClientConfiguration.getConnectTimeout(); HttpClient.connect() / WebSocketClient.doConnect(); ConfigSchema COMMON key connect_timeout; Sender builder + both parsers; QwpWebSocketSender / QwpQueryClient (withConnectTimeout). Bounds the TCP connect and the TLS handshake (see below); the WebSocket upgrade stays under the request/auth timeout (auth_timeout_ms).

1b. Bound the TLS handshake (and stop it busy-spinning)

JavaTlsClientSocket.startTlsSession ran the TLS handshake with raw recv/send on the now-non-blocking socket and never waited on readiness. A peer that completed the TCP connect but stalled before its half of the handshake left the engine in NEED_UNWRAP with recv returning 0 (would-block), so the loop re-read in a tight cycle: a 100% CPU busy-spin with no deadline. connect_timeout bounded only the TCP connect, and the upgrade's request/auth timeout never covered the handshake, so a stalled wss:// host (e.g. QuestDB Cloud) could pin a core indefinitely and defeat the bounded connect. The busy-spin pre-dates this PR (it predates the connect_timeout work), but the same change closes it.

The handshake now runs through the client's existing deadline-aware ioWait (the same primitive recvOrDie/doSend use): when the socket would block it parks on epoll/kqueue/select for the remaining connect budget and throws a timeout-flagged exception once it is spent. doConnect registers the fd with the event loop before the handshake and bounds it by connect_timeout, falling back to the request timeout when connect_timeout is unset — so the handshake can no longer hang or spin even with the default config. Both HttpClient (https) and WebSocketClient (wss) share the fix, and the connect TLS block disconnects on any handshake error so the fd/native buffers do not leak.

Tradeoff: with connect_timeout unset, a wss:// handshake that stalls now fails after the request timeout instead of spinning/hanging forever. That is strictly better, but it is a behavior change for that edge.

Touches: JavaTlsClientSocket (handshake extracted into runHandshake), new SocketReadinessWaiter, Socket/PlainSocket, HttpClient.connect() / WebSocketClient.doConnect(). Tests: JavaTlsClientSocketTest.testHandshakeWaitsForReadabilityInsteadOfBusySpinning (a stalled peer must yield to the readiness waiter, not spin; a method-level timeout fails the test if the spin returns) and testHandshakeCompletesWithoutWaitingWhenEngineMakesProgress (happy path).


2. Ingest callbacks on the QuestDB facade

The facade built ingest senders from config strings only (SenderPool → Sender.fromConfig), so the programmatic SenderErrorHandler / SenderConnectionListener were unreachable — a facade user got the default loud-not-silent handlers with no way to observe async ingest errors or connection transitions.

QuestDB.builder()
    .fromConfig("ws::addr=host:9000;")
    .errorHandler(myErrorHandler)
    .connectionListener(myConnectionListener)
    .build();

QuestDBImpl / SenderPool each gain a full constructor carrying the callbacks; the white-box test-seam constructors are preserved as delegating shims. SenderPool.applyUserCallbacks() applies them to every pooled sender (non-SF and SF paths); internal recovery delegates are excluded. Defaults null.


3. Tolerant startup: lazy_connect=true

The facade prewarms a reader (QueryClientPool) synchronously and fail-fast (default query_pool_min=1; queries have no async connect), so a down server failed the whole build. The fix is a single connect-string flag that makes the handle tolerate a down server without giving up reads — "starts when the server is down" and "never reads" are different things, and you almost always want the first.

lazy_connect=true:

  • a) starts even when the server is down — the ingest side goes async and the read pool defaults to query_pool_min=0, so neither side fail-fasts and build() returns promptly;
  • b) buffers writes while down — the async sender accepts rows that flush once the wire is up;
  • c) reads once the server is up — the read pool stays enabled (it is not disabled the way a write-only mode would). It defaults to query_pool_min=0, so nothing connects eagerly and borrowQuery() connects lazily on first use. While the server is still down a borrowQuery() throws (it has nowhere to connect — exactly like any read against a down server); once the server is up it connects and reads. The point is that lazy_connect defers the read connect rather than refusing reads.
// starts with no server present; writes buffer, reads work once it's up
try (QuestDB db = QuestDB.connect("ws::addr=host:9000;lazy_connect=true;")) {
    try (Sender s = db.borrowSender()) {
        s.table("t").longColumn("v", 1).atNow();    // buffers while the server is down
    }
    // ... later, once the server is up:
    try (Query q = db.borrowQuery()) {               // read pool connects lazily on first borrow
        q.sql("select 1").handler(handler).submit().await();
    }
}

Because both sides must start non-blocking, a knob that forces a blocking / fail-fast startup is a configuration conflict, rejected up front with a clear remedy rather than silently overridden:

  • initial_connect_retry other than async (i.e. off/false/on/true/sync), and
  • an explicit query_pool_min > 0 (connect string or builder call).

lazy_connect is a Side.POOL registry key — the two ws clients ignore it; the facade reads it, defaults query_pool_min to 0, and injects initial_connect_retry=async when the user set none.


4. Single cluster config on the facade

A QuestDB cluster is one logical target reached over QWP for both ingest and query, so the facade takes one cluster config: a single ws/wss string that lists every node in one addr server list and drives both the sender and query pools.

QuestDB.builder()
    .fromConfig("ws::addr=node1:9000,node2:9000,node3:9000;")  // whole cluster
    .errorHandler(myErrorHandler)
    .connectionListener(myConnectionListener)
    .build();

build() validates the one string with both the ingest (validateWsConfigString) and egress (QwpQueryClient.validateConfig) validators; each side applies the keys it owns and ignores the rest. Pool keys are read from a single ConfigView, and QuestDBImpl passes the one config to both pool slots (preserving the white-box reflection seam).


5. Pooled-handle lifecycle: lease/generation refactor + correctness fixes

This is the bulk of the diff and it is internal (no user-facing API beyond the borrow methods). It reshapes the facade so reads and writes share one pooled-lease model and fixes a family of pool-corruption bugs that the previous thread-affine / flag-guarded design allowed.

Refactor — symmetric pooled lease (1245c17). Reads and writes now use one model:

  • Ingest: remove QuestDB.sender() / releaseSender() and the entire thread-pin subsystem behind them (pinToCurrentThread, releaseCurrentThread, clearPinIfCurrent, the threadAffine ThreadLocal, the PooledSender invalidated flag). borrowSender() is now the only way to lease a Sender.
  • Egress: remove query(), newQuery(), and executeSql(); add QuestDB.borrowQuery(), a closeable, non-allocating Query lease that mirrors borrowSender(). Each pooled QueryWorker owns one pre-allocated QueryImpl, handed out reset on borrow; submit() dispatches on the held worker (single-flight) and close() returns it to the pool. Reads now connect at borrow time, so under lazy_connect the read pool defaults to query_pool_min=0 and build() does not fail-fast while the server is down. (This is why §3's read example uses borrowQuery() and not the removed executeSql.)

Correctness fixes carried on top of the refactor:

  • f39e846 (M1) — stale pooled handle can corrupt the pool under double-close / use-after-close. The pooled handle was the reused per-slot object guarded by a non-volatile in-use flag, so a stale handle's close()/cancel()/write could leak into a different borrow (double-release → two concurrent borrowers on one non-thread-safe client). Fix: every borrow gets its own immutable generation, stamped under the pool lock and re-checked on every op — close()/cancel() no-op on a stale generation (idempotency preserved), submit()/writes throw, and release re-checks the generation under the lock so a slot can never be enqueued twice. Introduces QueryLease (wrapping QueryImpl) and SenderSlot (with PooledSender as the per-borrow wrapper).
  • e30a59c — lost query-worker dispatch under single-flight reuse. runLoop() cleared current outside signalLock, so a fast submit() → await() → submit() loop could clobber a freshly dispatched job and discard its signal, hanging the caller forever. Fix: clear current under signalLock at the moment of consumption. (Surfaced as a 60s hang in testSustainedMixedConcurrency, mostly on aarch64 CI; 15/15 clean with the fix.)
  • bcb1e7a — watchdog cancel must re-check the lease generation under the pool lock. cancel(gen) validated the generation with an unlocked volatile read then issued the wire cancel separately — a TOCTOU where a watchdog could abort a new borrower's query with a spurious STATUS_CANCELLED. Fix: route through QueryClientPool.cancelIfCurrent(worker, gen), which re-checks and cancels under the same lock the generation is bumped under; close()'s abort-on-unawaited-close uses the same path.
  • f64567e — assert the lease wraps the same pooled QueryImpl. Pins the invariant that the freshly allocated per-borrow wrapper delegates back to the one pre-allocated, reused QueryImpl (so a regression that allocated per borrow would be caught).

Testing

  • NetConnectTimeoutTest — loopback success, refused-vs-timeout disambiguation, black-hole timeout within budget.
  • QuestDBFacadeCallbacksTest — facade-wired errorHandler receives the async budget-exhaustion SenderError; connectionListener observes connection events (no server needed).
  • QuestDBLazyConnectTestlazy_connect=true starts + buffers a write with the server down, keeps the read pool enabled (borrowQuery() deferred to first use, not disabled), and rejects both conflicts (blocking initial_connect_retry, explicit query_pool_min > 0) from the connect string and from builder calls; QuestDBServerRecoveryTest dogfoods lazy_connect=true for the full down → write → up → read lifecycle.
  • QueryLeaseGenerationTest / SenderLeaseGenerationTest — the per-borrow generation guard: double-release, cross-borrow cancel/write, and the concurrent stale-cancel-after-reborrow TOCTOU (testConcurrentCancelDoesNotReachClientAfterReborrow).
  • QuestDBServerRecoveryTest — full lifecycle: server down → facade starts → client writes (buffered) → server starts → write side reconnects and the reader connects on the first borrowQuery().
  • QuestDBBuilderTest — covers the single cluster-config surface; the shared-vocabulary and sender-pool-unwind tests run against one server with one config.
  • TestWebSocketServer now serves both pools from one config like a real node: SERVER_INFO is emitted only on the egress /read path (the ingest /write ACK stream would choke on it), plus a setRejectReadUpgrade() toggle to fail only the query upgrade.

Full impl + network + facade suites pass locally on JDK 8 (source of truth) and the surface compiles on JDK 25 (java11+ front).

CI / native

  • ci(native): the rebuild_native_libs.yml linux-x86-64 job moved from manylinux2014 (glibc 2.17) to manylinux_2_28 (glibc 2.28), mirroring linux-aarch64 — GitHub now forces actions onto Node 24 (glibc ≥ 2.27), which couldn't run in the 2.17 container (pre-existing breakage, unrelated to the C change).
  • Native libraries are built on the test/release runners (no longer committed).

Compatibility

The connect_timeout knob and the ingest callbacks are additive and off by default. The QuestDB facade is new in this PR; the legacy direct Sender and QwpQueryClient APIs are unaffected.

Note: the PR bundles several independent features. Happy to split any of them into standalone PRs off main if that's easier to review.

bluestreak01 and others added 5 commits June 28, 2026 21:41
Establish a real, cross-platform connect timeout for the HTTP and
WebSocket (QWP) transports. Previously a connect to a black-holed or
firewalled host blocked on the OS-level TCP connect timeout (often
60-120s) because the socket was created blocking and only switched to
non-blocking *after* connect; the transport exposed no knob to clamp it.

Approach: a new native primitive switches the socket to non-blocking
*before* connect, so connect() returns EINPROGRESS immediately, then
polls for writability bounded by the caller's budget and confirms the
outcome via SO_ERROR. A distinct return code (CONNECT_TIMEOUT, -3) lets
the Java layer raise a timeout-flagged exception rather than decode errno.

Native:
- share/net.c: connectAddrInfoTimeout + awaitConnectComplete (poll +
  getsockopt(SO_ERROR), monotonic-clock EINTR handling)
- windows/net.c: Winsock equivalent (select write/except sets)
- share/net.h: ECONNTIMEOUT (-3) sentinel

Java:
- Net / NetworkFacade(Impl): connectAddrInfoTimeout + CONNECT_TIMEOUT
- HttpClientConfiguration.getConnectTimeout() (default 0 = OS fallback)
- HttpClient.connect() / WebSocketClient.doConnect() honor it and throw a
  timeout-flagged HttpClientException on CONNECT_TIMEOUT
- Sender builder: connectTimeoutMillis() + connect_timeout connect-string
  key (legacy http and ws/wss parsers) + ConfigSchema COMMON key
- QwpWebSocketSender / QwpQueryClient: thread the value through to their
  WebSocketClient (adds QwpQueryClient.withConnectTimeout)

Default is unset (0): behaviour is unchanged unless connect_timeout is
configured.

Tests: NetConnectTimeoutTest covers loopback success, refused-vs-timeout
disambiguation, and a black-hole timeout that fires within budget;
config-honored drift guards updated for the new COMMON key.
On a runner with no route to TEST-NET-1 (192.0.2.0/24) connect() fails
fast with ENETUNREACH instead of dropping the SYN, so the timeout path
can't be exercised. Skip (Assume) in that case rather than asserting a
timeout, while still proving the call never blocked on the OS connect
timeout.
GitHub now forces actions onto Node 24 (glibc >= 2.27), which cannot run
inside the manylinux2014 (glibc 2.17) container the linux-x86-64 native
build used; actions/checkout failed before compilation. The old
Node-20-glibc-217 override only patched /__e/node20, not /__e/node24.

Switch the job to quay.io/pypa/manylinux_2_28_x86_64 (glibc 2.28, runs
stock Node 24) and drop the Node hack, nasm src.rpm rebuild, and manual
CMake download, mirroring the linux-aarch64 job that already builds on
manylinux_2_28.
The pooled QuestDB facade built its ingest Senders from config strings
only (SenderPool -> Sender.fromConfig), so the programmatic ingest
callbacks -- SenderErrorHandler and SenderConnectionListener -- were
unreachable: a facade user got the default loud-not-silent handlers with
no way to observe async ingest errors or connection transitions.

Expose both as QuestDBBuilder setters and thread them to every pooled
Sender:
- QuestDBBuilder.errorHandler(...) / .connectionListener(...)
- QuestDBImpl gains a full constructor carrying the callbacks; the public
  constructor forwards them and the 12-arg white-box test-seam constructor
  is preserved as a delegating shim (null callbacks).
- SenderPool gains a full constructor + applyUserCallbacks() that applies
  the callbacks to every sender it builds (both the non-SF and SF paths);
  the 8-arg test-seam constructor is preserved as a shim.

Recovery delegates (internal, short-lived, OFF-mode drain senders) are
deliberately excluded so the user's callbacks never see events from
internal machinery.

Defaults are null -> behaviour is unchanged unless a callback is set.

Tests: QuestDBFacadeCallbacksTest prewarms one ingest sender at a dead
port in async mode with a tight reconnect budget and asserts the
facade-wired errorHandler receives the budget-exhaustion SenderError and
the facade-wired connectionListener observes connection events -- no
server required.
@bluestreak01 bluestreak01 changed the title feat(net): add application-level TCP connect timeout feat: TCP connect timeout + expose ingest callbacks on the QuestDB facade Jun 28, 2026
The QuestDB facade always built a reader (QueryClientPool), which prewarms
synchronously and fail-fast (default query_pool_min=1, QwpQueryClient has no
async connect). So a down server / read primary sank the whole facade build,
taking the write side with it.

Add QuestDBBuilder.writeOnly(): build an ingest-only handle that never
constructs the query pool, so the read side cannot fail startup. A query
config is no longer required in this mode (any query config set is ignored),
and query()/newQuery() throw a clear "write-only" IllegalStateException.

- QuestDBImpl gains a write-only public constructor + a writeOnly flag on the
  full constructor; the 12-arg white-box test-seam constructor stays unchanged
  (delegates with writeOnly=false). queryPool/queryThreadLocal are null in
  write-only mode.
- PoolHousekeeper tolerates a null query pool.
- QuestDBBuilder.buildWriteOnly() validates + resolves only the sender/shared
  pool knobs from the ingest config.

Pair with initial_connect_retry=async (or sender_pool_min=0) on the ingest
config so the write side does not fail-fast either -> the facade starts with
no server present.

Tests: QuestDBWriteOnlyTest proves the facade builds with no server, that
query()/newQuery() are disabled, that no query config is required, and that an
async warm sender can buffer a write while serverless.
@bluestreak01 bluestreak01 changed the title feat: TCP connect timeout + expose ingest callbacks on the QuestDB facade feat: connect timeout, ingest callbacks, and write-only mode on the QuestDB facade Jun 28, 2026
…nnects

End-to-end resilience test for the QuestDB facade: build with the server
down (ingest initial_connect_retry=async + query_pool_min=0), buffer a
write, then bring the server up and assert the write side reconnects and
the previously-deferred reader connects on the first query.

Uses two TestWebSocketServers bound-but-not-accepting to model a reachable
-but-down server (handshakeCount stays 0 until start()). The mock cannot
serve real SELECT rows, so the read step asserts the query client connects
once the server is up, not the row contents. Stable across repeated runs.
@bluestreak01 bluestreak01 changed the title feat: connect timeout, ingest callbacks, and write-only mode on the QuestDB facade feat(qwp): connect timeout, ingest callbacks, and write-only mode on the QuestDB facade Jun 28, 2026
bluestreak01 and others added 17 commits June 29, 2026 00:20
Remove the committed Linux/Windows native binaries (libquestdb.so,
libquestdb.dll) and compile them locally during the Azure test CI.

- New ci/build_native.yaml template compiles libquestdb on the runner:
  Linux (cmake+nasm+build-essential) and Windows (MinGW-w64+NASM via choco).
  macOS keeps using the committed .dylib. Inits the zstd submodule first.
- Output is copied into src/main/resources/.../bin/<platform>/ so mvn install
  packages it into the client jar for both client and OSS server tests; the
  loader also picks up the CMake bin-local output directly.
- Wired the template into run_tests_pipeline.yaml before client install.

Committed binaries are still produced by the release GitHub Action.
Remove the committed darwin-aarch64/darwin-x86-64 libquestdb.dylib and build
them on the macOS runners, matching the Linux/Windows approach. No native
binaries remain committed; all are compiled during the test CI.

- build_native.yaml: add a macOS build step (brew cmake/nasm,
  MACOSX_DEPLOYMENT_TARGET=13.0), detect darwin-aarch64 vs darwin-x86-64 via
  uname -m, and copy the dylib into src/main/resources/.../bin/<platform>/.
- Init the zstd submodule on all platforms (it was skipped on Darwin).

Release artifacts are still produced by the release GitHub Action.
The macos-15 (x64) agent hardware no longer exists, so remove the mac-x64
matrix entry. macOS is now tested on mac-aarch64 only. The darwin-x86-64
.dylib is still produced by the release GitHub Action, and build_native.yaml
keeps its uname-based arch detection so an x64 macOS runner would still build
correctly if ever reintroduced.
The GitHub Actions build-jdk8 job ran the full test suite against the
committed native libraries, which are now removed. Without the .so the
io.questdb.client.std.{Os,Files,Unsafe,...} static initializers fail with
NoClassDefFound (1289 errors).

Compile the native .so from source first (zstd submodule + cmake/nasm/
build-essential), against the JDK 8 JNI headers, and copy it into
src/main/resources/.../bin/linux-x86-64 so it survives 'mvn clean' and loads
via the production bin/<platform> path. Update the now-stale comment.
glibc 2.17 moved clock_gettime() into libc under a new GLIBC_2.17 version
node. Building the release .so in a modern container (manylinux_2_28) binds
clock_gettime@GLIBC_2.17, which raises the whole library's glibc floor to 2.17
and breaks loading on glibc 2.14-2.16 hosts.

Add src/main/c/share/glibc_compat.h with a .symver directive forcing the
reference back to clock_gettime@GLIBC_2.2.5 (x86-64 glibc only; no-op on
aarch64/macOS/Windows), include it from net.c and os.c, list it in the
CMake sources, and document the glibc floor in rebuild_native_libs.yml.
The Coverage Report job runs 'mvn -P jacoco test' on core but had no native
build step, so after dropping the committed binaries it failed to load
libquestdb.so (NoClassDefFound in io.questdb.client.std.*). Add the
build_native.yaml template before the coverage test run, matching the
BuildAndTest job. The job runs on Linux, so it compiles libquestdb.so.
Collapse the dual ingest/query config surface on the QuestDB facade into a
single configuration string for the whole cluster. A QuestDB cluster is one
logical target reached over QWP for both ingest and query, so one ws/wss
string -- listing every node in a single `addr` server list -- now drives
both the sender and query pools.

- QuestDBBuilder: drop ingestConfig()/queryConfig(); fromConfig() sets the
  one cluster config. Remove the cross-side pool-key conflict resolution
  (no two strings to reconcile) -- resolvePoolInt/Long read one ConfigView.
  build() validates the single string with both the ingest and egress
  validators; each side applies the keys it owns and ignores the rest.
- QuestDB: remove the connect(ingest, query) overload; connect(config) and
  builder() now document the one-config/server-list model.
- QuestDBImpl is unchanged: the builder passes the one config to both pool
  slots, preserving the white-box reflection seam.

Tests: TestWebSocketServer now serves both pools from one config like a real
node -- SERVER_INFO is emitted only on the egress /read path (the ingest
/write ACK stream would choke on it), plus a setRejectReadUpgrade() toggle to
fail just the query upgrade. Rewrote QuestDBBuilderTest and updated the
facade callback/recovery/write-only tests and the examples accordingly.
…n-string key

Make writeOnly() deliver its own promise and reach it from the connect string.

Previously "start even when the server is down" needed two knobs that look
unrelated: writeOnly() (skip the fail-fast read pool) plus
initial_connect_retry=async (keep the write prewarm from fail-fast-ing). The
former governs the read side, the latter the write side, so writeOnly() alone
still hard-failed build() when sender_pool_min >= 1 and the server was down.

- writeOnly() now defaults the ingest side to a non-blocking async initial
  connect (injected right after the schema so an explicit initial_connect_retry
  in the user's string still wins, last-write-wins). build() returns promptly
  with the server down and the sender pool warm; writes buffer until the wire
  comes up.
- New POOL-side connect-string key write_only=on, equivalent to .writeOnly(),
  so the mode is reachable from any config string (and QuestDB.connect). The
  two ws clients ignore it; the facade routes on it.

Tests: writeOnly() with sender_pool_min defaulting to 1 and no
initial_connect_retry now builds without fail-fast; write_only=on routes to
the ingest-only path via builder and via connect(). PoolConfigHonoredTest's
drift guard skips the routing flag (not a numeric sizing knob).
…nabled

Strengthen the server-recovery test to assert what the write-only mode is NOT:
on a normal facade built while the server is down (lazy read pool via
query_pool_min=0, async ingest), query() must still hand back a usable builder
*before* the server is up -- reads are enabled, just deferred -- and the
deferred reader connects on the first submit once the server comes up. This is
the read-capable counterpart to write-only, where query() throws for the life
of the handle.
…ant startup)

Drop write-only mode (it permanently disabled reads -- query()/newQuery()
threw for the life of the handle) in favour of a read-capable tolerant-startup
flag, lazy_connect, reachable from the connect string.

lazy_connect=true:
- a) starts even when the server is down -- the ingest side connects async and
     the read pool defaults to query_pool_min=0, so neither side fail-fasts;
- b) buffers writes while the server is down (async sender);
- c) reads once the server is up -- the read pool stays ENABLED and connects
     lazily on the first query.

Because both sides must start non-blocking, a knob that forces a blocking /
fail-fast startup is a configuration conflict, rejected up front with a clear
remedy:
- initial_connect_retry other than async (off/false/on/true/sync), and
- an explicit query_pool_min > 0 (connect string or builder call).

Changes:
- ConfigSchema: write_only -> lazy_connect (Side.POOL; both clients ignore it).
- ConfigView.getBool: accept true/false (and on/off).
- QuestDBBuilder: remove writeOnly()/buildWriteOnly(); build() resolves
  lazy_connect, validates the two conflicts, defaults query_pool_min to 0 and
  injects initial_connect_retry=async when unset.
- QuestDBImpl: remove the write-only constructor/flag and requireQueryEnabled;
  the query pool is always built (the white-box reflection seam is unchanged).
- Tests: QuestDBWriteOnlyTest -> QuestDBLazyConnectTest (start+write while down,
  reads stay enabled, both conflicts via string and builder, true/on parsing);
  QuestDBServerRecoveryTest now dogfoods lazy_connect=true for the full
  down->write->up->read lifecycle; PoolConfigHonoredTest drift guard skips the
  flag.
…e timeout

QwpQueryClient.runUpgradeWithTimeout wrapped connect() and upgrade() in one
try block, so a connect_timeout overage -- the timeout-flagged
HttpClientException from doConnect()'s CONNECT_TIMEOUT path -- was caught by
the isTimeout() branch meant for upgrade() and rewritten as
"WebSocket upgrade to host:port exceeded auth_timeout=<authTimeoutMs>ms".
A user with connect_timeout=500 and auth_timeout_ms=15000 saw, after ~500ms,
an error blaming a 15000ms auth timeout (wrong phase and wrong value).

Move connect() outside the upgrade try so the auth_timeout rewrite only
applies to genuine upgrade-phase timeouts; connect-phase failures propagate
with their own "connect timed out ..." message. The failover walk is
unchanged (the exception is still a transport error and the next endpoint is
tried). The ingest side (QwpWebSocketSender) was already correct -- it routes
through QwpUpgradeFailures.classify, which leaves the connect-timeout
exception unmodified.

Add QwpQueryClientConnectTimeoutTest: a TEST-NET-1 blackhole connect with
connect_timeout < auth_timeout must report connect_timeout, not auth_timeout.
It skips gracefully when the runner has no route to the blackhole, mirroring
NetConnectTimeoutTest. Verified it fails on the pre-fix code with the exact
misreported message.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nly removal

The PoolHousekeeper reap loop wrapped queryPool.reapIdle() in an
`if (queryPool != null)` guard whose comment ("null for a write-only
handle") described write-only mode. That mode was removed in the
lazy_connect change (7491d95): QuestDBImpl now builds the query pool
unconditionally and is the sole PoolHousekeeper caller, so the field is
never null in a live handle. The null branch is unreachable and the
comment is stale -- drop both. The outer best-effort Throwable catch
stays; it has nothing to do with write-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ConfigView.getBool accepts true/false and on/off, but its invalid-value
error read "(expected true, false)", under-reporting the accepted forms
(e.g. lazy_connect=on is valid yet the message implies otherwise). List
all four, matching getBoolOnOff's convention of naming exactly what it
accepts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reshape the QuestDB facade so reads and writes share one pooled-lease
model, and remove the thread-affine footguns on both sides.

Ingest:
- Remove QuestDB.sender() and releaseSender(), along with the entire
  thread-pin subsystem behind them (SenderPool.pinToCurrentThread,
  releaseCurrentThread, clearPinIfCurrent, the threadAffine ThreadLocal,
  and the PooledSender invalidated flag that existed only to make pinning
  safe). borrowSender() is now the only way to lease a Sender.

Egress:
- Add QuestDB.borrowQuery(), a closeable, non-allocating Query lease that
  mirrors borrowSender(). Each pooled QueryWorker owns one pre-allocated
  QueryImpl, handed out reset on borrow; submit() dispatches on the held
  worker (single-flight) and close() returns it to the pool. The worker
  no longer auto-releases per query.
- Remove query(), newQuery(), and executeSql(). Reads now connect at
  borrow time rather than submit time; under lazy_connect the read pool
  still defaults to min=0, so build() does not fail-fast while the server
  is down.

Test seams:
- Make the white-box seam constructors public and annotate @testonly
  where production never calls them (QuestDBImpl, SenderPool). The
  QueryClientPool connectHook ctor stays public without @testonly because
  QuestDBImpl constructs the query pool through it. Tests now call these
  constructors directly instead of via reflection.

Update the client tests, the usage example, and the startup/failover
design doc to the new API.
try (server) on an effectively-final existing variable is Java 9+ syntax
(JEP 213) and fails the JDK 8 test-compile (the source-of-truth target)
with -source 1.8, breaking the build-jdk8 CI job and the release build
before any test runs. Inline the resource construction into the
try-with-resources declaration, which is valid on Java 8 and keeps the
server variable name used throughout the body.
…se-after-close

A pooled Query/Sender handle was the reused per-slot object itself, guarded
only by a non-volatile in-use/borrowed flag. Once a worker/slot was released
and re-borrowed, that flag flips back to "live", so a stale handle's
close()/cancel()/write would leak into a *different* borrow: a duplicate close
double-released the worker/slot (enqueued twice -> two concurrent borrowers on
one non-thread-safe client/delegate), and a cached Completion.cancel() or stale
write hit whatever borrow now owns it. Idempotent close() and no-op cancel()
are documented contracts, so this was reachable from contract-legal code, not
just misuse, with pool-wide blast radius and no -ea guard.

Fix: give every borrow its own immutable generation, stamped under the pool
lock when the worker/slot is handed out and bumped again when it is returned.
The reused state stays on the slot; callers get a thin per-borrow handle that
carries the generation and validates it on every operation:
  - close()/cancel() are no-ops on a stale generation (idempotency preserved),
  - submit()/data writes throw,
  - release/giveBack/discardBroken re-check the generation under the pool lock
    so a worker/slot can never be enqueued twice, plus an -ea assert that it is
    not already in the available deque.

Egress: QueryImpl stops being the user-facing Query; new QueryLease wraps it.
Ingest: new SenderSlot is the reused slot; PooledSender becomes the per-borrow
wrapper (keeps the public name, so borrow() still returns it). The per-submit
path stays allocation-free; only the small lease handle is created per borrow
(routinely scalar-replaced under try-with-resources).

Adds QueryLeaseGenerationTest and SenderLeaseGenerationTest covering the
double-release and cross-borrow cancel/write paths; updates the white-box
tests to the new shapes. Full core suite green under -ea (the lone failure is
the unrelated pre-existing FilesTest M2, which fails identically on master).
QueryWorker.runLoop() consumed the dispatch hand-off (q = current) under
signalLock but cleared the slot (current = null) only after runOn()
returned, outside the lock. A Query lease is single-flight but reused:
the user thread loops submit() -> await() on the same handle. The
terminal callback inside runOn() wakes the user thread, which can call
submit() -> dispatch() -- setting current = q and signalling -- before
the worker thread reaches its post-run finally block. That stale
current = null then clobbered the freshly dispatched job and discarded
its already-consumed signal, so the worker parked forever on the
condition while the user thread blocked on a Completion that never
fired. The borrowed worker never returned to the pool and the caller
hung indefinitely.

Clear current under signalLock at the moment of consumption and drop the
post-run finally clear. dispatch() now cannot be clobbered: by the time
the next dispatch runs, the worker is either already awaiting (so the
signal wakes it) or will observe current != null on the while check and
skip awaiting. The exception path leaves current already null, and the
shutdown branch still clears under the lock.

Surfaced as a 60s hang in QuestDBFacadeE2ETest.testSustainedMixedConcurrency
(more threads than pool slots, repeated submit/await per lease). Was
intermittent and timing-sensitive, so it showed up mainly on aarch64 CI;
reproduced locally on x86 about one run in four, and 15/15 clean with
this fix.
JavaTlsClientSocket.startTlsSession ran the TLS handshake with raw
delegate.recv/delegate.send on a non-blocking socket and never waited on
socket readiness. A peer that completed the TCP connect but stalled
before its half of the handshake left the engine in NEED_UNWRAP with
recv returning 0 (would-block), so the loop re-read in a tight cycle: a
100% CPU busy-spin with no deadline. connect_timeout bounded only the TCP
connect, and the WebSocket upgrade's auth/request timeout never covered
the handshake, so a stalled wss:// (e.g. QuestDB Cloud) handshake could
pin a core indefinitely and defeat a bounded connect.

Drive the handshake through the client's existing deadline-aware ioWait,
the same primitive recvOrDie/doSend already use. When the socket would
block, the handshake hands control to a SocketReadinessWaiter that parks
on epoll/kqueue/select for the remaining connect budget and throws a
timeout-flagged exception once it is spent. This removes the spin and
bounds the handshake in one change: both NEED_UNWRAP (recv == 0) and the
NEED_WRAP send loop (send == 0) now wait instead of spinning.

doConnect now calls setupIoWait() before the handshake (so the fd is
registered when the waiter parks) and bounds the handshake by
connect_timeout, falling back to the request timeout when connect_timeout
is unset, so the handshake can no longer hang or spin even with the
default config. The connect TLS block disconnects on any handshake error
(including the waiter's timeout) so the fd and native buffers do not leak.
Both HttpClient (https) and WebSocketClient (wss) connect paths share the
fix.

Extracted the handshake loop into runHandshake(waiter) so a stub
SSLEngine can exercise the wait paths. Added two tests:
testHandshakeWaitsForReadabilityInsteadOfBusySpinning drives a stalled
peer and asserts the handshake yields to the waiter exactly once (a
method-level timeout fails the test if the spin returns), and
testHandshakeCompletesWithoutWaitingWhenEngineMakesProgress guards the
happy path.
awaitConnectComplete reset its time baseline (start = now) on every EINTR
and subtracted only whole milliseconds of elapsed time. Under a
high-frequency signal storm on the connecting thread -- e.g. a wall-clock
profiler or interval timer interrupting the blocked poll() more than once
per millisecond -- each interval truncated to 0 ms, so the budget never
decremented and poll() was re-armed with the full timeout every
iteration. The connect timeout could then extend well past its bound, or
never fire at all, contradicting the comment that EINTR storms cannot
extend it. Even at lower rates each interrupt discarded up to ~1 ms of
accounting, drifting the timeout one-directionally.

Compute one absolute monotonic deadline up front and derive the remaining
poll() budget from it each iteration. The remaining time can only
decrease, so the timeout is a strict upper bound regardless of interrupt
frequency; truncation now only under-shoots the final poll by < 1 ms,
which never extends the wait. The success, refused, and timeout paths are
otherwise unchanged.

Validated: NetConnectTimeoutTest (loopback success, refused-vs-timeout,
black-hole timeout within budget) passes against the rebuilt library. A
standalone harness driving the old vs new logic under a simulated 2.5 kHz
EINTR storm confirms the old logic runs unbounded (aborted at >10x the
budget) while the new logic returns at the budget. A deterministic
regression test at the JNI layer is not practical: Java cannot target a
POSIX signal at the specific connecting thread, and the only hanging-
connect fixture available (TEST-NET-1 black-hole) is routing-dependent
and already Assume-skipped on most runners.

Rebuilds the committed linux-x86-64 libquestdb.so from the net.c change
(mirrors the copy step in ci.yml); other platforms are refreshed by the
rebuild-native-libs workflow.
The build_native.yaml step comment claimed "macOS uses the committed
.dylib and is skipped inside the template." Both halves are now false:
this branch dropped all committed dylibs and added a macOS build step to
build_native.yaml (an active Darwin-conditioned step), so the dylib is
compiled fresh on the mac-aarch64 agent like the .so/.dll. The stale
comment could mislead a maintainer into skipping macOS in the template,
which would hard-break the mac leg since no committed dylib remains.
Reword to state that every platform's binary is built on its own native
agent and none are committed.
@bluestreak01 bluestreak01 changed the title feat(qwp): connect timeout, ingest callbacks, and write-only mode on the QuestDB facade feat(qwp): connect timeout, ingest callbacks, and lazy_connect (tolerant startup) on the QuestDB facade Jun 30, 2026
Result handlers (onBatch/onEnd/onError) run inline on the worker's
dispatch thread: QueryWorker.runLoop -> QueryImpl.runOn ->
QwpQueryClient.execute consumes the I/O thread's events and invokes the
handler on the worker thread, then signalDone (which sets done) runs only
when that same thread loops back. So a handler that called the lease's
blocking close() or await() parked the worker thread waiting for a done
that only it could later produce -- a permanent, uninterruptible
self-deadlock that also leaked the worker and hung any other thread
awaiting the same handle. close()'s awaitUninterruptibly made it
unrecoverable, and close() is now mandatory via try-with-resources, so the
foot-gun was one stray call away.

Add a worker-thread reentrancy guard: QueryWorker.isCurrentThreadWorker()
compares Thread.currentThread() to the worker's dispatch thread, and
QueryImpl.close()/await()/await(timeout) throw IllegalStateException up
front when called on it. The exception unwinds at the user's call site
with a message pointing to cancel() (the non-blocking in-handler stop);
the worker is released normally by the app-thread close() afterwards, so
no deadlock and no leak.

Also fix the docs the bug exposed. Query.handler, Completion, and the
QueryWorker class javadoc all claimed the handler runs on "the I/O
thread"; it runs on the worker (dispatch) thread, which consumes the I/O
thread's event queue inline. The handler/close()/await() javadocs now say
so and forbid blocking calls from a handler.

Tests: QueryWorkerTest.testCloseAndAwaitFromWorkerThreadThrowInsteadOf-
Deadlocking drives close()/await()/await(timeout) on the worker thread and
asserts IllegalStateException (a method-level timeout fails the test if the
guard regresses into the old deadlock; verified it times out without the
guard). The QuestDB facade E2E suite adds an end-to-end check that a real
handler calling close()/await() throws and leaves the worker reusable.
testBrokenSenderIsNotReturnedToPool asserted assertNotSame(first,
second) on the two PooledSender wrappers. SenderPool.borrow() allocates
a fresh PooledSender on every call, so that comparison is unconditionally
true and proves nothing: it stays green whether or not the broken slot
was discarded. With the discard logic reverted the test failed only
incidentally, when second.close() re-threw on the recycled broken
delegate, masking the real intent.

Compare the underlying SenderSlot instead via the existing slotOf()
helper, mirroring testBorrowReturnRecyclesSameDecorator. The pool
recycles slots, not wrappers, so a broken slot leaking back to the next
borrower now surfaces as the same slot and fails the assertion directly.
The finally swallows the incidental close() exception so the assertion
result is what surfaces. Verified by injecting the bug (giveBack instead
of discardBroken on flush failure): the corrected assertion fails as a
clean Failure at the assertion line, and passes with correct code.

Also document the QueryWorker lost-dispatch coverage boundary in
QueryWorkerTest: the single-flight-reuse race fix has no deterministic
unit reproduction here because it needs the worker mid-runOn(client)
when the user thread re-dispatches, which requires a live query client.
That regression is guarded end-to-end by
QuestDBFacadeE2ETest.testSustainedMixedConcurrency in the parent repo;
testShutdownRacingDispatchMustNotStrandCaller covers only the adjacent
shutdown branch.
Query.close() used to drain an in-flight submit with an unbounded,
uninterruptible wait: while (!done) doneCondition.awaitUninterruptibly().
Because the lease model makes close() mandatory (try-with-resources),
ordinary code that bounded its own await(timeout) and gave up still hit
this drain and could block the caller for the full remaining query
duration when the server was slow to honor the cancel. Worse, when a
QuestDB.close() raced an in-flight lease close() and the client I/O
thread failed to join within shutdownJoinMs, QwpQueryClient.close()
skipped closePool() -- the synthetic-terminal source -- so done was
never set and the lease close() hung forever.

Make the drain bounded and interruptible, and fail safe on timeout:

- QueryImpl.close() now waits at most closeQueryTimeoutMillis via
  awaitNanos and aborts on interrupt (re-raising the flag). A caller is
  never pinned to the full query duration.
- On timeout or interrupt the worker is discarded, not returned: its
  connection may still carry late RESULT_* frames for the abandoned
  query, which would corrupt the next borrower's stream. The pool grows
  a fresh worker on the next borrow. QueryClientPool.discard() mirrors
  the ingest side's discardBroken (closed/stale-generation guarded).
- QueryWorker.shutdown() now interrupts the dispatch thread. takeEvent()
  (QwpSpscQueue.take) is interrupt-aware and executeOnce() turns the
  InterruptedException into a terminal -> signalDone, so a caller parked
  in close() is released even when the I/O thread is wedged and
  closePool() never runs. This closes the hang-forever race and makes
  QuestDB.close() more prompt.

Add the query_close_timeout_ms pool knob (default 5000ms, symmetric with
the ingest close_flush_timeout_millis) via
QuestDBBuilder.queryCloseTimeoutMillis(long), wired through QuestDBImpl
to the pool and guarded by PoolConfigHonoredTest's drift check. Update
the Query.close() Javadoc to state the bounded/interruptible/discard
contract.

QueryCloseDrainTest covers the new behavior deterministically with a
no-op connect hook: close() returns within the budget and discards a
worker that does not drain, an interrupt aborts the drain promptly, and
an already-drained worker is returned for reuse. The dispatch-thread
interrupt's effect on a genuinely stuck execute() is verified by
construction here; its end-to-end reproduction lives in the parent
repo's concurrency tests.
@bluestreak01

Copy link
Copy Markdown
Member Author

🤖 Automated code review — level-3 multi-agent pass

Review base: reviewed against the PR head 7948635 (gh pr diff + fetched refs/pull/60/head). Line numbers are PR-head. Heads-up: a local checkout of this branch can sit a few commits behind — the bounded Query.close() drain (7948635), the result-handler reentry guard (21becac), and the vacuous-assertion fix (d6bdbdf) are part of the PR head, so make sure they're in whatever finally merges.

Verdict: one blocking item (C1), plus test-hardening. Otherwise solid — Java-8-clean, no production resource leaks, no broken out-of-diff callsites, no data-path perf regressions; the M1 double-close and watchdog-TOCTOU fixes have deterministic regression tests (verified by reverting them).

🔴 Blocking

C1 — QueryWorker.runLoop can drop a dispatch on shutdown → caller's await() hangs forever
core/src/main/java/io/questdb/client/impl/QueryWorker.java:272

The outer while (!shuttingDown) exits after runOn() returns without re-inspecting current. Interleaving:

  1. worker is inside runOn(q1); the terminal callback runs inline here, signals done, wakes the app thread
  2. app: await(q1) returns → submit(q2)dispatch(q2) reads shuttingDown==false and sets current=q2
  3. another thread: QuestDB.close()queryPool.close()worker.shutdown() sets shuttingDown=true (close() shuts down borrowed/busy workers too)
  4. worker returns from runOn(q1), re-checks :272, sees shuttingDown, exits without stranding current=q2
  5. app's await(q2) blocks forever — q2 never reached the client, so no terminal event / backstop arrives

The strand logic at :279 only covers the parked worker, and testShutdownRacingDispatchMustNotStrandCaller only exercises that path (the busy path is untested). Reachable whenever one thread runs a submit→await→submit loop on a borrowed Query while another closes the handle.

Fix: change :272 to while (true) so the worker always re-enters the signalLock block; the existing if (shuttingDown) strand plus dispatch()'s own shuttingDown check then cover every ordering, and it still exits via the existing return. (Subtle interleaving — worth a second reviewer confirming.)

🟠 Moderate

  • M5 — vacuous assertion in the only flush-Error→discard test. SenderPoolErrorSafetyTest.flushErrorDiscardsBrokenSenderInsteadOfRecycling:107 asserts assertNotSame(first, second) on two pool.borrow() results, but SenderPool.borrow() always returns a fresh PooledSender (SenderPool.java:677,725) — so it can never fail, and the RED case (broken sender recycled via giveBack instead of discardBroken) passes too. d6bdbdf fixed this exact pattern in the SenderPoolTest sibling (slotOf(...), line 65) but missed this one, which is the only coverage of the flush-Error path. Fix: add a slotOf() helper and assert assertNotSame(slotOf(first), slotOf(second)).

  • M6 — two TLS handshake tests leak 768KB native; assertMemoryLeak doesn't catch it. JavaTlsClientSocketTest.testHandshakeWaitsForReadabilityInsteadOfBusySpinning and testHandshakeCompletesWithoutWaitingWhenEngineMakesProgress call prepareInternalBuffers() (3×256KB) but leave state == STATE_EMPTY, so close() returns early without freeInternalBuffers(). The recv/send sibling tests avoid this with setIntField(socket,"state",2) before close; these two omit it. (The leak slips past assertMemoryLeak because it's confined to one MemoryTag — a separate pre-existing harness gap worth confirming.) Fix: set state before close, matching the sibling tests.

  • M1 — the lost-dispatch fix has no deterministic in-tree regression test. Reverting current=null-at-consumption in QueryWorker.runLoop leaves the in-tree suite green; the named testSustainedMixedConcurrency is parent-repo + probabilistic. A deterministic latch-based test (block the worker after signalDone but before returning from runOn, re-submit, assert under @Test(timeout=…)) would also cover C1's fix.

  • M2 — missing cross-context tests: a real wss:// connect against a TCP-complete-but-TLS-stalled peer bounded by connect_timeout (TlsProxy exists but is never instantiated); the handshake-failure disconnect() (no fd/buffer leak) is unasserted; the recovery-delegate callback-exclusion negative path is untested.

  • M3 — weak close-drain tests: QueryCloseDrainTest.testCloseReturnsWorkerWhenAlreadyDrained passes with the fix reverted (characterization, not regression); the shutdown() thread.interrupt() line is pinned by no test.

  • M4 — stale comment: SenderPoolErrorSafetyTest:47-48 still says the fake is "injected ... by reflection," but it now uses the public @TestOnly ctor (the sibling SenderPoolSfTest was updated).

🟡 Minor / nits

  • connect_timeout silently truncates to int on the ws/wss path: Sender.java:3484 and QwpQueryClient.java:393 do (int) view.getLong("connect_timeout", 0), but the schema allows up to Long.MAX_VALUE. Values > Integer.MAX_VALUE truncate to a wrong small timeout, or throw a confusing "connect_timeout must be > 0: -…" for a positive input. Cap the schema max at Integer.MAX_VALUE (sibling auth_timeout_ms already clamps via Math.min(..., Integer.MAX_VALUE)).
  • QueryImpl.close():148 runs rejectHandlerReentry("close") before the stale-generation check (:155), so a stale close from a handler thread throws instead of the contractual idempotent no-op. cancel() already orders the gen-check first; reorder for consistency.
  • Pre-existing TLS spin (cheap to harden): runHandshake's while (status != FINISHED) has no default/NOT_HANDSHAKING case — a JSSE provider that completes the handshake via a delegated task would spin without parking on the waiter. Not stock-JDK-reachable, but since this PR's goal is spin-elimination, a one-line default:-throws closes it.
  • getBool undertested: the only ConfigView bool test targets the sibling getBoolOnOff; getBool's false/off acceptance and invalid-value throw (used by lazy_connect) are uncovered.
  • net.c deadline comment claims "strict upper bound / only ever under-shoots," but the negative-ns ms-truncation can overshoot by <1ms (harmless — fix the comment).
  • Test DRY: the Proxy-based fake Sender is duplicated 3× (SenderPoolErrorSafetyTest, QuestDBImplErrorSafetyTest), awaitTrue 2–3×, and the SSLEngine stubs 2×; redundant Class.forName + reflective ctor for the public QueryWorker in QueryImplResetTest (use new QueryWorker(...)).
  • Other nits: QuestDBBuilder.java:73 config field is out of alphabetical order; 5-digit magic numbers without _ separators in the config-honored tests; SenderPoolTest:200-202 and QuestDBLazyConnectTest.testLazyConnectKeepsReadsEnabledWhileServerDown have near-vacuous assertions (count / db != null), covered more meaningfully elsewhere.

Process

The PR bundles ~6 semi-independent concerns plus a CI/native overhaul. Consider splitting the §5 lease/concurrency core and the CI/native build changes for isolated review and bisectability. The title omits §5 (the riskiest part — M1 double-close, the TOCTOU cancel, C1); surface it in the squash message since branch history is throwaway.

Generated by Claude Code (multi-agent review). Findings were verified against source; ~6 draft findings were dropped as false positives. Treat C1 as the one must-fix; the rest is polish / test-hardening.

bluestreak01 and others added 7 commits June 30, 2026 22:19
QueryWorker.runLoop checked `while (!shuttingDown)` at the top of the
loop. After runOn() returned, control hit that check without taking
signalLock or re-reading `current`, so QuestDB.close() shutting down a
borrowed, busy worker could exit the loop with a job already pending:

  1. The worker is inside runOn(q1); the terminal callback fires inline,
     signals done, and wakes the app thread.
  2. The app thread's await() returns and it submits again on the same
     reused lease. dispatch() reads shuttingDown==false, sets current=q2.
  3. QuestDB.close() -> queryPool.close() -> worker.shutdown() flips
     shuttingDown to true (close() snapshots every worker in `all`,
     borrowed ones included).
  4. The worker returns from runOn(q1), re-checks the loop top, sees
     shuttingDown==true, and exits -- without re-inspecting current.

q2 is then never run, never stranded, and never signalled, so the app
thread's next await() blocks forever. dispatch()'s own shuttingDown check
does not catch it (dispatch won the race and read false), the shutdown
interrupt never reaches it (q2 was never handed to the client), and
client.close()'s synthetic terminal cannot reach it either.

The strand logic that signals a pending current already lives inside the
signalLock block (the `if (shuttingDown) return` branch). Looping
unconditionally routes every shutdown ordering through that single strand
point, and the loop still terminates via the same return. The existing
QueryWorkerTest.testShutdownRacingDispatchMustNotStrandCaller only drove
the parked-worker variant, so this busy-worker path was untested.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
flushErrorDiscardsBrokenSenderInsteadOfRecycling asserted
assertNotSame(first, second) on the two PooledSender wrappers.
SenderPool.borrow() allocates a fresh PooledSender on every call, so that
comparison is unconditionally true and proves nothing: it stays green
whether the broken slot was discarded (discardBroken) or recycled
(giveBack). Reverting PooledSender.close()'s discard fix left the test
green, so the only coverage of the flush-Error discard path was dead.

Capture the underlying SenderSlot before close() and compare slots via a
new slotOf() reflective helper, mirroring the sibling fix in d6bdbdf
(SenderPoolTest.testBrokenSenderIsNotReturnedToPool). The pool recycles
slots, not wrappers, so a broken slot leaking back to the next borrower
now surfaces as the same slot and fails the assertion directly. The
existing "close() must propagate the Error" assertion is unchanged.

Verified: with the bug injected (giveBack instead of discardBroken) the
assertion now fails as expected; green against the real discardBroken.
df6f7ca (while (!shuttingDown) -> while (true) in QueryWorker.runLoop)
repaired the busy-worker path: a worker returning from runOn() with
current already re-armed by a reused lease's submit() must strand and
signal that job before exiting on shutdown. That fix shipped without a
deterministic regression test -- the only adjacent test,
testShutdownRacingDispatchMustNotStrandCaller, drives only the
parked-worker branch and stays green with df6f7ca reverted.

Reproducing the busy path needs the worker mid-runOn() when current is
re-dispatched and shuttingDown flips. QueryWorker and QueryImpl are final
and QwpQueryClient has no test seam, so the only race-free pause point is
a test-only barrier. Add a volatile busyWorkerTestHook (null in
production; the sole cost is a null check) invoked on the worker thread
right after a job returns from runOn(), and a new test that installs it
to set current=q2 and shuttingDown=true at exactly that window.

The new test fails on the reverted fix ("q2 never signalled") and passes
on the fix; the old parked test passes either way, confirming it never
guarded this path.
The committed linux-x86-64/libquestdb.so had regressed to a GLIBC_2.33
load floor (stat@GLIBC_2.33 / fstat@GLIBC_2.33), up from the intended
2.14, making it fail to load on RHEL/Rocky/Alma 8, Ubuntu 20.04, Amazon
Linux 2, Debian 10 and RHEL 7. It was also the last committed client
binary, contradicting the "no longer committed" CI comment.

- Delete the committed linux-x86-64 .so. The release and CI jobs build
  the native libs from source in low-glibc containers, and Os.java loads
  them via the prd/bin-local paths, so nothing depends on the committed
  file.
- Add .github/scripts/check-glibc-floor.sh: extracts every versioned
  GLIBC import via objdump and fails if the highest node exceeds the
  allowed floor, naming the offending symbols.
- ci.yml: new glibc-floor matrix guard that rebuilds the linux libs in
  the same manylinux_2_28 containers as release and asserts the floor
  (2.14 on x86-64, 2.17 -- the lowest glibc offers -- on aarch64). The
  build-jdk8 job builds on ubuntu-latest for functional tests only and
  cannot catch a floor regression.
- maven_central_release.yml: assert the floor in the linux smoke-tests
  so a regressed library can never be shipped.
- rebuild_native_libs.yml: assert the floor before committing so a
  regressed library can never be committed.
…ory leak

Both testHandshakeWaitsForReadabilityInsteadOfBusySpinning and
testHandshakeCompletesWithoutWaitingWhenEngineMakesProgress called
prepareInternalBuffers() (3x256KB NATIVE_TLS_RSS) but never set state
to STATE_TLS, so try-with-resources close() returned early on
STATE_EMPTY and skipped freeInternalBuffers(), leaking ~768KB.

The leak went undetected because assertMemoryLeak tolerates a delta
confined to a single MemoryTag. Set state=2 before close, matching the
recv tests.
…spin

The NEED_TASK branch reads sslEngine.getHandshakeStatus(), which per the JSSE
contract never returns FINISHED -- it returns NOT_HANDSHAKING once the handshake
completes. If a delegated task were the terminal handshake step, the loop would
land on NOT_HANDSHAKING, match no switch case, and busy-spin at 100% CPU with no
deadline escape. Exit the loop on NOT_HANDSHAKING (a completed handshake) as well
as FINISHED, closing this latent/provider-dependent spin path.
@bluestreak01

Copy link
Copy Markdown
Member Author

Automated review — level-3 multi-agent pass

Reviewed against PR head 1570c5a (fresh-context reviewers covering correctness/concurrency, resource/perf, native C/JNI, cross-context callers, fresh-adversarial, and test efficacy/quality; every finding verified against source).

Verdict: approve with minor follow-ups. The concurrency-heavy lease/generation core is correct, the native connect-timeout + glibc pin are sound and CI-guarded, and no in-diff or out-of-diff correctness bug survived verification. The prior blocker C1 (busy-worker shutdown-drop) is fixed here (QueryWorker.runLoop is while (true) with a single strand point inside signalLock), as are the earlier M5 (vacuous flush-Error assertion) and M6 (TLS test buffer leak).

🔴 Critical

None.

🟠 Moderate

M1 — HTTP-ILP TLS-handshake timeout is not flagged as a timeout (in-diff). HttpClient.connect drives the handshake waiter via remainingTime(...) (HttpClient.java:191-197), which throws a non-flagAsTimeout() HttpClientException. Inconsistent with the TCP connect-timeout on the same path (HttpClient.java:630, flagged) and with WebSocketClient, which routes the handshake through getRemainingTimeOrThrow(...) and does flag it (WebSocketClient.java:1030-1036). Impact is cosmetic today (AbstractLineHttpSender retries on any HttpClientException regardless of isTimeout()), but the handshake bounding is new behavior, so a caller/telemetry keying on isTimeout() on the legacy ILP path would misclassify a mid-handshake stall. Fix: use getRemainingTimeOrThrow on the HTTP handshake path (or add .flagAsTimeout() in remainingTime).

M2 — Test-regression gaps.

  • The head-commit fix (1570c5a, add NOT_HANDSHAKING to the handshake-loop exit, JavaTlsClientSocket.java:521) has no regression test: the only happy-path test uses ProgressingUnwrapSslEngine returning FINISHED, so the loop exits on the FINISHED clause and deleting && … != NOT_HANDSHAKING leaves both TLS tests green. No stub returns NEED_TASK. Add an engine that finishes via NEED_TASK/NOT_HANDSHAKING.
  • The lost-dispatch fix (clear current under signalLock at consumption) has no in-repo regression — acknowledged in QueryWorkerTest javadoc; only covered end-to-end in the parent repo.
  • Missing coverage: ConfigView.getBool false/off/invalid; recovery-delegate callback exclusion (SenderPool.buildManagedSlotSender forRecovery guard); handshake-error disconnect() (no fd/TLS-buffer leak); the connect_timeout TCP path is Assume-skipped on runners without a route to TEST-NET-1, so the core regression may not run in CI.

🟡 Minor

  • connect_timeout truncation. Sender.java:3481 and QwpQueryClient.java:393 do (int) view.getLong("connect_timeout", 0) but the schema allows up to Long.MAX_VALUE. Values in (2^31, 2^32) become negative → confusing "must be > 0" on a positive input; ≥2^32 silently truncate. Clamp the schema max to Integer.MAX_VALUE (as auth_timeout_ms does).
  • ConfigView.getBool is case-sensitive (only lowercase true/false/on/off; values are stored raw), so lazy_connect=TRUE throws — inconsistent with initial_connect_retry (equalsIgnoreCase). Also lazy_connect is registered boolOnOff (on/off) but read via getBool (true/false/on/off).
  • Facade javadoc overstates zero-allocation. QuestDB.java:34-36/88-89/113-114 claim borrows are "zero at steady state / a pre-allocated handle," but each borrowQuery()/borrowSender() allocates a fresh QueryLease/PooledSender (JIT usually scalar-replaces in try-with-resources). Contradicts the accurate QueryLease/PooledSender javadoc. Only the submit/data path is truly allocation-free.
  • Weak/near-vacuous tests. SenderPoolTest.testPoolBuildsRequestedNumberOfSenders:213-215 asserts assertNotSame on wrappers borrow() always makes fresh — assert pool.totalSize()==3 (or slotOf(...)). QuestDBLazyConnectTest.testLazyConnectKeepsReadsEnabledWhileServerDown:77 only asserts assertNotNull(db) (never null) and never calls borrowQuery() despite its name.
  • QueryImpl.close(gen) ordering. Runs rejectHandlerReentry("close") before the stale-generation check (:148 vs :155); cancel() already checks gen first. Reorder for idempotency-contract consistency (very low impact).
  • Test hygiene: assertMemoryLeak omitted in native-allocating white-box tests (QueryLeaseGenerationTest, QueryWorkerTest, QueryCloseDrainTest, QueryImplResetTest, SenderLeaseGenerationTest); awaitTrue duplicated (QuestDBServerRecoveryTest, QuestDBBuilderTest) → TestUtils.
  • Concurrent double-close() of one live PooledSender double-flushes the non-thread-safe delegate (fast-path gen check + flush() before any lock). App-misuse only (handle documented not thread-safe); sequential/stale double-close is correctly idempotent. Very low.
  • CI/build: mac-x64 test agent removed from run_tests_pipeline.yamldarwin-x86-64 is still built/released (maven_central_release.yml:160-161) but loses CI test coverage. rebuild_native_libs.yml still git adds the binaries this PR otherwise removes (vestigial — running it re-commits them). A fresh checkout now needs a native build before mvn -pl core test (else UnsatisfiedLinkError) — worth a dev-setup note. net.c:330-333 comment overstates "only ever under-shoots by <1ms" (harmless).

Downgraded (false positives / already resolved — verified)

  • "No CI guard on the resulting .so glibc floor" — dismissed: the PR adds .github/scripts/check-glibc-floor.sh, wired into ci.yml:161, rebuild_native_libs.yml:114/154, maven_central_release.yml:299/366 (floor 2.14 x86-64, 2.17 aarch64) — exactly the guard for the clock_gettime symver pin.
  • TLS internal-buffer leak on handshake failure — false positive: on any handshake throw, state stays STATE_PLAINTEXT; JavaTlsClientSocket.close():141-145 explicitly frees the 3×256KB NATIVE_TLS_RSS buffers in that branch.
  • setupIoWait() double-registration on reconnect — false positive: reconnect does socket.close() first, removing the fd from epoll before the re-ADD.
  • connect_timeout==0 path change — false positive: both callers gate on >0 and otherwise call the unchanged connectAddrInfo; old behavior is byte-for-byte preserved.

Summary

  • ~14 findings verified (0 critical, 2 moderate, ~12 minor); ~7 draft findings dropped as false positives or already-fixed.
  • All verified findings are in-diff; the cross-context pass (every startTlsSession / connectAddrInfoTimeout / removed-facade-API / reconnect callsite) returned SAFE at every callsite — the large refactor has no broken external callers.
  • Top asks before merge: M1 (flag the HTTP handshake timeout) and M2 (add a NOT_HANDSHAKING regression test + close the getBool / callback-exclusion / handshake-disconnect coverage gaps). Everything else is polish.
  • Process: the title omits §5 (the riskiest part — the lease refactor and its concurrency fixes); surface it in the squash-merge message. Consider splitting the CI/native-build overhaul from the client changes for bisectability.

Generated by a level-3 multi-agent review; findings verified against source at 1570c5a.

A durable-ack (request_durable_ack=on) sender that walked an endpoint list
where every reachable node was a REPLICA synthesized a terminal
QwpDurableAckMismatchException (-> PROTOCOL_VIOLATION/HALT) in
QwpWebSocketSender.buildAndConnect. That turned a transient failover window
(a replica can be promoted; a primary will reappear) into a permanent
hard-fail, violating the store-and-forward contract whose only terminal
condition is SF exhaustion. HA senders never recovered even after a replica
was promoted:

  server rejected batch: PROTOCOL_VIOLATION - durable-ack-mismatch:
    WebSocket upgrade failed: server does not support durable ack [role=REPLICA]

Fix: when a connect round exhausts with only role-rejects, throw the
retriable QwpRoleMismatchException regardless of request_durable_ack. Both
connect paths (connectWithRetry sync, connectLoop async/reconnect) already
retry role-mismatch within reconnect_max_duration_millis, so the sender
keeps rows in SF and recovers on promotion. A genuine capability gap (an
endpoint that upgrades but does not advertise durable ack) stays terminal
via terminalUpgradeError.

BackgroundDrainer: widen its connect-retry catch to the role-reject types so
an orphaned-slot drainer keeps giving the cluster a budget to settle instead
of quarantining on the first all-replica sweep (behaviour unchanged from
when this surfaced as QwpDurableAckMismatchException).
Invariant B: once rows are in on-disk store-and-forward, the background
drainer must NEVER terminate on reconnect_max_duration_millis. A replica-only
/ all-endpoints-replica window is transient (a replica gets promoted, a
primary reappears), so the client must keep retrying -- with capped backoff --
until a primary is reachable. The ONLY producer-observable terminal condition
is SF exhaustion; the client must never fail because of the drainer.

CursorWebSocketSendLoop.connectLoop (used by both the async-initial-connect
and mid-stream reconnect paths on the I/O thread) enforced the budget as a
give-up deadline: on expiry it recordFatal'd a PROTOCOL_VIOLATION
"...-budget-exhausted" that the next producer call surfaced -> a long-but-
recoverable failover window became a permanent, producer-visible terminal.

Fix: the loop now runs `while (running)` with capped exponential backoff and
returns quietly only when the sender is closing. Genuine terminals (auth /
non-421 upgrade / durable-ack capability gap) still return immediately.
reconnect_max_duration_millis is no longer consulted here; it continues to
bound ONLY the blocking (non-lazy) initial connect in
QwpWebSocketSender.buildAndConnect (connectWithRetry), preserving fail-loud
startup semantics.

Turns green the red-first guards added in questdb-enterprise:
ReplicationTest.testQwpDurableAckDrainerNeverGivesUpPastBudget and the e2e
test_durable_ack_drainer_never_gives_up_on_reconnect_budget.
…l-replica window (Invariant B)

BackgroundDrainer.connectWithDurableAckRetry() lumps role rejects
(QwpRoleMismatchException / QwpIngressRoleRejectedException -- an
all-endpoints-replica failover window) in with a genuine durable-ack
capability gap, and gives up on BOTH the 16-attempt cap and the
reconnect_max_duration_millis wall clock -> markFailed / .failed
sentinel / DrainOutcome.FAILED.

Under Invariant B an all-replica window is TRANSIENT (a replica gets
promoted, a primary reappears): the drainer must keep retrying with
capped backoff until a primary is reachable, stopRequested, or SF
exhaustion. It must never quarantine the slot on a wall-clock budget
or attempt cap.

Add testAllReplicaWindowNeverEscalatesInvariantB: runs the drainer
against a factory that always role-rejects, observes past both give-up
triggers, and asserts it is still retrying (PENDING, no sentinel, no
persistent-failure callback, attempts past the cap). Red now
(attempts=16, outcome=FAILED); green once the drainer splits the catch
(role reject -> retry forever; capability gap -> stays terminal, keeping
testEscalatesAfterMaxAttempts* valid).
…nvariant B)

BackgroundDrainer.connectWithDurableAckRetry() lumped role rejects
(QwpRoleMismatchException / QwpIngressRoleRejectedException -- an
all-endpoints-replica failover window) together with a genuine durable-ack
capability gap and gave up on BOTH the 16-attempt cap and the
reconnect_max_duration_millis wall clock -> markFailed / .failed sentinel /
DrainOutcome.FAILED. A transient failover window thus quarantined an orphan
slot's un-drained SF data instead of waiting for a promotion.

Split the catch:
- QwpRoleMismatchException | QwpIngressRoleRejectedException (all endpoints are
  replicas right now) -> TRANSIENT. Fire the per-attempt observability callback
  and retry with capped exponential backoff indefinitely; never escalate, never
  quarantine. Only stopRequested (or a primary reappearing, or SF exhaustion)
  ends it. WARN throttled to 1/5s.
- QwpDurableAckMismatchException (a server upgraded but does not advertise
  durable ack -- a real cluster-wide capability gap) -> stays terminal with the
  bounded settle budget + attempt cap + .failed sentinel, as before.

Backoff now clamps to the remaining budget ONLY on the bounded capability-gap
path (so it still escalates promptly at the deadline); the transient failover
path backs off capped-exponential and never busy-loops past a deadline.

Turns green testAllReplicaWindowNeverEscalatesInvariantB while keeping
testEscalatesAfterMaxAttemptsAndDropsSentinel and
testWallTimeBudgetEscalatesBeforeAttemptCap valid. 48 drainer/pool tests pass.
Documentation-only cleanup after the Invariant B fix (no behavior change):

- CursorWebSocketSendLoop: correct the class + fail() javadoc that still said
  the reconnect loop is "time-capped at reconnect_max_duration_millis". It now
  retries indefinitely (capped exponential backoff); only auth/upgrade rejects
  or SF exhaustion are terminal.
- CursorWebSocketSendLoop: comment the now-dead reconnectMaxDurationMillis
  field -- retained for constructor symmetry / caller API, but no longer read
  by connectLoop. The budget still bounds the blocking (non-lazy) initial
  connect via connectWithRetry, which takes it as an explicit argument.
- BackgroundDrainer: clarify that the wall-clock budget + 16-attempt cap apply
  ONLY to a genuine durable-ack capability gap; a transient all-replica
  failover window is retried indefinitely and never quarantined.
…er (Invariant B)

BackgroundDrainer.connectWithDurableAckRetry() routes any non-role, non-durable
-ack Throwable -- including "all endpoints unreachable" (server down / network
partition) -- to an IMMEDIATE markFailed / .failed sentinel on the FIRST sweep.
That quarantines an orphan slot's un-drained SF data on a transient outage,
requiring manual operator recovery, and is asymmetric with the live sender's
background loop (CursorWebSocketSendLoop.connectLoop), which backs off and
retries a transport error indefinitely.

Invariant B: a down/unreachable cluster is transient (the server will come
back). The orphan drainer must behave exactly like the live background loop --
retry with capped backoff until a primary is reachable, stopRequested, or SF
exhaustion; only genuine terminals (auth / non-421 upgrade / durable-ack
capability gap) fail fast.

Add testTransportErrorNeverQuarantinesInvariantB: alwaysFailing factory throwing
LineSenderException("all endpoints unreachable"), observed past a short budget;
asserts still retrying (PENDING, no sentinel, attempts past the first sweep). Red
now (attempts=1, outcome=FAILED); green once transport errors are retried like
connectLoop.
…g (Invariant B)

connectWithDurableAckRetry() routed every non-role, non-durable-ack Throwable
-- including "all endpoints unreachable" (server down / network partition) --
to an IMMEDIATE markFailed / .failed sentinel on the first sweep. That
quarantined an orphan slot's un-drained SF data on a transient outage, needing
manual recovery, and was asymmetric with the live sender's background loop
(CursorWebSocketSendLoop.connectLoop, which retries transport errors forever).
It was worst exactly under lazy_connect=true (server down at startup is the
whole point), where it quarantined all orphan data on boot.

Make the drainer classify failures exactly like connectLoop:
- NEW catch (QwpAuthFailedException | WebSocketUpgradeException) -> terminal,
  quarantine immediately (genuinely non-retriable across the cluster).
- catch (Throwable) -> TRANSIENT: back off (capped exponential) and retry
  indefinitely until a primary is reachable, stopRequested, or SF exhaustion;
  never quarantine. WARN throttled to 1/5s so a long/permanent outage stays
  observable without flooding.
- Role reject (all-replica) keeps retry-forever; durable-ack capability gap
  keeps its bounded settle budget (the one deliberate rolling-upgrade tolerance).

Tests:
- testTransportErrorNeverQuarantinesInvariantB: now green (down server retried).
- testNonDurableAckExceptionMarksFailedImmediately -> testTerminalUpgrade
  MarksFailedImmediately: repurposed to a genuine upgrade terminal, since a
  transport error is now retryable. 49 drainer/pool tests pass.
… drop dead RECONNECT_BUDGET_EXHAUSTED

Under Invariant B all senders are SF senders, so a connection error is never
terminal: SYNC initial connect (connectWithRetry) still surfaces it (fail loud),
but ASYNC initial connect and mid-stream reconnect (connectLoop) retry forever
and surface only GENUINE terminals (auth / non-421 upgrade / durable-ack
capability gap) or SF exhaustion. The wall-clock budget give-up was removed from
connectLoop earlier; this removes the now-dead RECONNECT_BUDGET_EXHAUSTED
connection-event kind and fixes the 7 tests that asserted the old behavior.

Main:
- Remove SenderConnectionEvent.Kind.RECONNECT_BUDGET_EXHAUSTED (never fired now)
  and its references in DefaultSenderConnectionListener (switch case + javadoc),
  SenderConnectionListener/SenderConnectionEvent javadoc, and CursorWebSocketSendLoop
  comments.

Tests (all ASYNC or mid-stream -- verified none were SYNC):
- Group 1 (asserted a connection/budget terminal -> now wrong) converted to
  Invariant-B "retries forever, no terminal" guards:
  InitialConnectAsyncTest.testAsyncBudgetExhaustionDeliversToErrorInbox
    -> testAsyncNoServerRetriesForeverNoTerminal
  InitialConnectAsyncTest.testConnectionLostBudgetExhaustionTagsDifferently
    -> testConnectionLostRetriesForeverNoTerminal
  ReconnectTest.testReconnectGivesUpAfterCap -> testReconnectNeverGivesUpInvariantB
- Group 2 (used a terminal to exercise close/facade/ownership) re-pointed from a
  dead-port connection error to a genuine 401 auth terminal (still surfaces in
  async): CloseSafetyNetTest (x2, via TestWebSocketServer.setRejectWithStatus),
  QuestDBFacadeCallbacksTest.testFacadeErrorHandlerReceivesAsyncIngestError,
  CloseOwnershipRaceTest (factory throws QwpAuthFailedException).

67 affected tests pass. SYNC surfacing and its tests are untouched.
…ts to review-pr

Add a committed-binary gate (Critical at every level, since native libs are
built from source in CI) and a store-and-forward & pool startup invariants
checklist (drainer must never impose a reconnect budget or hard-fail on a
transient outage; lazy_connect vs default startup boundaries) to the review-pr
skill. Mirror the changes across the .claude and .pi copies.
…ot the foreground sender's

Root cause of the OrphanScanIntegrationTest failure on mac-aarch64/windows CI:
BackgroundDrainer connected via the foreground sender's ReconnectSupplier,
whose buildAndConnect gate throws "sender closed during connect" whenever
the foreground cursorSendLoop is stopped. sender.close() stops that loop
BEFORE the drainer pool's graceful-drain window, so a drainer racing close()
could never (re)connect. Pre-f9ece1a this was masked: the connect failure
immediately quarantined the slot with a .failed sentinel, which HIDES it
from OrphanScanner.scan -- the test passed spuriously while data was
stranded. Post-f9ece1a (Invariant B) the drainer correctly retries the
transport-shaped error forever, exits STOPPED un-drained, and the scan now
sees the leftover slot: expected:<0> but was:<1>.

Fixes:
- ReconnectSupplier takes an optional caller-owned abort check; drainer
  factories gate on the drainer's own stopRequested instead of the
  foreground loop, so drainers can finish (including reconnects) during
  close()'s graceful-drain window.
- requestStop() now unparks the runner thread and the retry backoff parks
  in <=50ms chunks, making the pool's 500ms stop-grace effective instead
  of sleeping through it (the "small chunks" comment is now true).
- Test hardening: SilentHandler latches wire receipt so the ghost frame is
  provably on disk before the timeout-0 close (no more trivially-empty
  drains); AckHandler acks with per-connection seq counters (no more
  cross-connection acks rescued only by clamping); the drain is awaited
  while the primary is still open, decoupling the core assertion from
  close-grace timing; removed exception-swallowing catches.

Verified: 10/10 stressed runs of the failing test (previously
deterministic failure after recompile), 218 sf/cursor + 893 qwp client
tests green.
…walk test

CI (linux-x64, build 247632) failed QwpQueryClientWalkTrackerTest.
testWalk_AllUnreachableThrowsHttpClientException with
IllegalArgumentException: "duplicate addr entry: localhost:39695".

Root cause: the test picked its two unreachable ports with two
back-to-back TestPorts.findUnusedPort() calls. That helper is
bind-close-return -- once its probe ServerSocket closes, the port goes
back to the kernel's ephemeral pool, and Linux readily hands the
just-released port to the next bind(0). Both calls returned the same
port, the config became addr=localhost:P,localhost:P, and ConfigView's
duplicate-addr validation (84da57d, pre-existing on main) rejected it
before the endpoint walk under test ever ran. Latent flake since the
test was written; unrelated to this PR's changes.

Fix: TestPorts.findUnusedPorts(n) holds all n probe sockets open
simultaneously, forcing the kernel to issue n distinct ports, and
closes them together. The walk test uses findUnusedPorts(2). This is
the only call site combining independently-picked ports into one
config (verified: other multi-call tests use one port per config).
…settle budget (Invariant B)

The drainer's 16-attempt durable-ack settle budget and its wall-clock
deadline were shared with the transient paths: every all-replica role
reject incremented the same mismatchAttempts counter, and the deadline
was anchored at connect entry, so a long failover window burned the
budget before a genuine capability gap was ever observed. In a rolling
upgrade (role-reject churn, then an old-build node promoted to primary)
the first capability-gap attempt could quarantine the slot immediately,
collapsing the documented 16-attempt budget to zero.

The budget now measures a capability-gap episode:
- dedicated capabilityGapAttempts counter, incremented only by
  QwpDurableAckMismatchException sweeps
- wall-clock deadline anchored lazily at the first capability-gap error
  of the episode, never at connect entry
- a role reject resets the episode (topology churn: the offending node
  is gone), making the javadoc's 'consecutive' literally true
- a transport error neither increments nor resets, so a flaky but
  misconfigured cluster cannot evade the cap
- role rejects keep an observability-only counter, fixing the mixed
  attempt numbers previously reported to the listener

Adds interleaved regression tests (role-reject churn before/between
capability gaps, wall-clock anchoring, transport no-reset); three of
the four fail against the previous shared-counter implementation.
@mtopolnik

Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 366 / 526 (69.58%)

file detail

path covered line new line coverage
🔵 io/questdb/client/impl/PooledSender.java 14 54 25.93%
🔵 io/questdb/client/impl/QueryLease.java 6 22 27.27%
🔵 io/questdb/client/cutlass/http/client/HttpClient.java 5 14 35.71%
🔵 io/questdb/client/network/JavaTlsClientSocket.java 17 46 36.96%
🔵 io/questdb/client/impl/QueryImpl.java 37 75 49.33%
🔵 io/questdb/client/impl/QuestDBImpl.java 3 5 60.00%
🔵 io/questdb/client/impl/ConfigView.java 5 8 62.50%
🔵 io/questdb/client/cutlass/http/client/WebSocketClient.java 12 18 66.67%
🔵 io/questdb/client/Sender.java 11 16 68.75%
🔵 io/questdb/client/impl/QueryClientPool.java 30 34 88.24%
🔵 io/questdb/client/cutlass/qwp/client/QwpQueryClient.java 12 13 92.31%
🔵 io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainer.java 62 65 95.38%
🔵 io/questdb/client/impl/SenderSlot.java 19 20 95.00%
🔵 io/questdb/client/impl/QueryWorker.java 19 20 95.00%
🔵 io/questdb/client/QuestDBBuilder.java 47 48 97.92%
🔵 io/questdb/client/impl/SenderPool.java 37 38 97.37%
🔵 io/questdb/client/network/NetworkFacadeImpl.java 1 1 100.00%
🔵 io/questdb/client/HttpClientConfiguration.java 1 1 100.00%
🔵 io/questdb/client/impl/ConfigSchema.java 3 3 100.00%
🔵 io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java 20 20 100.00%
🔵 io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java 4 4 100.00%
🔵 io/questdb/client/SenderConnectionEvent.java 1 1 100.00%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants