Generalize failed-IP cooldown across all LoadBalance modes and make it configurable #2221

Open
pavel-ptashyts wants to merge 1 commit into AsyncHttpClient:main from maygemdev:cooldown-refactoring

@pavel-ptashyts
Contributor

The failed-IP cooldown (when a TCP connect to an IP fails, that IP is briefly
moved to the back of the failover order, then re-probed) previously lived inside
RoundRobinAddressSelector and only ran in LoadBalance.ROUND_ROBIN mode, with
a hard-coded on/off and duration.

This PR extracts the cooldown into its own mode-independent component and applies
it to any direct connection — DEFAULT mode included — so a recently-failed
IP is deprioritized on the next new connection regardless of load-balancing mode.
It also exposes the feature through client config.

What changed

  • New FailedIpCooldownHolder (netty/channel) — holds the cooldown logic
    (reorder(host, addresses) + markFailed(host, address)) over bounded per-host
    state. reorder only re-orders the address list (cooling IPs to the back); it
    never drops an address, so failover always has somewhere to go.
  • RoundRobinAddressSelector is reduced to pure round-robin rotation.
  • New config settings (mirroring loadBalance, wired end-to-end through
    AsyncHttpClientConfig, DefaultAsyncHttpClientConfig + Builder,
    AsyncHttpClientConfigDefaults, and ahc-default.properties):
    • isFailedIpCooldownEnabled() — default true
    • getFailedIpCooldownPeriod() — default PT10S
  • NettyRequestSender applies the cooldown for direct connections in both
    modes (round-robin: on top of rotation, before pinning the IP-aware
    default: when ordering the resolved addresses for a new connection) and feeds
    TCP connect failures back via the connector's existing failure listener.
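The reorder/markFailed contract described above can be illustrated with a minimal sketch. This is not the PR's FailedIpCooldownHolder: the class name CooldownSketch, the generic address type A, and the injectable nano clock are assumptions made for illustration, and the per-host cap is left out. It only shows the two properties stated in the list: a failed address moves to the back for the cooldown period, and no address is ever dropped.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch only; names and types are assumptions, not the PR's code.
final class CooldownSketch<A> {
    private final long cooldownNanos;
    private final LongSupplier nanoClock; // injectable so expiry can be exercised in tests
    // host -> (address -> nano time at which its cooldown ends)
    private final Map<String, Map<A, Long>> coolingByHost = new ConcurrentHashMap<>();

    CooldownSketch(Duration cooldown, LongSupplier nanoClock) {
        this.cooldownNanos = cooldown.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Records a connect failure for one address of a host. */
    void markFailed(String host, A address) {
        coolingByHost.computeIfAbsent(host, h -> new ConcurrentHashMap<>())
                .put(address, nanoClock.getAsLong() + cooldownNanos);
    }

    /** Returns the same addresses with cooling ones moved to the back; never drops any. */
    List<A> reorder(String host, List<A> addresses) {
        Map<A, Long> cooling = coolingByHost.get(host);
        if (cooling == null || cooling.isEmpty()) {
            return addresses;
        }
        long now = nanoClock.getAsLong();
        List<A> preferred = new ArrayList<>(addresses.size());
        List<A> cooled = new ArrayList<>();
        for (A address : addresses) {
            Long until = cooling.get(address);
            if (until != null && until - now > 0) {
                cooled.add(address);            // still cooling: try it last
            } else {
                if (until != null) {
                    cooling.remove(address, until); // cooldown expired: forget it
                }
                preferred.add(address);
            }
        }
        preferred.addAll(cooled);
        return preferred;
    }
}
```

Because cooled addresses are appended rather than filtered out, a host whose addresses have all failed recently still yields a full list, which is what keeps failover working. In production the clock argument would typically be System::nanoTime.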

Behavior change

With the default failedIpCooldownEnabled=true, DEFAULT mode now deprioritizes
an IP that just failed to connect when opening its next new connection to a
multi-IP host. This preserves round-robin's previous always-on behavior and
extends the same benefit to default mode. Set failedIpCooldownEnabled=false to
restore the old default-mode behavior.

Scope & limitations

  • Applies to direct connections only (no explicit Request address; no proxy,
    or proxy bypassed for the host), keyed by the target host, matching
    round-robin eligibility. Proxied/CONNECT requests are unaffected.
  • TCP-connect failures only (TLS/handshake failures are out of scope). The
    cooldown only re-orders; authoritative liveness is still expected at the
    DNS/resolver level. Per-host state is bounded (cap 4096) with arbitrary eviction.
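One way to read "bounded with arbitrary eviction" is sketched below. The class BoundedHostState and its methods are hypothetical and are not taken from the PR; the sketch assumes a plain size check against a concurrent map, evicting whichever key the iterator returns first when a new host would exceed the cap.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of size-capped per-host state; not the PR's implementation.
final class BoundedHostState<V> {
    private final int maxHosts;
    private final Map<String, V> byHost = new ConcurrentHashMap<>();

    BoundedHostState(int maxHosts) {
        this.maxHosts = maxHosts;
    }

    /** Stores state for a host, first evicting an arbitrary other host if the cap is reached. */
    void put(String host, V state) {
        if (!byHost.containsKey(host) && byHost.size() >= maxHosts) {
            Iterator<String> it = byHost.keySet().iterator();
            if (it.hasNext()) {
                it.next();
                it.remove(); // arbitrary victim: whichever key iteration yields first
            }
        }
        byHost.put(host, state);
    }

    V get(String host) {
        return byHost.get(host);
    }

    int size() {
        return byHost.size();
    }
}
```

The check-then-put sequence is not atomic, so under concurrent writers the cap is approximate rather than strict. That is presumably acceptable for this kind of state, since losing an entry only means one host's failed IP is not deprioritized on its next connection.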

…nfigurable

Extract the failed-IP cooldown out of RoundRobinAddressSelector into a new,
mode-independent FailedIpCooldownHolder that reorders a request's resolved
addresses before a new connection so a recently-failed IP is briefly
deprioritized. It now applies to any direct connection (DEFAULT mode included),
not just ROUND_ROBIN.

- Add FailedIpCooldownHolder (reorder/markFailed over bounded per-host state)
- Reduce RoundRobinAddressSelector to pure rotation
- Add failedIpCooldownEnabled (default true) and failedIpCooldownPeriod
  (default PT10S) config settings end-to-end
- Wire the cooldown into NettyRequestSender for direct connections in any mode,
  feeding back TCP connect failures via the connector's failure listener
- Split tests: keep rotation tests, add FailedIpCooldownHolderTest and
  FailedIpCooldownConfigTest
@pavel-ptashyts pavel-ptashyts force-pushed the cooldown-refactoring branch from 7e95a29 to 4b7d44e Compare July 1, 2026 10:03