fix(chart): cap per-pod backend concurrency at the frontproxy (maxconn) #99

Merged
ServerSideHannes merged 1 commit into main from fix/frontproxy-maxconn on Jun 30, 2026

@ServerSideHannes (Owner)

What

Cap per-pod backend concurrency at the frontproxy (haproxy maxconn) to stop the concurrent-backup OOM that lives below the app's memory limiter.

Why the app limiter can't fix this

uvicorn buffers each in-flight request body off the socket before our memory limiter runs. Under a backup flood, request bodies pile up in the HTTP server's C-level buffers — the governor reads ~64MB while RSS hits 512Mi+ and the pod is OOMKilled (exit 137). That memory is invisible and ungovernable from the app layer.

Profiler confirmed it: peak RSS 957MiB but only 87MB Python-tracked — the rest is uvicorn/httptools socket buffers for ~125 concurrent bodies.
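The gap between those two profiler numbers can be worked through as a rough back-of-the-envelope check. This is only a sketch: it assumes "87MB" means decimal megabytes and that the ~125 bodies share the untracked memory evenly, neither of which the profiler output confirms. The helper names are hypothetical.

```python
MIB = 1024 * 1024
MB = 1000 * 1000  # assumption: the "87MB" figure is decimal megabytes


def untracked_mib(peak_rss_mib: float, python_tracked_mb: float) -> float:
    """Memory in RSS that the Python-level tracker did not account for, in MiB."""
    tracked_mib = python_tracked_mb * MB / MIB
    return peak_rss_mib - tracked_mib


def per_body_mib(untracked: float, concurrent_bodies: int) -> float:
    """Average untracked memory per in-flight body, assuming an even split."""
    return untracked / concurrent_bodies


gap = untracked_mib(957, 87)
print(f"untracked: {gap:.0f} MiB, per in-flight body: {per_body_mib(gap, 125):.1f} MiB")
```

Under those assumptions roughly nine tenths of peak RSS sits outside anything the app-level limiter can see, which is why the limiter's own reading stays near its budget while the pod is killed.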

The fix

haproxy had only a global maxconn 4096 and no per-pod cap, so leastconn could still dump 100+ connections onto one pod. This adds:

backend s3proxy_pods
  server-template ... maxconn 40     # cap in-flight requests PER pod
defaults
  timeout queue 30s                  # excess waits for a slot, then redispatch/503

haproxy now bounds in-flight requests per pod and queues the excess (redispatching to a less-loaded pod) instead of overrunning a pod's uvicorn buffers. The app's existing limiter governs the admitted few. Both knobs are chart values (frontproxy.maxConnPerPod, frontproxy.timeouts.queue).
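The admit-or-queue behaviour described above can be illustrated with a toy model of a single pod. This is not haproxy's implementation: it is a minimal asyncio sketch in which a semaphore stands in for the per-server `maxconn`, a wait timeout stands in for `timeout queue`, and redispatch to another pod is not modelled. The function name and the fixed service time are assumptions.

```python
import asyncio


async def flood(n_requests: int, max_conn: int, queue_timeout: float,
                service_time: float = 0.01) -> dict:
    """Toy model: one pod behind a per-server connection cap with a queue."""
    slots = asyncio.Semaphore(max_conn)  # stands in for per-server maxconn
    in_flight = 0
    peak = 0
    served = 0
    rejected = 0

    async def request() -> None:
        nonlocal in_flight, peak, served, rejected
        try:
            # Wait in the queue for a free slot, up to the queue timeout.
            await asyncio.wait_for(slots.acquire(), timeout=queue_timeout)
        except asyncio.TimeoutError:
            rejected += 1  # stands in for the 503 after the queue timeout
            return
        try:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(service_time)  # the pod handling the request
            served += 1
        finally:
            in_flight -= 1
            slots.release()

    await asyncio.gather(*(request() for _ in range(n_requests)))
    return {"peak_in_flight": peak, "served": served, "rejected": rejected}


result = asyncio.run(flood(n_requests=128, max_conn=40, queue_timeout=30.0))
print(result)
```

In this model the pod never sees more than `max_conn` requests at once, however large the burst is; the rest wait their turn and are only rejected if the queue timeout expires first.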

Why the LB, not uvicorn --limit-concurrency

  • One place, all pods; no flag baked into every container.
  • Queues instead of hard-rejecting — clients mostly see success, not 503s.
  • Admission control / backpressure is the load balancer's job.
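The difference between queueing and hard-rejecting can be made concrete with a small calculation. It assumes an idealised burst where all requests arrive at once and each takes the same time to serve; the 2-second service time below is purely illustrative, not a measured value, and the function names are hypothetical.

```python
import math


def hard_reject(burst: int, cap: int) -> tuple[int, int]:
    """Reject everything above the cap immediately: (served, rejected)."""
    return min(burst, cap), max(0, burst - cap)


def queue_then_serve(burst: int, cap: int, service_s: float,
                     queue_timeout_s: float) -> tuple[int, int]:
    """Queue the excess and serve it in waves of `cap`: (served, rejected).

    Wave k (0-based) waits k * service_s for a slot, so it is admitted
    only if that wait fits inside the queue timeout.
    """
    waves_allowed = math.floor(queue_timeout_s / service_s) + 1
    served = min(burst, cap * waves_allowed)
    return served, burst - served


print(hard_reject(128, 40))                  # cap only
print(queue_then_serve(128, 40, 2.0, 30.0))  # cap plus a 30s queue
```

With a cap alone, everything above 40 in a 128-request burst is turned away; with a 30s queue in front of the same cap, the whole burst fits as long as requests finish reasonably quickly.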

Proof (local, prod config: 512Mi cap / 64MB budget / 2026.6.14 app)

| Scenario | Direct (reproduces prod) | Via haproxy maxconn 40 |
| --- | --- | --- |
| 128×16MB PUT flood | OOMKilled, exit 137 | 256/256 ok, peak 335MiB |
| Harsh mixed upload+GET flood | OOM | 322MiB, no OOM |

Rendered haproxy.cfg validated with haproxy -c (exit 0).
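A PUT flood of this shape could be generated with something like the following. This is a hypothetical load generator, not the tool used for the numbers above: the URL layout, the `flood-{i}` object names, and the helper names are all assumptions, and authentication is left out.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def put_object(url: str, body: bytes, timeout: float = 120.0) -> int:
    """PUT one body and return the HTTP status (0 if the connection failed)."""
    req = urllib.request.Request(url, data=body, method="PUT")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 503 once the queue timeout expires
    except OSError:
        return 0  # connection refused/reset, e.g. the pod was killed


def put_flood(base_url: str, count: int = 128, size_mb: int = 16,
              put=put_object) -> tuple[int, int]:
    """Send `count` concurrent PUTs of `size_mb` each; return (ok, total)."""
    body = bytes(size_mb * 1024 * 1024)  # zero-filled payload, shared by all PUTs
    urls = [f"{base_url}/flood-{i}" for i in range(count)]
    with ThreadPoolExecutor(max_workers=count) as pool:
        statuses = list(pool.map(lambda url: put(url, body), urls))
    ok = sum(1 for status in statuses if 200 <= status < 300)
    return ok, len(statuses)
```

Pointing `base_url` first at a pod directly and then at the frontproxy would give the two columns of the comparison; the `put` parameter exists so the flood logic can be exercised without a live endpoint.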

Scope

This is the upload-side OOM (the dominant cause on 2026.6.14; kill windows were PUT-heavy). Stacks on:

The remaining concurrent-backup OOM is below the app's memory limiter: uvicorn
buffers each in-flight request body off the socket BEFORE our limiter runs, so a
backup flood piles up request bodies in the HTTP server's C-level buffers (the
governor reads ~64MB while RSS hits 512Mi+ -> OOMKilled, exit 137). This memory
is invisible and ungovernable from the app layer.

The load balancer is the right place to bound it. haproxy had only a global
maxconn (4096) and no per-pod cap, so it could dump 100+ concurrent connections
onto a single pod. Add `maxconn` per backend server (default 40) plus
`timeout queue`: haproxy now caps in-flight requests per pod and QUEUES the
excess (redispatching to a less-loaded pod) instead of overrunning one pod's
uvicorn buffers. The app's existing limiter then governs the admitted few.

Verified locally at prod config (512Mi cap, 64MB budget, 2026.6.14 app):
  - direct 128x16MB PUT flood -> OOMKilled exit 137 (reproduces prod)
  - same flood via haproxy maxconn 40 -> 256/256 ok, pod peaks 335MiB, no OOM
  - harsh mixed upload+GET flood via haproxy -> 322MiB, no OOM
haproxy queues rather than rejects, so clients mostly see success, not 503s.
Validated the rendered haproxy.cfg with `haproxy -c` (exit 0).
@ServerSideHannes merged commit 98235b5 into main on Jun 30, 2026
4 checks passed
@ServerSideHannes deleted the fix/frontproxy-maxconn branch on June 30, 2026 at 18:14