fix(chart): cap per-pod backend concurrency at the frontproxy (maxconn) by ServerSideHannes · Pull Request #99 · ServerSideHannes/s3proxy-python

ServerSideHannes · 2026-06-30T18:05:34Z

What

Cap per-pod backend concurrency at the frontproxy (haproxy maxconn) to stop the concurrent-backup OOM that lives below the app's memory limiter.

Why the app limiter can't fix this

uvicorn buffers each in-flight request body off the socket before our memory limiter runs. Under a backup flood, request bodies pile up in the HTTP server's C-level buffers — the governor reads ~64MB while RSS hits 512Mi+ and the pod is OOMKilled (exit 137). That memory is invisible and ungovernable from the app layer.

Profiler confirmed it: peak RSS 957MiB but only 87MB Python-tracked — the rest is uvicorn/httptools socket buffers for ~125 concurrent bodies.
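To see the kind of gap the profiler reported, one can compare allocator-tracked memory with the process's resident set size. The sketch below is only an illustration, not the profiler used for this PR: it assumes `tracemalloc` as the Python-side tracker and `resource.getrusage` for peak RSS (Unix only), and the helper names `peak_rss_bytes` and `memory_gap` are hypothetical.

```python
import resource
import sys
import tracemalloc


def peak_rss_bytes() -> int:
    """Peak resident set size of this process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in KiB; macOS reports it in bytes.
    return peak if sys.platform == "darwin" else peak * 1024


def memory_gap() -> dict:
    """Compare what tracemalloc sees with what the kernel charges the process."""
    _, traced_peak = tracemalloc.get_traced_memory()
    rss = peak_rss_bytes()
    return {
        "rss": rss,
        "python_tracked": traced_peak,
        # Memory held outside Python's allocators (C-level buffers, etc.).
        "untracked": rss - traced_peak,
    }


tracemalloc.start()
payload = bytearray(8 * 1024 * 1024)  # an allocation tracemalloc can see
gap = memory_gap()
print({name: f"{value / 2**20:.1f} MiB" for name, value in gap.items()})
```

Run inside a worker under upload load, a large `untracked` value would point at memory held outside Python's allocators, which is where the description above places the socket buffers.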

The fix

haproxy had only a global maxconn 4096 and no per-pod cap, so leastconn could still dump 100+ connections onto one pod. This adds:

backend s3proxy_pods
  server-template ... maxconn 40     # cap in-flight requests PER pod
defaults
  timeout queue 30s                  # excess waits for a slot, then redispatch/503

haproxy now bounds in-flight requests per pod and queues the excess (redispatching to a less-loaded pod) instead of overrunning a pod's uvicorn buffers. The app's existing limiter governs the admitted few. Both knobs are chart values (frontproxy.maxConnPerPod, frontproxy.timeouts.queue).
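The effect of a per-server cap with queueing can be modelled in a few lines. This is a toy simulation, not haproxy: an `asyncio.Semaphore` stands in for `maxconn 40`, tasks waiting on the semaphore stand in for the haproxy queue, and the 1 ms sleep is a made-up stand-in for handling a request body. It does not model `timeout queue` or redispatch to another pod.

```python
import asyncio


async def flood(total: int, cap: int) -> tuple[int, int]:
    """Send `total` requests at one pod that admits at most `cap` at a time."""
    slots = asyncio.Semaphore(cap)  # stand-in for the per-server maxconn
    in_flight = 0
    peak = 0
    done = 0

    async def request() -> None:
        nonlocal in_flight, peak, done
        async with slots:  # excess requests wait here instead of failing
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.001)  # hypothetical request handling time
            in_flight -= 1
            done += 1

    await asyncio.gather(*(request() for _ in range(total)))
    return peak, done


peak, done = asyncio.run(flood(total=128, cap=40))
print(f"peak in-flight: {peak}, completed: {done}")
```

With `total=128` and `cap=40`, every request completes while at most 40 are in flight at once, which is the property the chart change relies on.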

Why the LB, not uvicorn `--limit-concurrency`

One place, all pods; no flag baked into every container.
Queues instead of hard-rejecting — clients mostly see success, not 503s.
Admission control / backpressure is the load balancer's job.

Proof (local, prod config: 512Mi cap / 64MB budget / 2026.6.14 app)

| Scenario | direct (reproduces prod) | via haproxy `maxconn 40` |
| --- | --- | --- |
| 128×16MB PUT flood | OOMKilled, exit 137 | 256/256 ok, peak 335MiB |
| harsh mixed upload+GET flood | OOM | 322MiB, no OOM |

Rendered haproxy.cfg validated with haproxy -c (exit 0).

Scope

This PR addresses the upload-side OOM (the dominant cause on 2026.6.14; the kill windows were PUT-heavy). It stacks on:

fix: govern copy memory + fix passthrough-copy ClientResponse.read crash #97 (2026.6.13) — copy crash + copy-OOM
fix: hold GET memory reservation for the whole streaming-response lifetime #98 (2026.6.14) — streaming-GET OOM

The remaining concurrent-backup OOM is below the app's memory limiter: uvicorn buffers each in-flight request body off the socket BEFORE our limiter runs, so a backup flood piles up request bodies in the HTTP server's C-level buffers (the governor reads ~64MB while RSS hits 512Mi+ -> OOMKilled, exit 137). This memory is invisible and ungovernable from the app layer.

The load balancer is the right place to bound it. haproxy had only a global maxconn (4096) and no per-pod cap, so it could dump 100+ concurrent connections onto a single pod.

Add `maxconn` per backend server (default 40) plus `timeout queue`: haproxy now caps in-flight requests per pod and QUEUES the excess (redispatching to a less-loaded pod) instead of overrunning one pod's uvicorn buffers. The app's existing limiter then governs the admitted few.

Verified locally at prod config (512Mi cap, 64MB budget, 2026.6.14 app):

- direct 128x16MB PUT flood -> OOMKilled exit 137 (reproduces prod)
- same flood via haproxy maxconn 40 -> 256/256 ok, pod peaks 335MiB, no OOM
- harsh mixed upload+GET flood via haproxy -> 322MiB, no OOM

haproxy queues rather than rejects, so clients mostly see success, not 503s.

Validated the rendered haproxy.cfg with `haproxy -c` (exit 0).

ServerSideHannes merged commit 98235b5 into main Jun 30, 2026
4 checks passed

ServerSideHannes deleted the fix/frontproxy-maxconn branch June 30, 2026 18:14
