ci(deploy): preempt self-update timer so it can't starve the deploy lock #72

Merged
ralyodio merged 2 commits into main from fix/deploy-lock-preempt-selfupdate on Jul 1, 2026

@ralyodio ralyodio commented Jul 1, 2026

What broke

Run 28503235083 failed at Provision / redeploy (idempotent, SKIP_BUILD=1) with:

[fail] another setup.sh run is in progress (lock held >5m)

Root cause

setup.sh serializes runs with an exclusive flock -w 300 on /var/lock/agentbbs-setup.lock. The autonomous agentbbs-update.timer runs self-update.sh, which redeploys from source (it does not pass SKIP_BUILD=1). When the timer fires close to a push, its go build on the tiny droplet holds the lock for longer than 5 minutes, so the CI deploy (which does set SKIP_BUILD=1) waits the full 300s on the lock and then dies.
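
The locking behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the actual setup.sh code: the function name acquire_setup_lock, the LOCK_FILE and LOCK_WAIT variables, the use of file descriptor 9, and the demo lock path under /tmp are all assumptions made for the example. Only the 300-second flock timeout, the /var/lock/agentbbs-setup.lock path, and the failure message come from this PR.

```shell
#!/usr/bin/env bash
# Sketch of serializing runs with an exclusive, time-limited flock.
# Hypothetical names; the real setup.sh may be structured differently.

# The PR names /var/lock/agentbbs-setup.lock; a /tmp path is used here so the
# sketch can be tried without root.
LOCK_FILE="${LOCK_FILE:-/tmp/agentbbs-setup-demo.lock}"
LOCK_WAIT="${LOCK_WAIT:-300}"   # seconds, matching flock -w 300

acquire_setup_lock() {
  # Open the lock file on fd 9; the lock is held while this fd stays open.
  exec 9>"$LOCK_FILE" || return 1
  # -x: exclusive lock, -w: give up after LOCK_WAIT seconds.
  if ! flock -x -w "$LOCK_WAIT" 9; then
    echo "[fail] another setup.sh run is in progress (lock held >5m)" >&2
    return 1
  fi
}
```

A second caller that starts while the first still holds the lock blocks for up to LOCK_WAIT seconds and then fails. That matches the failure the CI run hit when the timer's build held the lock for more than five minutes.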

Fix

The CI push is authoritative: it ships runner-built binaries and runs reset --hard to the exact pushed commit. So before taking the lock, the deploy stops any in-flight timer run (which releases the lock) and pauses the timer so it can't re-fire mid-deploy:

systemctl stop agentbbs-update.timer agentbbs-update.service 2>/dev/null || true

setup.sh re-enables the timer at the end of its run (§7b), so no separate restart is needed. This is safe because the CI script already does its own reset --hard before calling setup.sh, so a preempted timer build leaves nothing behind.
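
For orientation, the overall deploy order can be sketched as below. This is an illustrative outline under stated assumptions, not the CI script itself: the run and deploy_sequence helpers, the DRY_RUN switch, the ./setup.sh path, and the commit argument are hypothetical, and the relative order of the stop and the reset is a guess. The systemctl stop line, the reset --hard step, and SKIP_BUILD=1 are taken from the description above.

```shell
#!/usr/bin/env bash
# Sketch of the CI deploy order. Hypothetical helper names throughout.

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# deploy_sequence COMMIT: preempt the timer, pin the checkout, run setup.sh.
deploy_sequence() {
  # 1. Stop any in-flight self-update run (releasing the lock) and the timer.
  #    Errors are ignored, e.g. on a host where the units do not exist yet.
  run systemctl stop agentbbs-update.timer agentbbs-update.service 2>/dev/null || true
  # 2. Pin the working tree to the exact pushed commit.
  run git reset --hard "$1"
  # 3. Deploy with prebuilt binaries; setup.sh re-enables the timer itself.
  run env SKIP_BUILD=1 ./setup.sh
}
```

Calling `DRY_RUN=1 deploy_sequence <commit>` prints the three commands without touching systemd or the checkout, which is a cheap way to review the order before running it on a real host.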

🤖 Generated with Claude Code

The self-update systemd timer redeploys from source (no SKIP_BUILD) and
can hold setup.sh's flock for >5min while compiling on a tiny droplet.
When it fires close to a CI push it starves the deploy, which waits the
full 5min on the lock and then fails with 'another setup.sh run is in
progress (lock held >5m)'.

The CI push is authoritative (ships prebuilt binaries + resets to the
exact commit), so stop any in-flight timer run to release the lock and
pause the timer before taking it. setup.sh re-enables the timer at the
end of its run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions Bot commented Jul 1, 2026

vu1nz Security Review

0 finding(s) in PR #?

No security issues found.

TestWebSaveOverQuotaPreservesExistingFile set quota=5 but left sess.used
at the newSession baseline, which already counts the README.txt that
ensureUserPub seeds into /public (added in d19c5c4). That baseline alone
exceeds 5 bytes, so the initial 2-byte save was rejected with
'quota exceeded' before the test could exercise the over-quota replace.

Reset sess.used to 0 after setting the tiny quota, mirroring
TestQuotaEnforced, so the writer starts from a clean gauge.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
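
The arithmetic behind the test fix in the second commit can be illustrated with a toy model. This is not the project's Go code: the try_write function and the quota and used variables are hypothetical stand-ins, and any baseline value fed to it is an assumption, since the real README.txt size is not stated here.

```shell
#!/usr/bin/env bash
# Toy quota gauge: a write is accepted only if used + size stays within quota.
# Hypothetical model for illustration only.

quota=5   # the tiny quota the test sets
used=0    # bytes already counted against the quota

# try_write SIZE: prints "ok" and updates the gauge, or prints
# "quota exceeded" and leaves the gauge unchanged.
try_write() {
  if [ $((used + $1)) -gt "$quota" ]; then
    echo "quota exceeded"
    return 1
  fi
  used=$((used + $1))
  echo "ok"
}
```

With used left at a baseline larger than 5 (say, a hypothetical 40 bytes for the seeded README), `try_write 2` is rejected at once. After resetting used to 0, the same 2-byte write is accepted and a later oversized write is the one that gets rejected, which is the case the test is meant to exercise.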
@ralyodio ralyodio merged commit 43dbdf0 into main Jul 1, 2026
5 checks passed
@ralyodio ralyodio deleted the fix/deploy-lock-preempt-selfupdate branch July 1, 2026 08:32