feature: Add maintain command for automated runway provisioning #30
Merged
…guard)
Discovers every partitioned parent carrying a valid pgslice settings comment and extends each with add_partitions in one run, with per-table failure isolation and a read-only CDC-readiness guard (every leaf must have a replica identity usable for logical replication). Exits non-zero if any table failed or is CDC-unsafe.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A single --future N means very different runway for weekly vs monthly vs yearly tables (3 weeks vs 3 months vs 3 years). Replace it with per-period --future-daily / --future-weekly / --future-monthly / --future-yearly (defaults 90 / 26 / 6 / 1); discovery returns each table's period, and each table is extended by the horizon for its own period, so the fleet gets comparable forward coverage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
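The per-period selection described in this commit can be sketched as a plain lookup. This is a minimal illustration, not the PR's code: the names `Period`, `Horizons`, `horizonFor`, and the sample tables are hypothetical, and only the default values (90 / 26 / 6 / 1) are taken from the commit message.

```typescript
// Hypothetical names throughout; only the defaults (90 / 26 / 6 / 1)
// come from the commit message.
type Period = "day" | "week" | "month" | "year";
type Horizons = Record<Period, number>;

// Baked-in defaults: how many future partitions to keep per period.
const DEFAULT_HORIZONS: Horizons = { day: 90, week: 26, month: 6, year: 1 };

// Pick the horizon that matches a table's own period, honoring any override.
function horizonFor(period: Period, overrides: Partial<Horizons> = {}): number {
  return overrides[period] ?? DEFAULT_HORIZONS[period];
}

// A mixed fleet: each table is extended by the horizon for its own period.
const fleet: Array<{ table: string; period: Period }> = [
  { table: "events", period: "day" },
  { table: "invoices", period: "month" },
];
const plan = fleet.map((t) => ({ table: t.table, future: horizonFor(t.period) }));
```

With the defaults, a daily table gets 90 future partitions and a monthly table gets 6, rather than both receiving the same count.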
maintain now writes one JSON object per line so a log collector can extract
the keys as attributes:
- a start record (target db + per-period horizons),
- one record per table extended (msg, level, target, partition counts,
success),
- a final summary (succeeded/failed counts and table lists).
Only info and error levels are used, and the partitioning model is left out
of the logs. The command signals failure through its exit code (now honored
by BaseCommand.execute) instead of a plain-text error line, keeping stdout
pure JSONL.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
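A rough sketch of the three record kinds listed in this commit (start, per-table, final summary), written as one JSON object per line. Every field name and message string below is an assumption made for illustration; the commit message names the categories of data but not the exact keys.

```typescript
// Hypothetical record shape: the commit only says info/error levels are used
// and that each line is a single JSON object.
type Level = "info" | "error";

interface MaintainRecord {
  level: Level;
  msg: string;
  [key: string]: unknown;
}

// One record per line, newline-terminated, so stdout stays valid JSONL.
function toJsonLine(record: MaintainRecord): string {
  return JSON.stringify(record) + "\n";
}

// Illustrative run: start record, one per-table record, final summary.
const output = [
  toJsonLine({
    level: "info",
    msg: "maintain started",
    target: { db: "app" },
    horizons: { daily: 90, weekly: 26, monthly: 6, yearly: 1 },
  }),
  toJsonLine({
    level: "info",
    msg: "table extended",
    target: { db: "app", table: "public.events" },
    partitionsAdded: 3,
    success: true,
  }),
  toJsonLine({
    level: "info",
    msg: "maintain finished",
    succeededCount: 1,
    failedCount: 0,
    succeeded: ["public.events"],
    failed: [],
  }),
].join("");
```

Because failure is signalled through the exit code, nothing other than these lines needs to be written to stdout, and a collector can parse each line independently.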
Add the endpoint host (from the connection URL — host only, no credentials) to
every maintain log record's target alongside the database name, so a run
identifies which host and DB it extended. Distinguish a table that needed no new
partitions ("Table already up to date; no extension needed") from one that was
extended, and drop the deployment-specific rationale from the code comments.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
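One way to derive a credential-free target from a connection URL is the standard WHATWG `URL` parser, as sketched below. The function name `targetFromUrl` and the returned shape are hypothetical, and this sketch assumes "host" means the hostname without the port.

```typescript
// Hypothetical helper: extracts only the hostname and database name,
// never the username or password, from a PostgreSQL connection URL.
function targetFromUrl(connectionUrl: string): { host: string; db: string } {
  const url = new URL(connectionUrl);
  return {
    host: url.hostname, // no credentials, no port
    db: decodeURIComponent(url.pathname.replace(/^\//, "")),
  };
}

const target = targetFromUrl("postgres://alice:s3cret@db.example.com:5432/app");
```

Reading `hostname` and `pathname` separately, instead of trimming the raw string, keeps the username and password out of the log record by construction.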
Generate one UUID per maintain run and stamp it on the start, per-table, and final records so all logs from a single invocation can be correlated. Generated with crypto.randomUUID; injectable via options for deterministic tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
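The injectable-generator pattern this commit describes might look roughly like the following. The option name `generateJobId` and the helper `startRun` are assumptions; the commit confirms only that the ID comes from `crypto.randomUUID` and can be injected through options for deterministic tests.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical option name; tests can pass a fixed generator.
interface RunOptions {
  generateJobId?: () => string;
}

// Generate the ID once per run and return a helper that stamps it on records.
function startRun(options: RunOptions = {}) {
  const jobId = (options.generateJobId ?? randomUUID)();
  function stamp<T extends object>(record: T): T & { jobId: string } {
    return { ...record, jobId };
  }
  return { jobId, stamp };
}

// Deterministic in tests, random in production.
const testRun = startRun({ generateJobId: () => "00000000-0000-4000-8000-000000000000" });
const stamped = testRun.stamp({ level: "info", msg: "maintain started" });
```

Generating the ID once and closing over it guarantees that the start, per-table, and final records of one invocation share the same value.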
AdvisoryLock.withLock released the session-level lock in a finally block. When the handler aborts the transaction (e.g. a failed partition CREATE), the unlock query itself errors with "current transaction is aborted", and that finally exception replaced the handler's real error — so maintain logged an unhelpful message instead of the cause. Release now only propagates its own failure when the handler succeeded; otherwise the handler's error wins. The test harness runs with advisory locks disabled, so this only surfaced against a live database. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
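The error-precedence rule in this commit (the handler's error wins; a release failure propagates only when the handler succeeded) can be sketched independently of any database. `runThenRelease` is a hypothetical stand-in, not the actual `AdvisoryLock.withLock` code, and a later commit in this PR replaces the approach with a transaction-scoped lock.

```typescript
// Hypothetical stand-in for the pattern: run the handler, then release.
// If the handler threw, release is best-effort and its error is discarded.
async function runThenRelease<T>(
  handler: () => Promise<T>,
  release: () => Promise<void>,
): Promise<T> {
  let result: T;
  try {
    result = await handler();
  } catch (handlerError) {
    try {
      await release();
    } catch {
      // Ignored on purpose: the handler's error is the real cause.
    }
    throw handlerError;
  }
  // Handler succeeded, so a release failure is the only error and propagates.
  await release();
  return result;
}
```

A plain `finally` cannot express this, because an exception thrown inside `finally` replaces whatever exception was already in flight.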
Give each --future-* flag a clipanion env fallback (PGSLICE_FUTURE_DAILY / _WEEKLY / _MONTHLY / _YEARLY), so a scheduled job can configure the horizons via environment — as the Terrace ScheduledProcess will — without templating the command args. Precedence is flag > env > baked default, matching how --url already falls back to PGSLICE_URL. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
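The stated precedence (flag > env > baked default) reduces to a small resolution function. The sketch below is a standalone illustration with a hypothetical name, `resolveHorizon`; it is not how clipanion implements its env fallback, and the integer parsing and error message are assumptions.

```typescript
// Hypothetical resolver: an explicit flag wins, then the environment
// variable, then the baked-in default.
function resolveHorizon(
  flag: number | undefined,
  envValue: string | undefined,
  fallback: number,
): number {
  if (flag !== undefined) return flag;
  if (envValue !== undefined && envValue !== "") {
    const parsed = Number.parseInt(envValue, 10);
    if (Number.isNaN(parsed)) {
      throw new Error(`Invalid horizon value: ${envValue}`);
    }
    return parsed;
  }
  return fallback;
}

// Example: a scheduled job sets PGSLICE_FUTURE_DAILY=30 and passes no flag.
const daily = resolveHorizon(undefined, "30", 90);
```

Here `daily` resolves to 30; passing an explicit flag value would override it, and omitting both would fall back to 90.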
Greptile Summary
This PR adds a fleet-wide maintain command for pgslice partition runway management. The main changes are:
Confidence Score: 5/5
This looks safe to merge. No blocking issues found in the changed code. No files need attention.
Reviews (2): Last reviewed commit: "fix: Address advisory-lock leak and nest..."
- withLock takes a transaction-scoped advisory lock (pg_try_advisory_xact_lock) instead of a session lock released in a finally. It's freed when the enclosing transaction commits or rolls back, so a handler that aborts the transaction can't leak the lock into the pooled connection, and there's no unlock query left to fail on an aborted transaction and mask the handler's error (this supersedes the earlier swallow-the-error workaround). The session-based acquire() is kept for fill/synchronize, which hold a lock across batches.
- unsafeReplicaIdentityPartitions walks the whole partition tree via pg_partition_tree(...) filtered to leaves, so a leaf beneath a sub-partitioned child is inspected too; the previous direct-children query could report a table CDC-ready while a nested leaf had an unusable replica identity.
Adds a leak-regression test (an aborted handler no longer blocks a second session) and a nested-leaf CDC test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
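The nested-leaf point can be illustrated with a pure function over rows shaped loosely like the output of `pg_partition_tree` (which reports `relid`, `parentrelid`, `isleaf`, and `level`). The row type, the `hasPrimaryKey` field, and the usability rule below are assumptions for illustration: this sketch treats `n` (NOTHING) as unusable and `d` (DEFAULT) as usable only when the leaf has a primary key, which may differ from the check the PR actually performs.

```typescript
// pg_class.relreplident values: d = default, n = nothing, f = full, i = index.
type ReplicaIdentity = "d" | "n" | "f" | "i";

// Hypothetical row shape: tree columns joined with replica-identity details.
interface PartitionRow {
  relid: string;
  parentrelid: string | null;
  isleaf: boolean;
  replicaIdentity: ReplicaIdentity;
  hasPrimaryKey: boolean;
}

// Assumed rule: FULL and USING INDEX are usable; DEFAULT needs a primary key.
function hasUsableIdentity(row: PartitionRow): boolean {
  if (row.replicaIdentity === "f" || row.replicaIdentity === "i") return true;
  if (row.replicaIdentity === "d") return row.hasPrimaryKey;
  return false;
}

// Inspect every leaf in the tree, at any depth, not just direct children.
function unsafeLeaves(rows: PartitionRow[]): string[] {
  return rows.filter((r) => r.isleaf && !hasUsableIdentity(r)).map((r) => r.relid);
}

const tree: PartitionRow[] = [
  { relid: "events", parentrelid: null, isleaf: false, replicaIdentity: "d", hasPrimaryKey: true },
  { relid: "events_2026", parentrelid: "events", isleaf: false, replicaIdentity: "d", hasPrimaryKey: true },
  { relid: "events_2026_01", parentrelid: "events_2026", isleaf: true, replicaIdentity: "n", hasPrimaryKey: true },
  { relid: "events_2027", parentrelid: "events", isleaf: true, replicaIdentity: "f", hasPrimaryKey: false },
];
const unsafe = unsafeLeaves(tree);
```

In this sample, a check limited to the direct children of `events` would see only `events_2026` and `events_2027` and miss the nested leaf `events_2026_01`, which is the one with the unusable identity.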
ryanmcilmoyl
approved these changes
Jul 1, 2026
Summary
- maintain is the scheduler entry point: it discovers every partitioned table that carries a pgslice settings comment and extends them all in one run.
- Per-period horizons are set with --future-daily / -weekly / -monthly / -yearly (defaults 90/26/6/1 respectively). Each also reads a PGSLICE_FUTURE_* environment variable (precedence: flag > env > default) for env-based scheduled-job config.
- JSONL log records carry a per-run jobId and the target host + database.
- AdvisoryLock.withLock no longer lets a lock-release failure in its finally mask the handler's real error.

Discovery filtering (--schema), grant inheritance, and the composite PK handling reuse the existing add_partitions machinery; maintain simply adds the fleet-wide discovery/iteration plus the logging and additional guard layers.

Test plan
- npm test (vitest against PostgreSQL 13–18): discovery + filtering, per-period extension, idempotent re-run, per-table failure isolation + non-zero exit, the replica-identity guard, the JSONL record shape (host / db / jobId, no-op vs. extended, error surfacing), and env/flag precedence.

For INFRA-5546