Skip to content

Fix region-group cleanup: recover-safe submission + retry that actually re-runs the delete#18097

Merged
CRZbulabula merged 1 commit into
masterfrom
fix-remove-region-group-recovery
Jul 2, 2026
Merged

Fix region-group cleanup: recover-safe submission + retry that actually re-runs the delete#18097
CRZbulabula merged 1 commit into
masterfrom
fix-remove-region-group-recovery

Conversation

@CRZbulabula

Copy link
Copy Markdown
Contributor

Summary

Follow-up to #18033. The submit-and-return region-group cleanup path had two recovery/correctness defects (both surfaced by adversarial review of the merged change):

1. !isStateDeserialized() guard could permanently skip cleanup submission → orphaned region data

DeleteDatabaseProcedure (state DELETE_DATABASE_SCHEMA) and CreateRegionGroupsProcedure (state SHUNT_REGION_REPLICAS) submitted the cleanup RemoveRegionGroupProcedures only when !isStateDeserialized().

The executor persists a procedure at a state before that state's body runs (it advances the state on the previous cycle, then can stop at the inter-state boundary on a leader switch — see ProcedureExecutor#executeProcedure). So a leader switch / restart that recovers onto that state means the submission has not happened yet — but isStateDeserialized() is true, so the guard skips it forever. DeleteDatabaseProcedure then advances and drops the partition table, leaving every region's consensus peer and TsFile data orphaned on the DataNodes with no record of where they live.

Fix: remove the guard; always submit. Re-submitting on recovery is safe — each RemoveRegionGroupProcedure gets a fresh procId and performs an idempotent delete, so a duplicate is harmless, whereas a skip leaks. (CreateRegionGroupsProcedure recomputes the cleanups from the serialized failedRegionReplicaSets; DeleteDatabaseProcedure re-queries getAllReplicaSets before the table is dropped — both deterministic on replay.)

2. "Retry until deleted" was an infinite no-op after any DataNode-side FAIL

RemoveRegionGroupProcedure used getProcId() as the DataNode task id. The DataNode dedups region-maintain tasks by task id and caches the terminal result forever (RegionMigrateService.taskResultMap, cleared only on DataNode restart). So after any DataNode-side delete FAIL, every same-id retry returned the cached FAIL without ever re-running the delete — an infinite 5s loop, region data already orphaned (partition table dropped by the caller), a consensus write every cycle.

Fix: each genuine re-attempt uses a fresh DataNode task id so the DataNode actually re-executes the (idempotent) delete. The id is derived deterministically from the (consensus-replicated) procId and a persisted monotonic sequence, packed into the negative i64 space:

  • Negative ids are disjoint from all procIds (always >= 0), which AddRegionPeer / RemoveRegionPeer use as their task id against the same DataNode task map — so no cross-procedure collision / silent dedup.
  • It depends only on replicated/persisted state, so it is stable and never regresses across a leader change (unlike minting from the proc-id allocator, whose in-memory high-water mark is not replicated for a body-local value).
  • The in-flight id is persisted so a leader change re-polls the same task instead of double-submitting; a terminal failure clears the "submitted" flag so the next attempt mints a fresh id.

RemoveRegionGroupProcedure now retries a replica indefinitely (a group delete must not leave data on disk) instead of failing after a fixed retry budget.

Tests

  • RemoveRegionGroupProcedureTest#serDeTest — round-trips the new persisted deleteTaskSeq / deleteTaskSubmitted cursor.
  • RemoveRegionGroupProcedureTest#deleteTaskIdIsNegativeAndUnique — asserts the derived task id is always negative and injective over (procId, seq) across the full bit budget.
  • confignode spotless:check + compile clean.

…ly re-runs the delete

Follow-up to #18033. The submit-and-return region-group cleanup had two
recovery/correctness defects, both found by adversarial review:

1. The `!isStateDeserialized()` guard around submitting the cleanup
   RemoveRegionGroupProcedures (in DeleteDatabaseProcedure and
   CreateRegionGroupsProcedure) could permanently skip the submission. The
   executor persists a procedure AT a state before that state's body runs, so a
   leader switch/restart landing on that state means the submission has not
   happened yet -- the guard then skips it forever while the next state drops
   the partition table, orphaning every region's peers/data on disk with no
   record. Removed the guard: always submit. Re-submitting on recovery is safe
   (each RemoveRegionGroupProcedure gets a fresh procId and does an idempotent
   delete), whereas skipping leaks.

2. RemoveRegionGroupProcedure's "retry until deleted" was a no-op after any
   DataNode-side FAIL. The DataNode dedups delete tasks by taskId (which was
   getProcId()) and caches the terminal result forever, so every same-taskId
   retry returned the cached FAIL without ever re-running the delete -- an
   infinite silent loop with the region data already orphaned. Each genuine
   re-attempt now uses a fresh DataNode-side taskId so the DataNode actually
   re-executes the idempotent delete. The taskId is derived deterministically
   from the (replicated) procId and a persisted monotonic sequence, packed into
   the negative i64 space: negative taskIds are disjoint from all procIds
   (>= 0), which add/remove-peer tasks use as their taskId against the same
   DataNode task map, so there is no collision; and because it depends only on
   replicated/persisted state it is stable and never regresses across a leader
   change. The in-flight taskId is persisted so a leader change re-polls the
   same task instead of double-submitting.

RemoveRegionGroupProcedure now retries a replica indefinitely (a group delete
must not leave data on disk) instead of failing after a fixed budget.
@sonarqubecloud

sonarqubecloud Bot commented Jul 2, 2026

Copy link
Copy Markdown

@codecov

codecov Bot commented Jul 2, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 28.57143% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.65%. Comparing base (220e7a3) to head (3f2fe14).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...cedure/impl/region/RemoveRegionGroupProcedure.java 36.36% 21 Missing ⚠️
...edure/impl/region/CreateRegionGroupsProcedure.java 0.00% 5 Missing ⚠️
...procedure/impl/schema/DeleteDatabaseProcedure.java 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #18097   +/-   ##
=========================================
  Coverage     41.65%   41.65%           
  Complexity      318      318           
=========================================
  Files          5296     5296           
  Lines        371663   371685   +22     
  Branches      48088    48090    +2     
=========================================
+ Hits         154819   154836   +17     
- Misses       216844   216849    +5     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@CRZbulabula CRZbulabula merged commit 0a3e212 into master Jul 2, 2026
46 of 47 checks passed
@CRZbulabula CRZbulabula deleted the fix-remove-region-group-recovery branch July 2, 2026 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant