Fix region-group cleanup: recover-safe submission + retry that actually re-runs the delete by CRZbulabula · Pull Request #18097 · apache/iotdb

CRZbulabula · 2026-07-02T12:27:58Z

Summary

Follow-up to #18033. The submit-and-return region-group cleanup path had two recovery/correctness defects (both surfaced by adversarial review of the merged change):

1. `!isStateDeserialized()` guard could permanently skip cleanup submission → orphaned region data

DeleteDatabaseProcedure (state DELETE_DATABASE_SCHEMA) and CreateRegionGroupsProcedure (state SHUNT_REGION_REPLICAS) submitted the cleanup RemoveRegionGroupProcedures only when !isStateDeserialized().

The executor persists a procedure at a state before that state's body runs (it advances the state on the previous cycle, then can stop at the inter-state boundary on a leader switch — see ProcedureExecutor#executeProcedure). So a leader switch / restart that recovers onto that state means the submission has not happened yet — but isStateDeserialized() is true, so the guard skips it forever. DeleteDatabaseProcedure then advances and drops the partition table, leaving every region's consensus peer and TsFile data orphaned on the DataNodes with no record of where they live.

Fix: remove the guard; always submit. Re-submitting on recovery is safe — each RemoveRegionGroupProcedure gets a fresh procId and performs an idempotent delete, so a duplicate is harmless, whereas a skip leaks. (CreateRegionGroupsProcedure recomputes the cleanups from the serialized failedRegionReplicaSets; DeleteDatabaseProcedure re-queries getAllReplicaSets before the table is dropped — both deterministic on replay.)

2. "Retry until deleted" was an infinite no-op after any DataNode-side FAIL

RemoveRegionGroupProcedure used getProcId() as the DataNode task id. The DataNode dedups region-maintain tasks by task id and caches the terminal result forever (RegionMigrateService.taskResultMap, cleared only on DataNode restart). So after any DataNode-side delete FAIL, every same-id retry returned the cached FAIL without ever re-running the delete — an infinite 5s loop, region data already orphaned (partition table dropped by the caller), a consensus write every cycle.

Fix: each genuine re-attempt uses a fresh DataNode task id so the DataNode actually re-executes the (idempotent) delete. The id is derived deterministically from the (consensus-replicated) procId and a persisted monotonic sequence, packed into the negative i64 space:

Negative ids are disjoint from all procIds (always >= 0), which AddRegionPeer / RemoveRegionPeer use as their task id against the same DataNode task map — so no cross-procedure collision / silent dedup.
It depends only on replicated/persisted state, so it is stable and never regresses across a leader change (unlike minting from the proc-id allocator, whose in-memory high-water mark is not replicated for a body-local value).
The in-flight id is persisted so a leader change re-polls the same task instead of double-submitting; a terminal failure clears the "submitted" flag so the next attempt mints a fresh id.

RemoveRegionGroupProcedure now retries a replica indefinitely (a group delete must not leave data on disk) instead of failing after a fixed retry budget.

Tests

RemoveRegionGroupProcedureTest#serDeTest — round-trips the new persisted deleteTaskSeq / deleteTaskSubmitted cursor.
RemoveRegionGroupProcedureTest#deleteTaskIdIsNegativeAndUnique — asserts the derived task id is always negative and injective over (procId, seq) across the full bit budget.
confignode spotless:check + compile clean.

…ly re-runs the delete Follow-up to #18033. The submit-and-return region-group cleanup had two recovery/correctness defects, both found by adversarial review: 1. The `!isStateDeserialized()` guard around submitting the cleanup RemoveRegionGroupProcedures (in DeleteDatabaseProcedure and CreateRegionGroupsProcedure) could permanently skip the submission. The executor persists a procedure AT a state before that state's body runs, so a leader switch/restart landing on that state means the submission has not happened yet -- the guard then skips it forever while the next state drops the partition table, orphaning every region's peers/data on disk with no record. Removed the guard: always submit. Re-submitting on recovery is safe (each RemoveRegionGroupProcedure gets a fresh procId and does an idempotent delete), whereas skipping leaks. 2. RemoveRegionGroupProcedure's "retry until deleted" was a no-op after any DataNode-side FAIL. The DataNode dedups delete tasks by taskId (which was getProcId()) and caches the terminal result forever, so every same-taskId retry returned the cached FAIL without ever re-running the delete -- an infinite silent loop with the region data already orphaned. Each genuine re-attempt now uses a fresh DataNode-side taskId so the DataNode actually re-executes the idempotent delete. The taskId is derived deterministically from the (replicated) procId and a persisted monotonic sequence, packed into the negative i64 space: negative taskIds are disjoint from all procIds (>= 0), which add/remove-peer tasks use as their taskId against the same DataNode task map, so there is no collision; and because it depends only on replicated/persisted state it is stable and never regresses across a leader change. The in-flight taskId is persisted so a leader change re-polls the same task instead of double-submitting. RemoveRegionGroupProcedure now retries a replica indefinitely (a group delete must not leave data on disk) instead of failing after a fixed budget.

sonarqubecloud · 2026-07-02T12:35:23Z

Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

codecov · 2026-07-02T13:15:56Z

Codecov Report

❌ Patch coverage is 28.57143% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.65%. Comparing base (220e7a3) to head (3f2fe14).
⚠️ Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
...cedure/impl/region/RemoveRegionGroupProcedure.java	36.36%	21 Missing ⚠️
...edure/impl/region/CreateRegionGroupsProcedure.java	0.00%	5 Missing ⚠️
...procedure/impl/schema/DeleteDatabaseProcedure.java	0.00%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##             master   #18097   +/-   ##
=========================================
  Coverage     41.65%   41.65%           
  Complexity      318      318           
=========================================
  Files          5296     5296           
  Lines        371663   371685   +22     
  Branches      48088    48090    +2     
=========================================
+ Hits         154819   154836   +17     
- Misses       216844   216849    +5

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

CRZbulabula merged commit 0a3e212 into master Jul 2, 2026
46 of 47 checks passed

CRZbulabula deleted the fix-remove-region-group-recovery branch July 2, 2026 13:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix region-group cleanup: recover-safe submission + retry that actually re-runs the delete#18097

Fix region-group cleanup: recover-safe submission + retry that actually re-runs the delete#18097
CRZbulabula merged 1 commit into
masterfrom
fix-remove-region-group-recovery

CRZbulabula commented Jul 2, 2026

Uh oh!

sonarqubecloud Bot commented Jul 2, 2026

Uh oh!

codecov Bot commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

CRZbulabula commented Jul 2, 2026

Summary

1. !isStateDeserialized() guard could permanently skip cleanup submission → orphaned region data

2. "Retry until deleted" was an infinite no-op after any DataNode-side FAIL

Tests

Uh oh!

sonarqubecloud Bot commented Jul 2, 2026

Quality Gate passed

Uh oh!

codecov Bot commented Jul 2, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. `!isStateDeserialized()` guard could permanently skip cleanup submission → orphaned region data