Pipe: order historical TsFiles by flush time #18088

Open
Caideyipi wants to merge 1 commit into master from pipe-history-tsfile-flush-order

@Caideyipi

Collaborator

Description

This PR adds a new historical pipe source option:

  • extractor.history.tsfile.order-by-flush-time
  • source.history.tsfile.order-by-flush-time
  • default: true

When enabled, historical TsFile-only extraction sends selected TsFiles in source-side file creation / flush-time order instead of progressIndex order. The goal is to preserve overwrite semantics for duplicated timestamps: older TsFiles are transferred first, and newer TsFiles are transferred later so the receiver can overwrite older values with newer values.
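To illustrate why transfer order matters for duplicated timestamps, here is a minimal sketch that models the receiver as a plain last-writer-wins map. The `Point` class and `replay` helper are hypothetical names introduced for illustration and are not part of IoTDB; the receiver's real write path is more involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: each "file" is a list of (timestamp, value) points,
// and the receiver keeps the last value written for each timestamp.
final class OverwriteOrderSketch {

  static final class Point {
    final long timestamp;
    final String value;

    Point(long timestamp, String value) {
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  // Applies files in the given transfer order; later writes overwrite earlier ones.
  static Map<Long, String> replay(List<List<Point>> filesInTransferOrder) {
    Map<Long, String> receiver = new TreeMap<>();
    for (List<Point> file : filesInTransferOrder) {
      for (Point p : file) {
        receiver.put(p.timestamp, p.value);
      }
    }
    return receiver;
  }

  public static void main(String[] args) {
    List<Point> olderFile = Arrays.asList(new Point(100L, "old"));
    List<Point> newerFile = Arrays.asList(new Point(100L, "new"));

    // Flush-time order: the newer value is applied last and wins.
    System.out.println(replay(Arrays.asList(olderFile, newerFile)));
    // Reversed order: the stale value is applied last and wins.
    System.out.println(replay(Arrays.asList(newerFile, olderFile)));
  }
}
```

In this model, sending the older file first leaves the newer value at timestamp 100, which is the outcome the option is meant to preserve.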

Semantic Changes

For historical TsFile extraction where insertions are captured and deletions are not captured:

  1. Historical working TsFile processors are closed synchronously before extraction, so the source has a stable set of TsFiles to order.
  2. Selected TsFiles are sorted by:
    • sequence files before unsequence files,
    • file-name timestamp / flush time ascending,
    • file version ascending,
    • compaction version ascending,
    • file path as a deterministic tie-breaker.
  3. TsFiles filtered out by progress, time, path, delete status, pipe-generated status, or pin failure are removed before computing delayed progress. Their progressIndex is not reported by this historical flush-time path.
  4. Because flush-time order is not guaranteed to be compatible with progressIndex topological order, per-TsFile commit progress reporting is disabled for reordered historical TsFile events.
  5. Tablet events generated from those TsFile events inherit the no-report behavior, so decomposing a TsFile into tablets does not accidentally advance progress early.
  6. After all selected reordered historical TsFiles are supplied, the source emits one ProgressReportEvent with the max progressIndex of the selected resources.
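The sort keys in step 2 can be sketched as a chained comparator. This is only an illustration under assumptions: `FileInfo` is a hypothetical stand-in for the fields being compared, not the real `TsFileResource`, and the PR's actual comparator may read these values differently (for example, by parsing the file name).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class FlushOrderSketch {

  // Hypothetical holder for the sort keys listed in step 2.
  static final class FileInfo {
    final boolean sequence;
    final long flushTime;
    final long version;
    final long compactionVersion;
    final String path;

    FileInfo(boolean sequence, long flushTime, long version, long compactionVersion, String path) {
      this.sequence = sequence;
      this.flushTime = flushTime;
      this.version = version;
      this.compactionVersion = compactionVersion;
      this.path = path;
    }
  }

  // Boolean false sorts before true, so negating "sequence" puts sequence files first.
  static final Comparator<FileInfo> FLUSH_ORDER =
      Comparator.comparing((FileInfo f) -> !f.sequence)
          .thenComparingLong(f -> f.flushTime)
          .thenComparingLong(f -> f.version)
          .thenComparingLong(f -> f.compactionVersion)
          .thenComparing(f -> f.path); // deterministic tie-breaker

  // Returns the file paths in transfer order without mutating the input list.
  static List<String> sortedPaths(List<FileInfo> files) {
    List<FileInfo> copy = new ArrayList<>(files);
    copy.sort(FLUSH_ORDER);
    List<String> paths = new ArrayList<>();
    for (FileInfo f : copy) {
      paths.add(f.path);
    }
    return paths;
  }

  public static void main(String[] args) {
    List<FileInfo> files =
        Arrays.asList(
            new FileInfo(false, 1L, 0L, 0L, "unseq-1"),
            new FileInfo(true, 20L, 0L, 0L, "seq-20"),
            new FileInfo(true, 10L, 0L, 0L, "seq-10"));
    System.out.println(sortedPaths(files));
  }
}
```

Note that the unsequence file is placed last even though its flush time is the smallest, because the sequence/unsequence key is compared first.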

When historical deletions are captured together with insertions, the source keeps the previous progressIndex ordering. Deletion resources only carry progressIndex ordering information, so this avoids changing insertion/deletion ordering semantics.

The option can be set to false to keep the previous progressIndex-based ordering.
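The decision between the two orderings can be summarized in a small sketch. The two option keys and the default come from the description above; everything else is an assumption for illustration, including the helper names, the attribute map, and the choice to check the `extractor.` key before the `source.` key.

```java
import java.util.HashMap;
import java.util.Map;

final class OrderingModeSketch {
  static final String EXTRACTOR_KEY = "extractor.history.tsfile.order-by-flush-time";
  static final String SOURCE_KEY = "source.history.tsfile.order-by-flush-time";

  // Reads the option from either key (lookup order is an assumption);
  // defaults to true when neither key is present.
  static boolean orderByFlushTimeOption(Map<String, String> attributes) {
    String raw = attributes.get(EXTRACTOR_KEY);
    if (raw == null) {
      raw = attributes.get(SOURCE_KEY);
    }
    return raw == null || Boolean.parseBoolean(raw);
  }

  // Flush-time order applies only when insertions are captured and deletions are not.
  static boolean useFlushTimeOrder(
      Map<String, String> attributes, boolean captureInsertion, boolean captureDeletion) {
    return orderByFlushTimeOption(attributes) && captureInsertion && !captureDeletion;
  }

  public static void main(String[] args) {
    Map<String, String> attributes = new HashMap<>();
    System.out.println(useFlushTimeOrder(attributes, true, false)); // option defaults to true
    System.out.println(useFlushTimeOrder(attributes, true, true)); // deletions captured

    attributes.put(SOURCE_KEY, "false");
    System.out.println(useFlushTimeOrder(attributes, true, false)); // explicitly disabled
  }
}
```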

Tests / Coverage

Added unit coverage for:

  • the new option defaulting to true,
  • sorting older flush-time TsFiles before newer flush-time TsFiles,
  • explicitly disabling the new option and falling back to progressIndex order,
  • delaying progress reporting until all reordered historical resources are consumed,
  • excluding filtered-out TsFiles from the delayed max progressIndex,
  • preserving the no-progress-report flag when a TsFile event is shallow-copied,
  • making generated tablet events inherit the source TsFile event's no-progress-report behavior.
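The delayed-reporting contract these tests exercise can be sketched roughly as follows. This is a simplified model, not the PR's implementation: `Resource` and `supply` are hypothetical names, progressIndex is reduced to a plain `long` (the real progressIndex is not a single number), events are represented as strings, and emitting no report when nothing is selected is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DelayedProgressSketch {

  // Hypothetical stand-in for a historical TsFile resource, assumed already in flush-time order.
  static final class Resource {
    final String path;
    final long progressIndex; // simplified to a long for this sketch
    final boolean selected; // false if filtered out by progress, time, path, etc.

    Resource(String path, long progressIndex, boolean selected) {
      this.path = path;
      this.progressIndex = progressIndex;
      this.selected = selected;
    }
  }

  // Emits one non-reporting event per selected file, then a single trailing progress event.
  static List<String> supply(List<Resource> resourcesInFlushOrder) {
    List<String> events = new ArrayList<>();
    long maxProgress = Long.MIN_VALUE;
    boolean anySelected = false;
    for (Resource r : resourcesInFlushOrder) {
      if (!r.selected) {
        continue; // filtered-out files do not contribute to the delayed max
      }
      events.add("tsfile:" + r.path + ":reportProgress=false");
      maxProgress = Math.max(maxProgress, r.progressIndex);
      anySelected = true;
    }
    if (anySelected) {
      events.add("progress:" + maxProgress);
    }
    return events;
  }

  public static void main(String[] args) {
    List<Resource> resources =
        Arrays.asList(
            new Resource("a", 5L, true),
            new Resource("b", 99L, false),
            new Resource("c", 3L, true));
    for (String event : supply(resources)) {
      System.out.println(event);
    }
  }
}
```

In this model the filtered-out resource `b` has the largest progressIndex, yet the final report carries the max over `a` and `c` only.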

I did not add an integration test because ordinary end-to-end write/flush scenarios usually produce progressIndex order that matches flush-time order, so such an IT would not deterministically cover the regression. The deterministic behavior that matters here is the historical source's resource ordering and progress reporting contract, which is covered by unit tests with explicit TsFileResource ordering and progressIndex setup.

Local verification:

  • mvn spotless:apply -pl iotdb-core/datanode
  • mvn spotless:apply -pl iotdb-core/node-commons
  • mvn "-Ddevelocity.off=true" -o install -pl iotdb-core/node-commons "-DskipTests=true" "-Dcheckstyle.skip=true" "-Dspotless.check.skip=true"
  • git diff --check origin/master..HEAD

I attempted to run the targeted datanode unit tests locally, but in this checkout the module fails during main compilation, before reaching the test phase. The failures come from unrelated generated-source / thrift cache errors, such as missing TShowRepairDataPartitionTableProgressResp, IFill, and Accumulator classes.

@codecov

codecov Bot commented Jul 2, 2026

Codecov Report

❌ Patch coverage is 75.24752% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.66%. Comparing base (220e7a3) to head (3a32f1e).

| Files with missing lines | Patch % | Lines |
| ...peHistoricalDataRegionTsFileAndDeletionSource.java | 70.51% | 23 Missing ⚠️ |
| .../event/common/tsfile/PipeTsFileInsertionEvent.java | 77.77% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #18088   +/-   ##
=========================================
  Coverage     41.65%   41.66%           
  Complexity      318      318           
=========================================
  Files          5296     5296           
  Lines        371663   371744   +81     
  Branches      48088    48103   +15     
=========================================
+ Hits         154819   154889   +70     
- Misses       216844   216855   +11     
