Skip to content

feat: add ReplacePartitions core class (PR1 of 2 for #637)#776

Open
shangxinli wants to merge 2 commits into
apache:mainfrom
shangxinli:feat/replace-partitions-core
Open

feat: add ReplacePartitions core class (PR1 of 2 for #637)#776
shangxinli wants to merge 2 commits into
apache:mainfrom
shangxinli:feat/replace-partitions-core

Conversation

@shangxinli

Copy link
Copy Markdown
Contributor

Summary

Adds the iceberg::ReplacePartitions class — the dynamic partition overwrite operation. Each AddFile() registers the file's partition for replacement of all existing data files in that partition. Produces an overwrite snapshot with "replace-partitions"="true" in the summary. Unpartitioned tables replace all existing data files.

This is PR1 of 2 for #637. See tracking issue #775 for the split rationale.

What's in PR1

  • iceberg::ReplacePartitions class extending SnapshotUpdate:
    • Builder API: AddFile, ValidateAppendOnly, ValidateFromSnapshot, ValidateNoConflictingData, ValidateNoConflictingDeletes
    • Apply / Summary / CleanUncommitted overrides
    • Uses ManifestFilterManager for partition-based manifest filtering
  • SnapshotSummaryFields::kReplacePartitions = "replace-partitions" constant
  • ReplacePartitions forward declaration in type_fwd.h
  • CMake registration

What's deferred to PR2

  • Transaction::NewReplacePartitions() API wiring on Table / Transaction
  • End-to-end test file (replace_partitions_test.cc)
  • Conflict-detection in Validate(): the ValidateNoConflictingData / ValidateNoConflictingDeletes builder methods currently set flags but the validation body is a no-op TODO. MergingSnapshotUpdate::ValidateAddedDataFiles / ValidateNoNewDeleteFiles are protected and require either exposing them or restructuring to extend MergingSnapshotUpdate. Left to PR2 to keep this PR focused.

Design notes

  • Follows the Java ReplacePartitions semantics: operation type is overwrite, summary carries "replace-partitions"="true".
  • Mirrors the existing C++ FastAppend / MergeAppend patterns (factory Make(), SnapshotUpdate base, builder error collector).
  • Unpartitioned table case handled in Apply() via DeleteByRowFilter(AlwaysTrue()).

Test plan

  • Clean build (cmake --build build) — 293/293 targets
  • clang-format clean
  • End-to-end behavioral tests land in PR2 alongside the Table/Transaction API wiring

Closes part of #637 (PR2 will close it).
Tracking: #775

@shangxinli shangxinli force-pushed the feat/replace-partitions-core branch 4 times, most recently from ae973f6 to e33f433 Compare June 24, 2026 01:01
Adds iceberg::ReplacePartitions — the dynamic partition overwrite
operation. Each AddFile() registers the file's partition for
replacement of all existing data and delete files in that partition.
Produces an overwrite snapshot with "replace-partitions=true" in the
summary. Unpartitioned tables replace all existing data files.

Extends MergingSnapshotUpdate (matching Java's BaseReplacePartitions)
so the full data+delete manifest pipeline, custom-summary-property
handling, and conflict-validation helpers are inherited. AddFile()
unconditionally calls DropPartition(spec_id, file->partition) — for
unpartitioned specs the partition value is empty and the filter
manager matches every file in that spec, so no separate AlwaysTrue
path is needed. Touched partitions are tracked in a PartitionSet;
Validate() uses the partition-scoped overloads of
ValidateAddedDataFiles / ValidateNoNewDeleteFiles, or skips entirely
when no partitions were staged.

Changes:
  * iceberg::ReplacePartitions extending MergingSnapshotUpdate with
    builder API (AddFile, ValidateAppendOnly, ValidateFromSnapshot,
    ValidateNoConflictingData, ValidateNoConflictingDeletes) and
    Validate() override.
  * SnapshotSummaryFields::kReplacePartitions = "replace-partitions".
  * MergingSnapshotUpdate::SetSummaryProperty promoted from private to
    protected so subclasses can stash custom summary entries that
    survive commit retry via the cached-rebuild path.
  * Forward declaration in type_fwd.h.
  * CMake + Meson source registration.

Public API wiring (Table::NewReplacePartitions(),
Transaction::NewReplacePartitions()) and end-to-end tests are
deferred to PR2.

Tracking: apache#775
Related: apache#637
@shangxinli shangxinli force-pushed the feat/replace-partitions-core branch from e33f433 to 12dc230 Compare June 24, 2026 01:13
current_metadata, starting_snapshot_id_, replaced_partitions_, snapshot, io));
}
if (validate_conflicting_deletes_) {
ICEBERG_RETURN_UNEXPECTED(ValidateNoNewDeleteFiles(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java BaseReplacePartitions.validate also calls validateDeletedDataFiles, so concurrent overwrite/delete commits in the replaced partition are rejected

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// the partition values are empty and naturally match every file under that
// spec — no separate AlwaysTrue path is needed, and validation stays scoped
// to the spec rather than the whole table.
ICEBERG_BUILDER_RETURN_IF_ERROR(DropPartition(spec_id, file->partition));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unpartitioned replace is spec-scoped, but Java treats it as table-wide.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

…aFiles

- AddFile() now uses DeleteByRowFilter(AlwaysTrue()) for unpartitioned
  specs instead of DropPartition with empty partition values, matching
  Java BaseReplacePartitions which treats unpartitioned tables as a
  table-wide replace.
- Validate() now also calls ValidateDeletedDataFiles when
  ValidateNoConflictingDeletes is enabled, mirroring Java where
  validateNewDeletes gates both checks. This rejects concurrent
  overwrite/delete commits in the replaced partitions.
- New replace_by_row_filter_ flag drives the AlwaysTrue path in
  Validate() for the unpartitioned case.
// No-op update: no partitions were staged and no table-wide replace was
// requested, so there is nothing to conflict with. Calling the validators
// with AlwaysTrue here would turn an empty builder into a full-table check.
if (!replace_by_row_filter_ && replaced_partitions_.empty()) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java rejects no-data-files replace. Suggest to call DataSpec() like Java.

@manuzhang manuzhang left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks replace_partitions.h is missing from src/iceberg/update/meson.build.

@zhjwpku

zhjwpku commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

In the PR description, should "the iceberg::ReplacePartitions class extending SnapshotUpdate" be changed to "the iceberg::ReplacePartitions class extending MergingSnapshotUpdate"?


namespace iceberg {

/// \brief Replaces partitions in a table with new data files.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to mention This is provided to implement SQL compatible with Hive table operations but is not recommended. Instead, use the {@link OverwriteFiles overwrite API} to explicitly overwrite data. like java?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants