feat: add ReplacePartitions core class (PR1 of 2 for #637) #776
shangxinli wants to merge 2 commits into
Adds iceberg::ReplacePartitions — the dynamic partition overwrite
operation. Each AddFile() registers the file's partition for
replacement of all existing data and delete files in that partition.
Produces an overwrite snapshot with "replace-partitions=true" in the
summary. Unpartitioned tables replace all existing data files.
Extends MergingSnapshotUpdate (matching Java's BaseReplacePartitions)
so the full data+delete manifest pipeline, custom-summary-property
handling, and conflict-validation helpers are inherited. AddFile()
unconditionally calls DropPartition(spec_id, file->partition) — for
unpartitioned specs the partition value is empty and the filter
manager matches every file in that spec, so no separate AlwaysTrue
path is needed. Touched partitions are tracked in a PartitionSet;
Validate() uses the partition-scoped overloads of
ValidateAddedDataFiles / ValidateNoNewDeleteFiles, or skips entirely
when no partitions were staged.
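The partition tracking described above can be illustrated with a minimal sketch. It is not the iceberg-cpp implementation: `FileEntry`, `PartitionSetSketch`, and `FilesToReplace` are hypothetical stand-ins, and partition values are modelled as plain strings. It only shows why an empty partition tuple under an unpartitioned spec matches every file written with that spec.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are the real iceberg-cpp types.
using PartitionValues = std::vector<std::string>;  // empty for an unpartitioned spec

struct FileEntry {
  std::string path;
  int32_t spec_id;
  PartitionValues partition;
};

// Simplified partition set: a set of (spec_id, partition values) pairs.
class PartitionSetSketch {
 public:
  void Add(int32_t spec_id, const PartitionValues& partition) {
    entries_.insert(std::pair<int32_t, PartitionValues>(spec_id, partition));
  }
  bool Contains(int32_t spec_id, const PartitionValues& partition) const {
    return entries_.count(std::pair<int32_t, PartitionValues>(spec_id, partition)) > 0;
  }
  bool empty() const { return entries_.empty(); }

 private:
  std::set<std::pair<int32_t, PartitionValues>> entries_;
};

// Returns the paths of existing files that would be dropped when `added` is staged.
std::vector<std::string> FilesToReplace(const std::vector<FileEntry>& existing,
                                        const std::vector<FileEntry>& added) {
  PartitionSetSketch touched;
  for (const auto& file : added) {
    // Plays the role of DropPartition(spec_id, file->partition) in the description.
    touched.Add(file.spec_id, file.partition);
  }
  std::vector<std::string> dropped;
  for (const auto& file : existing) {
    // For an unpartitioned spec both tuples are empty, so every file of that spec matches.
    if (touched.Contains(file.spec_id, file.partition)) dropped.push_back(file.path);
  }
  return dropped;
}
```

In this model, staging one file for partition `2024-01-01` under spec 0 drops only the existing files with that exact tuple, while staging a file under an unpartitioned spec drops every existing file of that spec and leaves files of other specs untouched.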
Changes:
* iceberg::ReplacePartitions extending MergingSnapshotUpdate with
builder API (AddFile, ValidateAppendOnly, ValidateFromSnapshot,
ValidateNoConflictingData, ValidateNoConflictingDeletes) and
Validate() override.
* SnapshotSummaryFields::kReplacePartitions = "replace-partitions".
* MergingSnapshotUpdate::SetSummaryProperty promoted from private to
protected so subclasses can stash custom summary entries that
survive commit retry via the cached-rebuild path.
* Forward declaration in type_fwd.h.
* CMake + Meson source registration.
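The visibility change for SetSummaryProperty can be sketched as follows. This is a simplified model under stated assumptions, not the real classes: `MergingUpdateSketch`, `ReplacePartitionsSketch`, and `BuildSummary` are hypothetical names, and the `"operation"` key is included only for illustration. The point is that a property stored through a protected setter on the base is re-applied each time the summary is rebuilt, for example on a commit retry.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical base class standing in for MergingSnapshotUpdate.
class MergingUpdateSketch {
 public:
  virtual ~MergingUpdateSketch() = default;

  // Rebuilt on every commit attempt; stored custom properties are copied in each time.
  std::map<std::string, std::string> BuildSummary() const {
    std::map<std::string, std::string> summary = custom_properties_;
    summary["operation"] = Operation();  // illustrative key, an assumption of this sketch
    return summary;
  }

 protected:
  // Protected rather than private, so subclasses can stash custom entries.
  void SetSummaryProperty(std::string key, std::string value) {
    custom_properties_[std::move(key)] = std::move(value);
  }
  virtual std::string Operation() const = 0;

 private:
  std::map<std::string, std::string> custom_properties_;
};

// Hypothetical subclass standing in for ReplacePartitions.
class ReplacePartitionsSketch : public MergingUpdateSketch {
 public:
  ReplacePartitionsSketch() { SetSummaryProperty("replace-partitions", "true"); }

 protected:
  std::string Operation() const override { return "overwrite"; }
};
```

Calling `BuildSummary()` twice on the same `ReplacePartitionsSketch` yields the same map both times, which is the behaviour the description relies on when it says the entry survives a retry.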
Public API wiring (Table::NewReplacePartitions(),
Transaction::NewReplacePartitions()) and end-to-end tests are
deferred to PR2.
Tracking: apache#775
Related: apache#637
        current_metadata, starting_snapshot_id_, replaced_partitions_, snapshot, io));
    }
    if (validate_conflicting_deletes_) {
      ICEBERG_RETURN_UNEXPECTED(ValidateNoNewDeleteFiles(
Java's BaseReplacePartitions.validate also calls validateDeletedDataFiles, so that concurrent overwrite/delete commits in the replaced partition are rejected.
    // the partition values are empty and naturally match every file under that
    // spec — no separate AlwaysTrue path is needed, and validation stays scoped
    // to the spec rather than the whole table.
    ICEBERG_BUILDER_RETURN_IF_ERROR(DropPartition(spec_id, file->partition));
Unpartitioned replace is spec-scoped here, but Java treats it as table-wide.
…aFiles
- AddFile() now uses DeleteByRowFilter(AlwaysTrue()) for unpartitioned specs instead of DropPartition with empty partition values, matching Java BaseReplacePartitions which treats unpartitioned tables as a table-wide replace.
- Validate() now also calls ValidateDeletedDataFiles when ValidateNoConflictingDeletes is enabled, mirroring Java where validateNewDeletes gates both checks. This rejects concurrent overwrite/delete commits in the replaced partitions.
- New replace_by_row_filter_ flag drives the AlwaysTrue path in Validate() for the unpartitioned case.
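The branching this commit describes can be modelled with a small sketch. It is an illustration under assumptions, not the code in the PR: `ReplaceStateSketch`, `AddFileSketch`, and `PlannedChecks` are hypothetical, the checks are represented as strings rather than real validator calls, the order of the two delete-related checks is a guess, and the empty-builder case is not modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the builder state; field names loosely follow the commit message.
struct ReplaceStateSketch {
  bool validate_conflicting_data = false;
  bool validate_conflicting_deletes = false;
  bool replace_by_row_filter = false;            // set for unpartitioned specs
  std::vector<std::string> replaced_partitions;  // partition keys as plain strings
};

// Unpartitioned specs take the table-wide path; partitioned specs record the partition.
void AddFileSketch(ReplaceStateSketch& state, bool spec_is_unpartitioned,
                   const std::string& partition) {
  if (spec_is_unpartitioned) {
    state.replace_by_row_filter = true;  // stands in for DeleteByRowFilter(AlwaysTrue())
  } else {
    state.replaced_partitions.push_back(partition);  // stands in for DropPartition(...)
  }
}

// Lists which validators would run, and with which scope, for the given state.
std::vector<std::string> PlannedChecks(const ReplaceStateSketch& state) {
  const std::string scope = state.replace_by_row_filter ? "AlwaysTrue" : "partitions";
  std::vector<std::string> checks;
  if (state.validate_conflicting_data) {
    checks.push_back("ValidateAddedDataFiles(" + scope + ")");
  }
  if (state.validate_conflicting_deletes) {
    // One flag gates both delete-related checks, as the commit message describes.
    checks.push_back("ValidateDeletedDataFiles(" + scope + ")");
    checks.push_back("ValidateNoNewDeleteFiles(" + scope + ")");
  }
  return checks;
}
```

Keeping the scope decision in one place means the unpartitioned case cannot silently fall back to an empty partition set: once the flag is set, every enabled check runs against the whole table.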
    // No-op update: no partitions were staged and no table-wide replace was
    // requested, so there is nothing to conflict with. Calling the validators
    // with AlwaysTrue here would turn an empty builder into a full-table check.
    if (!replace_by_row_filter_ && replaced_partitions_.empty()) {
Java rejects a replace that has no data files. I suggest calling DataSpec() as Java does.
manuzhang
left a comment
It looks like replace_partitions.h is missing from src/iceberg/update/meson.build.
In the PR description, should "the iceberg::ReplacePartitions class extending SnapshotUpdate" be changed to "the iceberg::ReplacePartitions class extending MergingSnapshotUpdate"?
    namespace iceberg {

    /// \brief Replaces partitions in a table with new data files.
Do we need to mention, as the Java docs do, "This is provided to implement SQL compatible with Hive table operations but is not recommended. Instead, use the {@link OverwriteFiles overwrite API} to explicitly overwrite data."?
Summary

Adds the iceberg::ReplacePartitions class, the dynamic partition overwrite operation. Each AddFile() registers the file's partition for replacement of all existing data files in that partition. Produces an overwrite snapshot with "replace-partitions"="true" in the summary. Unpartitioned tables replace all existing data files.

This is PR1 of 2 for #637. See tracking issue #775 for the split rationale.

What's in PR1

- iceberg::ReplacePartitions class extending SnapshotUpdate:
  - AddFile, ValidateAppendOnly, ValidateFromSnapshot, ValidateNoConflictingData, ValidateNoConflictingDeletes
  - Apply / Summary / CleanUncommitted overrides
  - ManifestFilterManager for partition-based manifest filtering
- SnapshotSummaryFields::kReplacePartitions = "replace-partitions" constant
- ReplacePartitions forward declaration in type_fwd.h

What's deferred to PR2

- Transaction::NewReplacePartitions() API wiring on Table / Transaction
- replace_partitions_test.cc)
- Validate(): the ValidateNoConflictingData / ValidateNoConflictingDeletes builder methods currently set flags but the validation body is a no-op TODO. MergingSnapshotUpdate::ValidateAddedDataFiles / ValidateNoNewDeleteFiles are protected and require either exposing them or restructuring to extend MergingSnapshotUpdate. Left to PR2 to keep this PR focused.

Design notes

- ReplacePartitions semantics: operation type is overwrite, summary carries "replace-partitions"="true".
- FastAppend / MergeAppend patterns (factory Make(), SnapshotUpdate base, builder error collector).
- Apply() via DeleteByRowFilter(AlwaysTrue()).

Test plan

- cmake --build build): 293/293 targets
- clang-format clean
- Table / Transaction API wiring

Closes part of #637 (PR2 will close it).
Tracking: #775