Skip to content

[s3] Avoid ListBucket requirement in snapshot commit path#8263

Open
hhhizzz wants to merge 11 commits into
apache:masterfrom
hhhizzz:fix-s3-snapshot-access
Open

[s3] Avoid ListBucket requirement in snapshot commit path#8263
hhhizzz wants to merge 11 commits into
apache:masterfrom
hhhizzz:fix-s3-snapshot-access

Conversation

@hhhizzz

@hhhizzz hhhizzz commented Jun 17, 2026

Copy link
Copy Markdown

Purpose

Fixes #8261.

Avoid requiring ListBucket for S3-compatible object-store snapshot commits, while preserving the stale snapshot conflict handling.

The change skips the exists(snapshot-N) probe on object stores, but writes snapshot-N with no-overwrite semantics. For S3 this uses If-None-Match: *, and maps 412 PreconditionFailed to FileAlreadyExistsException, so stale commits retry instead of overwriting existing snapshot metadata.

Non-object-store behavior is unchanged.

Tests

Added regression tests for:

  • object-store snapshot commit without probing exists(snapshot-N)
  • stale object-store commit conflict does not overwrite an existing snapshot-N
  • S3 no-overwrite snapshot write uses direct PUT
  • S3 412 PreconditionFailed is treated as the file-exists conflict signal

Verified with:

mvn -nsu -pl paimon-core -am -DskipITs -Dcheckstyle.skip -Dspotless.check.skip=true -Drat.skip=true -DfailIfNoTests=false -Dtest=RenamingSnapshotCommitTest,HintFileUtilsTest test
mvn -nsu -pl paimon-filesystems/paimon-s3-impl -am -DskipITs -Dcheckstyle.skip -Dspotless.check.skip=true -Drat.skip=true -DfailIfNoTests=false -Dtest=S3FileIOTest test

Also verified against Ceph RGW with a temporary user without ListBucket: first conditional PUT succeeds, second returns 412, and the original object

@JingsongLi

Copy link
Copy Markdown
Contributor

I think this still breaks commit conflict handling on object stores. The latest snapshot is read before acquiring the commit lock, and newSnapshotPath is computed from that value. Previously, the !fileIO.exists(newSnapshotPath) check inside lock.runWithLock was the stale-snapshot guard: if another writer committed the same snapshot id while this writer was waiting for the lock, this returned false and FileStoreCommitImpl retried with the newer latest snapshot. With the new object-store branch, we skip that check and then write snapshot-N with overwrite=true, so two serialized object-store commits that both planned snapshot N can both return success, with the second one replacing the first snapshot file. That can lose a committed snapshot even when the external lock is working. We need to preserve the no-overwrite/conflict signal for an existing snapshot file without requiring ListBucket, rather than blindly overwriting snapshot metadata on object stores.

@hhhizzz

hhhizzz commented Jun 19, 2026

Copy link
Copy Markdown
Author

Thanks for catching this.The previous object-store branch could skip the stale snapshot guard and overwrite an existing snapshot-N.

I updated the PR to keep the no-overwrite conflict signal without requiring ListBucket:

  • object stores still skip exists(snapshot-N)
  • snapshot-N is written with no-overwrite semantics
  • S3 uses If-None-Match: * for that write
  • 412 PreconditionFailed is mapped to FileAlreadyExistsException, so the commit returns false and retries instead of overwriting

Added regression tests for the stale commit conflict path and S3 412 handling.

Verified with focused Maven tests, and with Ceph RGW using a temporary user without ListBucket: first conditional PUT succeeds, second returns 412, and the original object is preserved.

// On object stores, probing a missing object can require list permission. Trust the
// committed hint and let the commit retry path handle a concurrently newer snapshot.
if (fileIO.isObjectStore()) {
return snapshotId;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still falls back to listing when LATEST is absent (for example on the first commit of a new table, or after the hint file is lost). FileStoreCommitImpl calls snapshotManager.latestSnapshot() before it writes snapshot-1, so with an S3 user that has GetObject/PutObject but no ListBucket, an empty table will fail here before the new conditional PUT path is reached. The new object-store test writes LATEST first, so it does not cover this case. For object stores we need a no-list path for a missing hint as well, such as treating the missing hint as no latest snapshot and relying on the no-overwrite snapshot write to reject an unexpected existing snapshot-1.

@hhhizzz

This comment was marked as outdated.

@hhhizzz hhhizzz marked this pull request as draft June 21, 2026 16:13
@hhhizzz hhhizzz force-pushed the fix-s3-snapshot-access branch from 4ce509a to cd6b31f Compare June 22, 2026 04:43
@hhhizzz hhhizzz marked this pull request as ready for review June 22, 2026 06:37
@hhhizzz

hhhizzz commented Jun 22, 2026

Copy link
Copy Markdown
Author

Updated the branch with the current no-list first-snapshot logic and additional conflict coverage.

The current behavior is:

  • For object stores whose FileIO supports atomic create-without-overwrite, snapshot metadata files are written with a no-overwrite create path. On S3 this is implemented with conditional PUT (If-None-Match: *).
  • If LATEST is readable, we validate the hinted snapshot file and fail closed if the hint is stale enough that snapshot-(latest + 1) already exists.
  • If LATEST is missing or hidden by no ListBucket access, and EARLIEST is also absent/unreadable, we treat this as no latest snapshot. The first commit then attempts to create snapshot-1 with conditional PUT.
  • If snapshot-1 already exists, the write does not overwrite it. Same content is treated as idempotent; different/malformed/unreadable content raises a conflict signal that requires latest-snapshot recovery by listing.
  • The normal commit loop and manifest compaction path now recover from that signal by listing and retrying. replaceManifestList maps it back to its existing false conflict result.
  • If recovery listing is denied, the exception explicitly says that ListBucket is required or a valid LATEST hint must be restored.
  • Non-object-store paths and FileIOs without no-overwrite support continue to use the original atomic rename / exists-based path.

I also explicitly checked the ambiguous damaged-metadata case: if both LATEST and EARLIEST are lost/unreadable and all low snapshot files such as snapshot-1 have already expired while later snapshots remain, that state is not distinguishable from a new table without ListBucket or another durable readable hint. This PR treats that as metadata-corruption / disaster recovery: restore hints or temporarily grant ListBucket.

Validation:

  • TDD regression:
    • Verified the new direct-writer tests failed before the fix with SnapshotCommitConflictRequiresListRecoveryException.
    • Fixed compactManifest and replaceManifestList, then verified the tests passed.
  • Unit tests:
    • mvn -B -ntp -pl paimon-core -Dtest=HintFileUtilsTest,RenamingSnapshotCommitTest,FileStoreCommitTest#testSnapshotCommitConflictTriggersListRecoveryFromCommitLoop+testSnapshotCommitConflictDoesNotRecoverWhenRetryBudgetExhausted+testManifestCompactConflictTriggersListRecovery+testReplaceManifestListConflictReturnsFalse -Dsurefire.failIfNoSpecifiedTests=false -DfailIfNoTests=false test
    • Result: 25 tests, 0 failures/errors.
  • S3 FileIO tests:
    • mvn -B -ntp -pl paimon-filesystems/paimon-s3-impl -am -Dtest=S3FileIOTest -Dsurefire.failIfNoSpecifiedTests=false -DfailIfNoTests=false test
    • Result: 11 tests, 0 failures/errors.
  • Ceph/RGW smoke:
    • Created a temporary RGW user with only GetObject, PutObject, and DeleteObject under a fresh test prefix; no ListBucket.
    • Verified AWS_LIST_DENIED=true.
    • Added an admin-side precheck before the Paimon first commit: ADMIN_TABLE_KEYS_BEFORE=0, so the table prefix had no pre-existing objects or directory marker.
    • Verified no-list first commit: LATEST_BEFORE=null, FIRST_COMMIT=true, LATEST_AFTER=1.
    • Verified stale same-id conflict does not overwrite the committed snapshot: CONFLICT_REQUIRES_RECOVERY=true, SNAPSHOT_USER_AFTER_CONFLICT=first-user.
    • Verified recovery listing is denied without ListBucket: RECOVERY_LIST_DENIED=true.
    • Cleanup succeeded: RESTORE_POLICY=ok, DELETE_TEMP_USER=ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] S3 snapshot commit requires ListBucket due to missing-object status checks

2 participants