feat(rest): parse storage-credentials and bind a per-table FileIO from vended credentials#719
Conversation
c1216fa to
00ef76a
Compare
968d073 to
f8c78cc
Compare
f8c78cc to
43838fd
Compare
| const std::vector<StorageCredential>& credentials) { | ||
| const StorageCredential* best = nullptr; | ||
| for (const auto& cred : credentials) { | ||
| if (cred.prefix.starts_with("s3") && |
There was a problem hiding this comment.
Prefix selection needs the actual storage path. The REST spec and Java S3FileIO choose the longest prefix that matches each path; picking the longest S3 prefix globally makes unmatched paths and other S3 prefixes use the wrong credential.
There was a problem hiding this comment.
@wgtmac Good catch. Fixed: SelectS3StorageCredential now picks the longest credential prefix that is actually a prefix of the table's storage location (passed down from metadata->location), instead of the longest s3 prefix globally — matching the REST spec / Java S3FileIO. It also treats s3/s3a/s3n as equivalent so an s3:// vended prefix still matches an s3a:// location.
| // response. The current table commit response does not define config. | ||
| ICEBERG_ASSIGN_OR_RAISE(auto table_io, TableFileIO(context, table_config)); | ||
| ICEBERG_ASSIGN_OR_RAISE(auto table_io, | ||
| TableFileIO(context, table_config, /*storage_credentials=*/{})); |
There was a problem hiding this comment.
This rebuilds the returned table with an empty credential list after commit. CommitTableResponse does not carry storage-credentials, so a table loaded with vended credentials loses its FileIO credentials after the first commit.
There was a problem hiding this comment.
@wgtmac Fixed. MakeTableFromCommitResponse now reuses the FileIO bound at load (threaded through TableScopedCatalog) instead of rebuilding it with an empty credential list.
| for (const auto& [key, value] : table_config) { | ||
| properties[key] = value; | ||
| } | ||
| for (const auto& [key, value] : s3_cred.config) { |
There was a problem hiding this comment.
This makes the STS values static. Java S3FileIO keeps the credential set and refreshes through credentials.uri when s3.session-token-expires-at-ms is present; long scans or reused tables will fail after the vended token expires.
There was a problem hiding this comment.
@wgtmac You're right that the vended STS values are static and will expire on long scans / reused tables. Implementing refresh properly needs a credential-aware FileIO with a background refresher that re-fetches via credentials.uri (mirroring Java's VendedCredentialsProvider + S3FileIO#scheduleCredentialRefresh) — a larger change than this PR. I've left an explicit TODO and would prefer to handle it as a follow-up. Shall I open an issue to track it? Happy to scope it there.
| // Longest-prefix "s3" vended credential. Java's VendedCredentialsProvider | ||
| // resolves per path against the table location; we bind one at load time | ||
| // (fine for the common one-credential-per-table case). | ||
| if (const StorageCredential* s3_cred = SelectS3StorageCredential(storage_credentials); |
There was a problem hiding this comment.
storage-credentials can also carry GCS/ADLS credentials. This path silently ignores non-S3 credentials and falls back to the catalog FileIO; if S3-only is intentional for now, please add an explicit TODO or fail-fast note.
There was a problem hiding this comment.
@wgtmac Addressed. When the server vends only non-S3 credentials (GCS/ADLS), TableFileIO now fails fast with NotSupported listing the prefixes, rather than silently using the catalog FileIO. If S3 credentials are vended but none covers this table's location, it falls through to the catalog FileIO (which may use ambient creds). S3-only support is intentional for now — see the TODO in TableFileIO.
dffa3b3 to
634a95e
Compare
634a95e to
9d876a9
Compare
What
Parse storage-credentials from the REST LoadTableResponse and use the vended credentials to bind a per-table FileIO, so a loaded Table can read/write its (S3-compatible) storage without any manual wiring.
Why
REST catalogs hand out short-lived, scoped storage access via vended credentials (STS). Until now LoadTableResult.storage_credentials was dropped on the floor and every table fell back to the catalog's default FileIO, so a vended-credential table could not actually access storage.
How
path; otherwise keep the catalog default. This fills the TODO(gangwu) left in feat(rest): add session-aware REST catalog #750.