
[python] Support tag time_retained (TTL) on FileSystemCatalog #8319

Open
TheR1sing3un wants to merge 4 commits into apache:master from TheR1sing3un:feat/python-tag-time-retained


@TheR1sing3un

Member

Purpose

Previously, FileSystemCatalog.create_tag raised NotImplementedError whenever time_retained was passed, and the Python Tag carried only the fields it inherited from Snapshot, so get_tag and the $tags system table always returned None for the create time and the TTL.

This PR implements tag time_retained on the FileSystemCatalog path. It persists tagCreateTime / tagTimeRetained in the same on-disk JSON shape as Java (org.apache.paimon.tag.Tag), so tag files round-trip across the Java and Python SDKs.

Changes

  • Tag now carries tag_create_time (LocalDateTime as a [y, mo, d, h, mi, s, ns] array) and tag_time_retained (Duration as decimal seconds), via a new per-field JSON codec.
  • create_tag / replace_tag thread time_retained through TagManager / FileStoreTable / FileSystemCatalog. With no retention, the plain Snapshot JSON is written (backward compatible), mirroring Java TagManager.createOrReplaceTag.
  • get_tag and the $tags system table surface real create_time / time_retained (matching Java Timestamp.fromLocalDateTime / Duration.toString()).
  • Tag expiration (TTL-based deletion) is intentionally out of scope.
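
The two encodings named in the list above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names are hypothetical, and accepting arrays shorter than seven elements on read is a defensive assumption (Jackson may omit trailing zero seconds/nanos). Python's datetime only carries microseconds, so the nanosecond slot is always a multiple of 1000 here.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def encode_local_datetime(dt: datetime) -> List[int]:
    """Encode a naive datetime as [y, mo, d, h, mi, s, ns] (hypothetical helper)."""
    return [dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second,
            dt.microsecond * 1000]


def decode_local_datetime(parts: Sequence[int]) -> datetime:
    """Decode the array form; missing trailing seconds/nanos default to 0 (assumption)."""
    if not 5 <= len(parts) <= 7:
        raise ValueError(f"expected 5 to 7 elements, got {len(parts)}")
    y, mo, d, h, mi, s, ns = list(parts) + [0] * (7 - len(parts))
    # Sub-microsecond precision is dropped because datetime cannot hold it.
    return datetime(y, mo, d, h, mi, s, ns // 1000)


def encode_duration(td: timedelta) -> float:
    """Encode a duration as decimal seconds."""
    return td.total_seconds()


def decode_duration(seconds: float) -> timedelta:
    """Decode decimal seconds back into a timedelta."""
    return timedelta(seconds=float(seconds))
```

With these helpers, `decode_local_datetime(encode_local_datetime(dt))` returns `dt` unchanged for any naive datetime.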

Tests

Unit tests for the temporal codecs and Tag serde (Java golden-value shape, round-trip, reading Java-written tags, legacy plain-snapshot backward compatibility) plus FileSystemCatalog / $tags end-to-end coverage for create/replace with time_retained.

Does this PR introduce a user-facing change?

No.


Generative AI disclosure: drafted with AI assistance and reviewed by the author.

Add an optional encoder/decoder per dataclass field in the JSON
serializer (json_field_with_codec), applied only when present so existing
dataclasses are unaffected. Add time_utils codecs that mirror Jackson's
on-disk shapes for java.time types: LocalDateTime as a
[y, mo, d, h, mi, s, ns] array, Duration as decimal seconds, plus
duration_to_iso8601 and local_datetime_to_millis helpers.
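
One way to attach an optional encoder/decoder to a dataclass field is through `dataclasses.field` metadata, as sketched below. The signature of `json_field_with_codec`, the metadata keys, and the `to_dict` / `from_dict` helpers are assumptions made for illustration; the serializer in this PR may be structured differently. Fields without a codec pass through untouched, which is what keeps existing dataclasses unaffected.

```python
from dataclasses import dataclass, field, fields
from datetime import timedelta
from typing import Any, Callable, Dict, Optional


def json_field_with_codec(json_name: str,
                          encoder: Callable[[Any], Any],
                          decoder: Callable[[Any], Any],
                          **kwargs: Any) -> Any:
    """Declare a field with a JSON name and a codec (signature is an assumption)."""
    metadata = {"json_name": json_name, "encoder": encoder, "decoder": decoder}
    return field(metadata=metadata, **kwargs)


def to_dict(obj: Any) -> Dict[str, Any]:
    """Serialize a dataclass, applying a field's encoder only when one is present."""
    out: Dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if value is None:
            continue  # assumption: unset optional fields are omitted
        encoder = f.metadata.get("encoder")
        out[f.metadata.get("json_name", f.name)] = encoder(value) if encoder else value
    return out


def from_dict(cls: type, data: Dict[str, Any]) -> Any:
    """Deserialize a dataclass, applying a field's decoder only when one is present."""
    kwargs: Dict[str, Any] = {}
    for f in fields(cls):
        name = f.metadata.get("json_name", f.name)
        if name in data:
            decoder = f.metadata.get("decoder")
            kwargs[f.name] = decoder(data[name]) if decoder else data[name]
    return cls(**kwargs)


@dataclass
class Example:
    """Hypothetical dataclass with one plain field and one codec-backed field."""
    name: str
    ttl: Optional[timedelta] = json_field_with_codec(
        "ttl",
        encoder=lambda td: td.total_seconds(),
        decoder=lambda s: timedelta(seconds=s),
        default=None,
    )
```

For instance, `to_dict(Example("a", timedelta(hours=1)))` yields a dict with `ttl` stored as seconds, and `from_dict(Example, ...)` restores the timedelta.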

Turn Tag into a dataclass extending Snapshot with optional
tag_create_time and tag_time_retained, serialized in the same on-disk
JSON shape as Java org.apache.paimon.tag.Tag so tag files round-trip
across the Java and Python SDKs. Add the from_snapshot_and_tag_ttl
factory mirroring Java's Tag.fromSnapshotAndTagTtl.
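
A rough sketch of that shape is shown below. The `Snapshot` stand-in has only three illustrative fields (the real class has more), and the factory's argument order is an assumption modeled on the Java method name rather than copied from this PR.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Snapshot:
    """Illustrative stand-in; field names here are placeholders."""
    id: int
    schema_id: int
    commit_user: str


@dataclass
class Tag(Snapshot):
    """Snapshot plus two optional tag-only fields."""
    tag_create_time: Optional[datetime] = None
    tag_time_retained: Optional[timedelta] = None

    @classmethod
    def from_snapshot_and_tag_ttl(cls,
                                  snapshot: Snapshot,
                                  tag_time_retained: Optional[timedelta],
                                  tag_create_time: Optional[datetime]) -> "Tag":
        # Copy every Snapshot field, then attach the tag-specific values.
        base = {f.name: getattr(snapshot, f.name) for f in fields(Snapshot)}
        return cls(**base,
                   tag_create_time=tag_create_time,
                   tag_time_retained=tag_time_retained)
```

Because both tag fields default to None, a Tag built from a legacy plain-snapshot file simply leaves them unset.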

Thread time_retained through TagManager / FileStoreTable /
FileSystemCatalog so create_tag and replace_tag persist a create-time and
TTL. Mirror Java TagManager.createOrReplaceTag: with no retention the
plain Snapshot JSON is written (no tag-specific fields) to stay readable
by older readers; with a retention the richer Tag JSON is written.
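
The branching described above can be sketched as a single function. This is an illustration under assumptions: `tag_file_content` is a hypothetical name, the snapshot is represented as an already-serialized dict, and only the `tagCreateTime` / `tagTimeRetained` keys come from the PR description.

```python
import json
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def tag_file_content(snapshot_json: Dict[str, Any],
                     time_retained: Optional[timedelta],
                     now: datetime) -> str:
    """Return the JSON text to write for a tag (hypothetical helper)."""
    if time_retained is None:
        # No retention: write the plain Snapshot JSON, with no tag-specific keys.
        return json.dumps(snapshot_json)
    # Retention given: write the richer Tag JSON with create time and TTL.
    tag_json = dict(snapshot_json)
    tag_json["tagCreateTime"] = [now.year, now.month, now.day, now.hour,
                                 now.minute, now.second, now.microsecond * 1000]
    tag_json["tagTimeRetained"] = time_retained.total_seconds()
    return json.dumps(tag_json)
```

A reader that only knows the Snapshot fields can still parse the no-retention output, since it contains nothing else.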

Drop the NotImplementedError that previously rejected time_retained and
surface the values via FileSystemCatalog.get_tag and the $tags system
table instead of None.
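
To surface these values in a Java-compatible form, the two display helpers might look roughly like this. The function names appear in the commit message above, but the bodies are a sketch under assumptions: negative durations are rejected for simplicity, and the local date-time is converted to epoch milliseconds with no zone offset applied.

```python
from datetime import datetime, timedelta


def duration_to_iso8601(td: timedelta) -> str:
    """Format a non-negative duration like Java's Duration.toString(), e.g. PT1H30M."""
    if td < timedelta(0):
        raise ValueError("negative durations are not handled in this sketch")
    total_micros = td // timedelta(microseconds=1)
    if total_micros == 0:
        return "PT0S"
    total_seconds, micros = divmod(total_micros, 1_000_000)
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    parts = ["PT"]
    if hours:
        parts.append(f"{hours}H")  # days are folded into hours
    if minutes:
        parts.append(f"{minutes}M")
    if micros:
        frac = f"{micros:06d}".rstrip("0")  # drop trailing zeros in the fraction
        parts.append(f"{seconds}.{frac}S")
    elif seconds:
        parts.append(f"{seconds}S")
    return "".join(parts)


def local_datetime_to_millis(dt: datetime) -> int:
    """Milliseconds since 1970-01-01T00:00 for a naive datetime (no zone applied)."""
    return (dt - datetime(1970, 1, 1)) // timedelta(milliseconds=1)
```

For example, a two-day retention renders as `PT48H` rather than `P2D`, because this format has no day component.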

Add unit tests for the temporal codecs and Tag serde (Java golden-value
on-disk shape, round-trip, reading Java-written tags, and legacy
plain-snapshot backward compatibility). Update the FileSystemCatalog and
$tags end-to-end tests to cover create/replace with time_retained,
Java-compatible on-disk JSON, and the no-TTL plain-snapshot path.