Parquet: Variant shredding follow-ups from PR #14297 by nssalian · Pull Request #16818 · apache/iceberg

nssalian · 2026-06-14T19:34:23Z

Changes

Follow-ups from the PR #14297 summary thread, plus some additional cleanup.

Added javadoc to TIE_BREAK_PRIORITY so readers know what the constant is for. Updated the class-level javadoc to remove the stale TreeMap reference and link to TIE_BREAK_PRIORITY.
Moved PathNode.objectChildren from TreeMap to HashMap. Alphabetical schema field order is preserved by sorting once in createObjectTypedValue at schema build time.
Added debug logging in ParquetFormatModel.buildShreddedAppender that records the buffer size at construction and, per inference flush, the buffered row count and the inferred shredded field count.
Reused the existing isDecimalType helper in observe and dropped the now-unused VariantPrimitive import.
Cached the result of getMostCommonType per FieldInfo. Previously, the schema build called it once at the root and again per node, rebuilding the same family aggregation every time.
Used a min-heap of size MAX_SHREDDED_FIELDS for the field cap, replacing the full sort and its n-sized intermediate ArrayList and HashSet allocations.
Used an int[] for FieldInfo.typeCounts keyed on PhysicalType.ordinal(), replacing Map<PhysicalType, Integer>. This removes the capturing lambda allocated for Map.compute on every observation, as well as the Integer boxing.
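
The field cap described above can be sketched as a bounded min-heap. This is a minimal illustration and not the code in VariantShreddingAnalyzer: the class TopFieldsSketch, the method topFields, the cap value of 3, and the Map<String, Integer> input are all assumptions made for the example. It keeps at most MAX_FIELDS entries by evicting the current worst candidate, and it breaks count ties in favor of the alphabetically earlier name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopFieldsSketch {
  // Hypothetical cap for the example; the real MAX_SHREDDED_FIELDS value is not stated in this PR.
  static final int MAX_FIELDS = 3;

  /** Returns up to MAX_FIELDS names with the highest counts, in alphabetical order. */
  static List<String> topFields(Map<String, Integer> counts) {
    // "Worst first": the lowest count sits at the head of the heap. Among equal
    // counts, the alphabetically later name is treated as worse and is evicted first.
    Comparator<Map.Entry<String, Integer>> worstFirst = (a, b) -> {
      int byCount = Integer.compare(a.getValue(), b.getValue());
      return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
    };

    PriorityQueue<Map.Entry<String, Integer>> heap =
        new PriorityQueue<>(MAX_FIELDS + 1, worstFirst);
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      heap.offer(entry);
      if (heap.size() > MAX_FIELDS) {
        heap.poll(); // drop the current worst candidate so the heap never exceeds the cap
      }
    }

    List<String> result = new ArrayList<>(heap.size());
    while (!heap.isEmpty()) {
      result.add(heap.poll().getKey());
    }
    Collections.sort(result); // alphabetical order for the returned names
    return result;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = new HashMap<>();
    counts.put("a", 5);
    counts.put("b", 2);
    counts.put("c", 2);
    counts.put("d", 2);
    counts.put("e", 9);
    System.out.println(topFields(counts)); // prints [a, b, e]
  }
}
```

With a cap of k, each offer and poll costs O(log k), so selecting from n candidates takes O(n log k) and never materializes a sorted copy of all n entries.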

Test plan

Extended testIntermediateFieldCapLimitsTrackedFields with three assertions verifying that the min-heap retains the alphabetically earliest fields when counts are tied.
New testUuidFieldIsTrackedAndShredded exercises int[] indexing for a high-ordinal PhysicalType value.
The build and TestVariantShreddingAnalyzer both passed locally.
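
The ordinal-indexed counting that the UUID test exercises can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual FieldInfo class: the Kind enum is a hypothetical stand-in for PhysicalType, TypeCountsSketch and its methods are invented names, and ties go to the lower ordinal here rather than following TIE_BREAK_PRIORITY. The sketch also caches the most common kind and invalidates that cache on every observation.

```java
class TypeCountsSketch {
  // Hypothetical stand-in for PhysicalType; the real enum has different constants.
  enum Kind { BOOLEAN, INT32, INT64, STRING, UUID }

  // values() clones its array on every call, so keep one shared copy.
  private static final Kind[] KINDS = Kind.values();

  // One slot per constant, so even the highest ordinal (UUID here) is in bounds.
  private final int[] counts = new int[KINDS.length];
  private Kind cachedMostCommon;
  private boolean cacheValid;

  void observe(Kind kind) {
    counts[kind.ordinal()]++; // plain array increment: no boxing, no lambda, no hashing
    cacheValid = false;       // the cached answer may now be stale
  }

  int count(Kind kind) {
    return counts[kind.ordinal()];
  }

  /** Most frequently observed kind, or null if nothing was observed. Ties go to the lower ordinal. */
  Kind mostCommon() {
    if (!cacheValid) {
      int best = -1;
      for (int i = 0; i < counts.length; i++) {
        if (counts[i] > 0 && (best < 0 || counts[i] > counts[best])) {
          best = i;
        }
      }
      cachedMostCommon = best < 0 ? null : KINDS[best];
      cacheValid = true;
    }
    return cachedMostCommon;
  }
}
```

Sizing the array from the enum's constant count is what keeps a high-ordinal value in bounds. Resetting the cache flag inside observe matters because a later observation can change the answer.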

huaxingao

LGTM

huaxingao · 2026-06-16T21:01:00Z

Thanks @nssalian for the PR!

Parquet: Variant shredding follow-ups from PR apache#14297

5c187c9

github-actions Bot added the parquet label Jun 14, 2026

inference cleanup

d1709ca

nssalian marked this pull request as ready for review June 16, 2026 00:29

nssalian requested review from huaxingao and pvary June 16, 2026 00:29

huaxingao reviewed Jun 16, 2026

Comment thread parquet/src/main/java/org/apache/iceberg/parquet/VariantShreddingAnalyzer.java

huaxingao approved these changes Jun 16, 2026

nssalian added 2 commits June 15, 2026 22:44

PR comment, fix cache invalidation

3e01d17

Merge remote-tracking branch 'apache/main' into variant-shredding-fol…

2e4190f

…lowup

nssalian requested a review from huaxingao June 16, 2026 05:54

huaxingao merged commit 85ffa19 into apache:main Jun 16, 2026
53 checks passed

nssalian deleted the variant-shredding-followup branch June 16, 2026 21:09

nssalian added this to the Iceberg 1.12.0 milestone Jun 23, 2026

huaxingao merged 4 commits into apache:main from nssalian:variant-shredding-followup

2 participants
