feat(array): push struct validity into children #8589

Merged
robert3005 merged 1 commit into vortex-data:develop from miniex:feat/push-struct-validity-into-children
Jun 25, 2026
Conversation

@miniex miniex commented Jun 25, 2026

Contributor

Summary

push_validity_into_children masks each field with the struct's top-level validity, so a row that is null at the struct level becomes null in every field ({a: 1, b: 2}, NULL -> {a: 1, b: 2}, {a: NULL, b: NULL}), mirroring Arrow's StructArray::flatten. If remove_struct_validity is set, the top-level validity is dropped and the struct becomes non-nullable; otherwise it is kept. A struct with no top-level nulls is returned unchanged.
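To make the semantics concrete, here is a minimal sketch on a toy model. It is not the Vortex implementation: `ToyStruct` and its layout are hypothetical, fields are plain `Vec<Option<i64>>`, and the masking is done eagerly for simplicity. Only the function name and the `remove_struct_validity` parameter are taken from this PR.

```rust
/// Hypothetical toy model of a struct array (not a Vortex type):
/// named fields of nullable i64 values plus an optional top-level
/// validity, where `true` means the row is valid and `None` means
/// the struct is non-nullable.
#[derive(Debug, Clone, PartialEq)]
struct ToyStruct {
    fields: Vec<(String, Vec<Option<i64>>)>,
    validity: Option<Vec<bool>>,
}

/// Null out every field value in rows where the struct itself is null.
/// Field-level nulls in valid rows are left as they are, so the result
/// is the intersection of field validity and struct validity.
fn push_validity_into_children(s: &ToyStruct, remove_struct_validity: bool) -> ToyStruct {
    let validity = match &s.validity {
        Some(v) => v,
        // Non-nullable struct: there is nothing to push down.
        None => return s.clone(),
    };

    let fields = s
        .fields
        .iter()
        .map(|(name, values)| {
            let masked: Vec<Option<i64>> = values
                .iter()
                .zip(validity)
                .map(|(value, &valid)| if valid { *value } else { None })
                .collect();
            (name.clone(), masked)
        })
        .collect();

    ToyStruct {
        fields,
        // Either keep the top-level validity or drop it to non-nullable.
        validity: if remove_struct_validity {
            None
        } else {
            Some(validity.clone())
        },
    }
}
```

With fields `a = [1, 10]`, `b = [2, 20]` and validity `[true, false]`, both fields end up null in the second row; passing `true` for `remove_struct_validity` additionally clears the top-level validity in this toy model.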

Each field is masked via a mask expression (per @gatesn's note on the issue, not the eager compute::mask of #5826). Open question: should this be a StructArray method, or a standalone mask expression in the new operator world?

Closes: #3859

Benchmark

For reference (benchmark not committed), compared against hand-rolling the same masking without the fast path: with no top-level nulls, the fast path is ~5-7x faster (0.26us vs 1.2us at 4 fields, 0.65us vs 4.5us at 16 fields). With nulls, the two are equal (~1.7us / ~6.3us), so the method adds no overhead.
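A rough illustration of why the no-nulls fast path is cheaper, again on a hypothetical toy model rather than the Vortex code: `Field` and `mask_fields` are made-up names, and fields are reference-counted `Vec<Option<i64>>` buffers. When every row is valid, the fields can be handed back by sharing them instead of rebuilding them.

```rust
use std::sync::Arc;

/// Hypothetical shared field buffer (not a Vortex type).
type Field = Arc<Vec<Option<i64>>>;

/// Mask each field with a top-level validity (`true` = valid row).
fn mask_fields(fields: &[Field], validity: &[bool]) -> Vec<Field> {
    // Fast path: with no top-level nulls, masking would change nothing,
    // so return the same buffers. Cloning an Arc only bumps a reference
    // count; no values are copied.
    if validity.iter().all(|&valid| valid) {
        return fields.to_vec();
    }

    // Slow path: build a new masked buffer for every field.
    fields
        .iter()
        .map(|field| {
            let masked: Vec<Option<i64>> = field
                .iter()
                .zip(validity)
                .map(|(value, &valid)| if valid { *value } else { None })
                .collect();
            Arc::new(masked)
        })
        .collect()
}
```

In this sketch the fast path costs one scan of the validity plus one reference-count bump per field, while the slow path allocates and fills a new buffer for each field.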

Testing

cargo nextest run -p vortex-array passes (drops/preserves validity, intersecting field-level nulls, all-invalid, no-nulls fast path); fmt --all + clippy --all-targets --all-features clean.


I'm Korean, so sorry if any wording reads a little awkward.

add `StructArray::push_validity_into_children`, which masks each field with the
struct's top-level validity so a row null at the struct level becomes null in
every field. `remove_struct_validity` chooses whether to keep the top-level
validity or drop it to non-nullable.

Closes vortex-data#3859

Signed-off-by: Han Damin <miniex@daminstudio.net>
@miniex miniex requested a review from a team June 25, 2026 01:48
codspeed-hq Bot commented Jun 25, 2026


Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 4 improved benchmarks
❌ 4 regressed benchmarks
✅ 1581 untouched benchmarks
⏩ 4 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | copy_nullable[65536] | 1 ms | 1.4 ms | -24.28% |
| Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 168.8 µs | 205.5 µs | -17.84% |
| Simulation | copy_non_nullable[65536] | 908.3 µs | 1,089.3 µs | -16.62% |
| Simulation | compact_sliced[(4096, 90)] | 750 ns | 837.5 ns | -10.45% |
| Simulation | chunked_bool_canonical_into[(1000, 10)] | 27.1 µs | 16.6 µs | +63.22% |
| Simulation | chunked_varbinview_canonical_into[(100, 100)] | 259.5 µs | 224.4 µs | +15.66% |
| Simulation | chunked_varbinview_into_canonical[(100, 100)] | 306.6 µs | 271.2 µs | +13.02% |
| Simulation | rebuild_naive | 109.3 µs | 98.6 µs | +10.82% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing miniex:feat/push-struct-validity-into-children (2ecabe7) with develop (15cec3b)

Footnotes

  1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@robert3005 robert3005 added the changelog/feature A new feature label Jun 25, 2026
@robert3005 robert3005 merged commit 9567467 into vortex-data:develop Jun 25, 2026
78 of 81 checks passed
Development

Successfully merging this pull request may close these issues.

Add a method to push struct validity into children

2 participants