Remove ArrayAccessor by robert3005 · Pull Request #8603 · vortex-data/vortex

robert3005 · 2026-06-26T13:34:19Z

ArrayAccessor was a poorly thought out api that leaves performance on the table.

We never use it in any hot path and only serves as a shortcut for converting
string arrays to a vec of strings. There are simpler methods for doing that in
tests

Signed-off-by: Robert Kruszewski <github@robertk.io>

0ax1 · 2026-06-26T13:35:27Z

@claude review in depth

github-actions · 2026-06-26T13:35:49Z

Claude finished @0ax1's task in 4m 8s —— View job

Review: Remove `ArrayAccessor`

Gather context (deleted APIs, callers, replacement patterns)
Analyze correctness of replacements
Check tests & performance implications
Post review

The removal is clean. ArrayAccessor and its three impls (primitive/varbin/varbinview) are deleted, the accessor module is unwired everywhere, and there are no dangling references to the removed APIs (onpair_compress_array, onpair_compress_iter, crate::accessor, etc.). The bulk of the diff is mechanical test refactoring from with_iterator to validity().execute_mask(...) + indexed access. Threading ctx: &mut ExecutionCtx through FoR::encode, PrimitiveArray::map_each_with_validity, PrimitiveData::from_values_byte_buffer, and onpair_compress is consistent and all call sites are updated (btrblocks, compat-gen, pco, zstd, cuda).

Correctness spot-checks pass: Buffer<u8> Ord compares raw bytes lexicographically, so the min_max migration to ByteBuffer::minmax() preserves ordering; Option<ByteBuffer> ordering (None < Some) matches the old Option<&[u8]> semantics for is_sorted.

A few things worth addressing:

1. Performance regression in `min_max` / `is_sorted` over strings (medium)

vortex-array/src/aggregate_fn/fns/min_max/varbin.rs and .../is_sorted/varbin.rs replace the borrowed-slice iterator with array.bytes_at(i):

// min_max/varbin.rs
let minmax = (0..array.len())
    .filter(|&i| mask.value(i))
    .map(|i| array.bytes_at(i))   // -> owned ByteBuffer per element
    .minmax();

VarBinViewArray::bytes_at (array.rs:427) constructs an owned ByteBuffer per call — for inlined views it does views_handle().as_host().clone().into_byte_buffer().slice_ref(...), i.e. an Arc refcount bump + ByteBuffer construction on every element (no data copy, but atomic ops in a tight comparison loop). The old with_iterator yielded borrowed &[u8] with zero per-element overhead.

Notably, the zstd (collect_valid_vbv) and onpair rewrites in this same PR avoid exactly this by walking views() + cached data_buffers() and borrowing view.as_inlined().value() / &buffers[idx][range]. min_max/is_sorted should use the same pattern — they feed statistics computation and are more "hot path" than the PR description implies. Fix this →

2. New all-null onpair path is untested

onpair_compress_varbinview adds a mask.all_false() early-return that builds ConstantArray children and a hardcoded bits = 9:

if mask.all_false() {
    return OnPair::try_new(..., Validity::AllInvalid, 9);
}

The existing tests cover mixed nullability (test_onpair_nullable_canonicalize) but none exercise a fully-null input, so this new branch — including the magic 9 — is uncovered. Worth an all-null roundtrip test. Fix this →

3. Minor: lost length guard in onpair

In onpair_compress_varbinview the per-row length is now uncompressed_lengths.push(view.len() as i32). The old code used i32::try_from(bytes.len()).vortex_expect("string length must fit in i32"). The as i32 cast silently wraps for lengths > i32::MAX instead of failing loudly. Practically unreachable (view lengths are u32, 2 GB single strings are unrealistic), but the explicit guard was lost — consider keeping i32::try_from.

dict_bytes_to_buffer's rewrite to ByteBufferMut + push_n_unchecked is sound: capacity is reserved as dict_bytes.len() + MAX_TOKEN_SIZE, extend_from_slice writes dict_bytes.len(), and the unchecked push fills exactly the remaining MAX_TOKEN_SIZE bytes within capacity — matching the old resize semantics.

I could not run cargo build/clippy/nextest in this review environment, so the findings above are from static analysis only; CI should confirm compilation and the existing onpair/varbin test suites.
· rk/noarrayaccessor

codspeed-hq · 2026-06-26T13:53:18Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 5 improved benchmarks
❌ 10 regressed benchmarks
✅ 1574 untouched benchmarks
⏩ 4 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`take_map[(0.1, 1.0)]`	1.6 ms	2.4 ms	-30.95%
❌	Simulation	`take_map[(0.1, 0.5)]`	987.2 µs	1,380.1 µs	-28.47%
❌	Simulation	`take_map[(0.01, 1.0)]`	1.2 ms	1.6 ms	-26.84%
❌	Simulation	`take_map[(0.01, 0.1)]`	161.8 µs	200 µs	-19.09%
❌	Simulation	`chunked_varbinview_into_canonical[(1000, 10)]`	169 µs	205.5 µs	-17.79%
❌	Simulation	`take_map[(0.01, 0.05)]`	106.2 µs	124.5 µs	-14.71%
❌	Simulation	`slice_empty_vortex`	339.4 ns	397.8 ns	-14.66%
❌	Simulation	`take_map[(0.01, 0.5)]`	603.1 µs	703.3 µs	-14.25%
❌	Simulation	`take_map[(0.1, 0.1)]`	455.3 µs	529.1 µs	-13.95%
❌	Simulation	`take_map[(0.005, 1.0)]`	1.1 ms	1.3 ms	-11.66%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[128]`	244.4 ns	186.1 ns	+31.34%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[1024]`	304.7 ns	246.4 ns	+23.68%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[2048]`	398.6 ns	340.3 ns	+17.14%
⚡	Simulation	`chunked_varbinview_canonical_into[(100, 100)]`	259.7 µs	224.5 µs	+15.7%
⚡	Simulation	`chunked_varbinview_into_canonical[(100, 100)]`	306.5 µs	271.1 µs	+13.05%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

_{Comparing rk/noarrayaccessor (ec21168) with develop (f172d58)}

4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

robert3005 · 2026-06-26T14:12:33Z

Need to fix min_max/is_sorted just noticed it while I was going through the changes

Signed-off-by: Robert Kruszewski <github@robertk.io>

Remove ArrayAccessor

d8f0001

Signed-off-by: Robert Kruszewski <github@robertk.io>

robert3005 requested review from a team, 0ax1, AdamGS and onursatici June 26, 2026 13:34

robert3005 added the changelog/chore A trivial change label Jun 26, 2026

AdamGS approved these changes Jun 26, 2026

View reviewed changes

0ax1 approved these changes Jun 26, 2026

View reviewed changes

robert3005 added 3 commits June 26, 2026 15:24

fixes

6001592

Signed-off-by: Robert Kruszewski <github@robertk.io>

less

1edb7f8

Signed-off-by: Robert Kruszewski <github@robertk.io>

format

ec21168

Signed-off-by: Robert Kruszewski <github@robertk.io>

robert3005 enabled auto-merge (squash) June 26, 2026 14:35

robert3005 merged commit ba00e83 into develop Jun 26, 2026
71 of 72 checks passed

robert3005 deleted the rk/noarrayaccessor branch June 26, 2026 14:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Remove ArrayAccessor#8603

Remove ArrayAccessor#8603
robert3005 merged 4 commits into
developfrom
rk/noarrayaccessor

robert3005 commented Jun 26, 2026

Uh oh!

0ax1 commented Jun 26, 2026

Uh oh!

github-actions Bot commented Jun 26, 2026 •

edited

Loading

Uh oh!

codspeed-hq Bot commented Jun 26, 2026 •

edited

Loading

Uh oh!

robert3005 commented Jun 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

robert3005 commented Jun 26, 2026

Uh oh!

0ax1 commented Jun 26, 2026

Uh oh!

github-actions Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review: Remove ArrayAccessor

1. Performance regression in min_max / is_sorted over strings (medium)

2. New all-null onpair path is untested

3. Minor: lost length guard in onpair

Uh oh!

codspeed-hq Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Performance Changes

Footnotes

Uh oh!

robert3005 commented Jun 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions Bot commented Jun 26, 2026 •

edited

Loading

Review: Remove `ArrayAccessor`

1. Performance regression in `min_max` / `is_sorted` over strings (medium)

codspeed-hq Bot commented Jun 26, 2026 •

edited

Loading