feat: add metadata bridge for Python CUDA export by 0ax1 · Pull Request #8604 · vortex-data/vortex

0ax1 · 2026-06-26T13:43:56Z

Add a private metadata-only bridge on PyVortex arrays so the optional CUDA extension can reconstruct arrays through its own local Vortex session instead of passing Rust ArrayRef values across Python extension modules.

The CUDA extension now parses the metadata tree, calls local array plugins for deserialization, and exports bufferless arrays through the Arrow C Device capsule path. Physical buffer handoff remains a follow-up.

codspeed-hq · 2026-06-26T13:46:29Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 5 improved benchmarks
❌ 1 regressed benchmark
✅ 1583 untouched benchmarks
⏩ 4 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`chunked_varbinview_into_canonical[(1000, 10)]`	169 µs	205.7 µs	-17.87%
⚡	Simulation	`chunked_varbinview_canonical_into[(100, 100)]`	259.7 µs	224.6 µs	+15.65%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[128]`	244.4 ns	215.3 ns	+13.55%
⚡	Simulation	`chunked_varbinview_into_canonical[(100, 100)]`	306.5 µs	271.3 µs	+12.97%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[1024]`	304.7 ns	275.6 ns	+10.58%
⚡	Simulation	`encode_varbin[(1000, 2)]`	158.4 µs	143.5 µs	+10.34%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

_{Comparing ad/pycudf (f827d8a) with develop (f172d58)}

4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

Add a private metadata-only bridge on PyVortex arrays so the optional CUDA extension can reconstruct arrays through its own local Vortex session instead of passing Rust ArrayRef values across Python extension modules. The CUDA extension now parses the metadata tree, calls local array plugins for deserialization, and exports bufferless arrays through the Arrow C Device capsule path. The capsule ownership code handles live and consumed capsule names and releases Arrow resources on error paths. Physical buffer handoff remains a follow-up. Signed-off-by: "Alexander Droste" <alexander.droste@protonmail.com> Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

0ax1 · 2026-06-26T14:23:00Z

@claude review in depth

github-actions · 2026-06-26T14:23:21Z

Claude finished @0ax1's task in 3m 35s —— View job

Claude PR Review — metadata bridge for Python CUDA export

I reviewed the full diff (vortex-python/src/arrays/mod.rs, the new vortex-python-cuda/src/lib.rs, Python glue, and tests) against origin/develop. Overall this is careful, high-quality FFI code — the capsule lifetime/ownership handling, error-path resource release, and post-deserialize validation are all well done. I did not find any memory-safety or correctness bugs. Findings below are mostly minor / design notes.

Tasks

Gather context & read changed files
Review vortex-python/src/arrays/mod.rs metadata bridge
Review vortex-python-cuda/src/lib.rs (capsule/FFI, unsafe)
Review Python glue + tests
Post review feedback

What's good

Capsule ownership is correct on every path. In export_device_array, if device_array_capsule fails after schema_capsule succeeded, the dropped schema Bound runs release_schema_capsule, so no leak. The schema_capsule/device_array_capsule constructors also reclaim the Box and call the Arrow release callback if PyCapsule::new_* itself fails. The check_requested_schema error path calls release_exported for both structs. Double-free is avoided because the underlying release_schema/release_device_array are guarded by if let Some(release) and rely on the standard Arrow contract of nulling the callback.
deserialize_metadata_tree validates aggressively — length, dtype, and is_supported_encoding after deserialization, plus a per-node buffer_count != 0 rejection. I confirmed nbuffers() is per-node (there's a separate nbuffers_recursive() in vortex-array/src/array/erased.rs), so the per-node check is semantically correct and recursion catches buffers in any child.
Concurrency is fine. static RUNTIME: CurrentThreadRuntime wraps Arc<smol::Executor<'static>>; block_on(executor.run(fut)) supports concurrent runners, so sharing it across py.detach calls from multiple Python threads is safe. The unsafe impl Send for ExportedDeviceArray is justified and clearly documented.
Tuple shape (len == 6) is validated on the Rust side and matches the producer in array_metadata_tuple and the .pyi annotation.

Findings / suggestions

1. Debug helper exercises a different session than the real export (vortex-python-cuda/src/lib.rs).
_debug_array_metadata_dtype deserializes through METADATA_SESSION (plain default), while export_device_array uses cuda_session(). As a debugging aid for the export path this is slightly misleading — an array that round-trips through _debug_array_metadata_dtype could still fail under the CUDA session if encoding registries diverge. Consider documenting that the debug helper validates only the metadata-bridge decode, not the CUDA export path. Fix this →

2. Cross-session encoding registry mismatch is a real behavioral limitation (by design).
The whole point of the bridge is reconstructing arrays via the extension's own local session. That means any encoding registered only in the base vortex-data session (e.g. a user-registered Python encoding) will fail in the CUDA extension with Unknown array encoding: .... This is the intended trade-off, but worth a sentence in the __vortex_array_metadata__ docstring (vortex-python/src/arrays/mod.rs) so consumers understand only encodings known to the local session are reconstructable.

3. reject_unsupported_kwargs silently accepts key=None.
Any unknown kwarg whose value is None is ignored rather than rejected, so a typo like requsted_schema=None (note the misspelling) would be silently swallowed instead of raising. If that leniency is intentional (matching the Arrow capsule protocol), a one-line comment would help; otherwise consider rejecting unknown names regardless of value. Fix this →

4. New public surface in vortex-cuda (vortex-cuda/src/arrow/mod.rs).
release_schema and release_device_array went from private to pub. They're needed across the crate boundary, so pub is justified, but these are low-level FFI release primitives that are easy to misuse on arbitrary structs. Consider #[doc(hidden)] or a clearly-internal module path to signal they're not part of the supported public API.

5. Test coverage (acknowledged in comments). The added tests cover the metadata round-trip, the buffer-handoff rejection, and the no-CUDA error path well. The capsule destructor and used_* rename paths can't run in CI without a CUDA device + Arrow Device consumer — the code comments call this out. No action required, but a follow-up that exercises the schema-capsule destructor against a synthetic FFI_ArrowSchema (CPU-only) would close part of that gap if it's cheap to add.

Nits

dtype_from_metadata clones metadata.dtype into a ByteBuffer on every call; negligible, mentioning only for completeness.
parse_array_metadata / array_metadata_tuple recurse per node — fine in practice, theoretically stack-bound on pathological depth.

CI / checks

This is Python-binding + PyO3 Rust work. Per CLAUDE.md the relevant checks are cargo +nightly fmt --check -p vortex-python (and vortex-python-cuda), cargo clippy --all-targets --all-features, uv run basedpyright vortex-python, and ruff. I did not run these (read-only review). The CodSpeed regression on chunked_varbinview_into_canonical[(1000, 10)] is unrelated to this diff (no changes to that path) and looks like Walltime-runner noise per the CodSpeed warning.
· branch ad/pycudf

AdamGS

this makes sense to me

0ax1 requested a review from a team June 26, 2026 13:43

0ax1 added the changelog/feature A new feature label Jun 26, 2026

0ax1 force-pushed the ad/pycudf branch from fbd31f5 to c3d424c Compare June 26, 2026 13:45

0ax1 force-pushed the ad/pycudf branch from c3d424c to a4fbc2f Compare June 26, 2026 13:46

0ax1 requested review from AdamGS and robert3005 June 26, 2026 13:46

0ax1 force-pushed the ad/pycudf branch from a4fbc2f to 47f95c3 Compare June 26, 2026 14:00

robert3005 requested a review from gatesn June 26, 2026 14:10

0ax1 force-pushed the ad/pycudf branch from 47f95c3 to 3a0d737 Compare June 26, 2026 14:11

0ax1 force-pushed the ad/pycudf branch from 3a0d737 to f827d8a Compare June 26, 2026 14:21

AdamGS approved these changes Jun 26, 2026

View reviewed changes

robert3005 approved these changes Jun 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add metadata bridge for Python CUDA export#8604

feat: add metadata bridge for Python CUDA export#8604
0ax1 wants to merge 1 commit into
developfrom
ad/pycudf

0ax1 commented Jun 26, 2026 •

edited

Loading

Uh oh!

codspeed-hq Bot commented Jun 26, 2026 •

edited

Loading

Uh oh!

0ax1 commented Jun 26, 2026

Uh oh!

github-actions Bot commented Jun 26, 2026 •

edited

Loading

Uh oh!

AdamGS left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

0ax1 commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codspeed-hq Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Performance Changes

Footnotes

Uh oh!

0ax1 commented Jun 26, 2026

Uh oh!

github-actions Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude PR Review — metadata bridge for Python CUDA export

Tasks

What's good

Findings / suggestions

Nits

CI / checks

Uh oh!

AdamGS left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

0ax1 commented Jun 26, 2026 •

edited

Loading

codspeed-hq Bot commented Jun 26, 2026 •

edited

Loading

github-actions Bot commented Jun 26, 2026 •

edited

Loading