Tune varbinview compaction for Arrow export #8585
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.024x ➖
datafusion / vortex-file-compressed (1.024x ➖, 0↑ 1↓)
No file size changes detected.
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.976x ➖, 0↑ 0↓)
datafusion / parquet (0.990x ➖, 1↑ 0↓)
datafusion / arrow (0.950x ➖, 2↑ 0↓)
duckdb / vortex-file-compressed (1.034x ➖, 1↑ 2↓)
duckdb / parquet (0.979x ➖, 1↑ 0↓)
File Size Changes (18 files changed, -44.4% overall, 4↑ 14↓)
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.974x ➖, 0↑ 0↓)
datafusion / parquet (1.007x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.003x ➖, 0↑ 0↓)
duckdb / parquet (1.001x ➖, 0↑ 0↓)
File Size Changes (3 files changed, -46.3% overall, 0↑ 3↓)
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.005x ➖, 1↑ 6↓)
datafusion / parquet (1.005x ➖, 2↑ 4↓)
duckdb / vortex-file-compressed (1.015x ➖, 10↑ 16↓)
duckdb / parquet (1.001x ➖, 1↑ 1↓)
File Size Changes (37 files changed, -43.5% overall, 7↑ 30↓)
Merging this PR will improve performance by ×68
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] | 16.3 µs | 26.9 µs | -39.53% |
| ⚡ | Simulation | compact[(16384, 90)] | 1,772,802.8 ns | 842.5 ns | ×2,100 |
| ⚡ | Simulation | compact_sliced[(16384, 90)] | 521,565.8 ns | 842.5 ns | ×620 |
| ⚡ | Simulation | compact[(4096, 90)] | 470,651.1 ns | 808.3 ns | ×580 |
| ⚡ | Simulation | compact_sliced[(4096, 90)] | 144,019.2 ns | 779.2 ns | ×180 |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 273.6 ns | 244.4 ns | +11.93% |
Comparing ngates/arrow-varbinview-compact-export (362c0c1) with develop (aeae579)
Footnotes
- 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.009x ➖, 0↑ 0↓)
duckdb / parquet (1.017x ➖, 0↑ 0↓)
File Size Changes (3 files changed, -32.3% overall, 0↑ 3↓)
Benchmarks: Clickbench Sorted on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.965x ➖, 3↑ 1↓)
datafusion / parquet (1.018x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.954x ➖, 2↑ 0↓)
duckdb / parquet (1.002x ➖, 0↑ 0↓)
File Size Changes (201 files changed, -42.5% overall, 52↑ 149↓)
Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.200x ➖, 0↑ 3↓)
datafusion / parquet (1.249x ➖, 0↑ 3↓)
duckdb / vortex-file-compressed (1.070x ➖, 1↑ 2↓)
duckdb / parquet (1.069x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.019x ➖, 0↑ 1↓)
datafusion / parquet (1.020x ➖, 0↑ 0↓)
datafusion / arrow (1.036x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.029x ➖, 0↑ 0↓)
duckdb / parquet (1.001x ➖, 0↑ 0↓)
File Size Changes (48 files changed, -44.4% overall, 8↑ 40↓)
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.916x ➖, 10↑ 1↓)
datafusion / parquet (0.939x ➖, 6↑ 0↓)
duckdb / vortex-file-compressed (0.897x ✅, 15↑ 1↓)
duckdb / parquet (0.986x ➖, 0↑ 0↓)
File Size Changes (201 files changed, -39.1% overall, 57↑ 144↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.638x ❌, 0↑ 19↓)
datafusion / parquet (1.383x ❌, 0↑ 12↓)
duckdb / vortex-file-compressed (1.212x ➖, 0↑ 6↓)
duckdb / parquet (1.164x ➖, 0↑ 4↓)
Summary
Compact `VarBinViewArray` before exporting Arrow byte-view arrays so that filtered arrays do not retain unselected backing buffers. This keeps Arrow `Utf8View`/`BinaryView` exports from holding onto large payload buffers that no selected row references.
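To illustrate why compaction matters after a filter, here is a minimal, self-contained sketch of a toy view array. It is not Vortex's or arrow-rs's implementation: the `View`, `ViewArray`, `filter`, and `compact` names below are hypothetical and exist only for this example. The one detail taken from the Arrow format is that values of at most 12 bytes are stored inline in the view, while longer values point into a shared data buffer.

```rust
/// Arrow's view layout inlines values of at most 12 bytes in the view itself.
const INLINE_MAX: usize = 12;

/// Hypothetical toy view: short data is inlined, long data references a buffer.
#[derive(Debug, Clone, PartialEq)]
pub enum View {
    Inline(Vec<u8>),
    Ref { buffer: usize, offset: usize, len: usize },
}

/// Hypothetical toy stand-in for a byte-view array (not the Vortex type).
#[derive(Debug, Clone, PartialEq)]
pub struct ViewArray {
    pub views: Vec<View>,
    pub buffers: Vec<Vec<u8>>,
}

impl ViewArray {
    /// Build an array whose long values all live in a single data buffer.
    pub fn from_strs(values: &[&str]) -> Self {
        let mut data = Vec::new();
        let mut views = Vec::with_capacity(values.len());
        for v in values {
            let bytes = v.as_bytes();
            if bytes.len() <= INLINE_MAX {
                views.push(View::Inline(bytes.to_vec()));
            } else {
                views.push(View::Ref { buffer: 0, offset: data.len(), len: bytes.len() });
                data.extend_from_slice(bytes);
            }
        }
        ViewArray { views, buffers: vec![data] }
    }

    /// Resolve the bytes of row `i`.
    pub fn value(&self, i: usize) -> &[u8] {
        match &self.views[i] {
            View::Inline(bytes) => bytes.as_slice(),
            View::Ref { buffer, offset, len } => &self.buffers[*buffer][*offset..*offset + *len],
        }
    }

    /// Keep only the selected rows. The views shrink, but every backing
    /// buffer is retained in full (a real implementation would share
    /// ref-counted buffers; this toy clones them).
    pub fn filter(&self, keep: &[usize]) -> Self {
        ViewArray {
            views: keep.iter().map(|&i| self.views[i].clone()).collect(),
            buffers: self.buffers.clone(),
        }
    }

    /// Total bytes held in backing buffers, referenced or not.
    pub fn buffer_bytes(&self) -> usize {
        self.buffers.iter().map(Vec::len).sum()
    }

    /// Copy only the referenced bytes into one fresh buffer and rewrite
    /// the views to point at it, dropping unreferenced payload.
    pub fn compact(&self) -> Self {
        let mut data = Vec::new();
        let mut views = Vec::with_capacity(self.views.len());
        for (i, view) in self.views.iter().enumerate() {
            match view {
                View::Inline(_) => views.push(view.clone()),
                View::Ref { len, .. } => {
                    views.push(View::Ref { buffer: 0, offset: data.len(), len: *len });
                    data.extend_from_slice(self.value(i));
                }
            }
        }
        let buffers = if data.is_empty() { Vec::new() } else { vec![data] };
        ViewArray { views, buffers }
    }
}
```

In this sketch, filtering an array with two 100-byte values down to a single one still leaves 200 bytes of buffer attached; calling `compact()` afterwards reduces that to the 100 bytes the surviving view actually references. Whether and when the real code compacts (for example, based on a utilization threshold) is not modeled here.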