perf(duckdb): push down list length expressions #8544
Merging this PR will improve performance by 11.47%.

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `chunked_varbinview_into_canonical[(1000, 10)]` | 169 µs | 205.5 µs | -17.76% |
| ⚡ | Simulation | `copy_nullable[65536]` | 1.4 ms | 1 ms | +32.08% |
| ⚡ | Simulation | `copy_non_nullable[65536]` | 1,089.1 µs | 908.3 µs | +19.91% |
| ⚡ | Simulation | `chunked_varbinview_canonical_into[(100, 100)]` | 259.7 µs | 224.5 µs | +15.67% |
| ⚡ | Simulation | `bitwise_not_vortex_buffer_mut[128]` | 244.4 ns | 215.3 ns | +13.55% |
| ⚡ | Simulation | `chunked_varbinview_into_canonical[(100, 100)]` | 306.5 µs | 271.1 µs | +13.06% |
| ⚡ | Simulation | `bitwise_not_vortex_buffer_mut[1024]` | 304.7 ns | 275.6 ns | +10.58% |
| 🆕 | Simulation | `list_large` | N/A | 10 ms | N/A |
| 🆕 | Simulation | `list_medium` | N/A | 144 µs | N/A |
| 🆕 | Simulation | `list_small` | N/A | 58.8 µs | N/A |
| 🆕 | Simulation | `listview_large` | N/A | 6 ms | N/A |
| 🆕 | Simulation | `listview_medium` | N/A | 98.1 µs | N/A |
| 🆕 | Simulation | `listview_small` | N/A | 39 µs | N/A |
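The Efficiency column appears consistent with `(BASE / HEAD - 1) * 100`, so a positive value means HEAD is faster. This relationship is inferred from the rows above, not taken from CodSpeed's documentation. A minimal Rust check against three rows whose times are shown with enough precision:

```rust
/// Assumed relationship, inferred from the table rather than from CodSpeed's docs:
/// efficiency (%) = (BASE / HEAD - 1) * 100, so positive means HEAD is faster.
fn efficiency_pct(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}

fn main() {
    // (benchmark, BASE, HEAD, reported efficiency); BASE and HEAD share a unit per row.
    let rows = [
        ("chunked_varbinview_into_canonical[(1000, 10)]", 169.0, 205.5, -17.76),
        ("copy_non_nullable[65536]", 1089.1, 908.3, 19.91),
        ("chunked_varbinview_into_canonical[(100, 100)]", 306.5, 271.1, 13.06),
    ];
    for (name, base, head, reported) in rows {
        let computed = efficiency_pct(base, head);
        println!("{name}: computed {computed:+.2}%, reported {reported:+.2}%");
    }
}
```

Rows displayed with coarse rounding, such as `copy_nullable[65536]` (1.4 ms vs 1 ms), will not reproduce the reported percentage exactly, since the underlying measurements are more precise than the displayed times.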
Comparing mk/duckdb-list-length-pushdown (42ddb5d) with develop (9567467)
Footnotes

1. Four benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
myrrc left a comment:
Overall, the PR LGTM apart from a few small comments.
Please add sqllogictests in `slt/duckdb` before merging.
In those sqllogictests, please also add tests where the array length (of the same column and of different columns) is queried in both SELECT and WHERE.
Branch updated from `5f14a76` to `a9e4773`.

Signed-off-by: Matt Katz <mhkatz97@gmail.com>

Branch updated from `a9e4773` to `42ddb5d`.
Polar Signals Profiling Results (latest run)
Benchmarks: PolarSignals Profiling. Vortex (geomean): 1.045x ➖

- datafusion / vortex-file-compressed (1.045x ➖, 0↑ 0↓)

No file size changes detected.

Benchmarks: FineWeb NVMe. Verdict: No clear signal (low confidence).

- datafusion / vortex-file-compressed (1.007x ➖, 0↑ 0↓)
- datafusion / parquet (1.000x ➖, 0↑ 0↓)
- duckdb / vortex-file-compressed (0.979x ➖, 0↑ 0↓)
- duckdb / parquet (0.991x ➖, 0↑ 0↓)

File Size Changes (3 files changed, -46.3% overall, 0↑ 3↓)

Benchmarks: TPC-H SF=1 on NVME. Verdict: No clear signal (low confidence).

- datafusion / vortex-file-compressed (0.975x ➖, 1↑ 0↓)
- datafusion / parquet (0.989x ➖, 0↑ 0↓)
- datafusion / arrow (1.009x ➖, 1↑ 1↓)
- duckdb / vortex-file-compressed (0.973x ➖, 0↑ 0↓)
- duckdb / parquet (0.995x ➖, 2↑ 1↓)

File Size Changes (17 files changed, -44.4% overall, 5↑ 12↓)

Benchmarks: TPC-DS SF=1 on NVME. Verdict: No clear signal (low confidence).

- datafusion / vortex-file-compressed (1.009x ➖, 0↑ 2↓)
- datafusion / parquet (1.003x ➖, 1↑ 0↓)
- duckdb / vortex-file-compressed (1.008x ➖, 0↑ 1↓)
- duckdb / parquet (1.006x ➖, 2↑ 3↓)

File Size Changes (30 files changed, -43.4% overall, 4↑ 26↓)

Benchmarks: FineWeb S3. Verdict: No clear signal (confidence: environment too noisy).

- datafusion / vortex-file-compressed (0.762x ➖, 1↑ 0↓)
- datafusion / parquet (0.780x ➖, 1↑ 0↓)
- duckdb / vortex-file-compressed (0.871x ➖, 0↑ 0↓)
- duckdb / parquet (0.923x ➖, 0↑ 0↓)

Benchmarks: Clickbench Sorted on NVME. Verdict: No clear signal (low confidence).

- datafusion / vortex-file-compressed (0.937x ➖, 4↑ 1↓)
- datafusion / parquet (0.931x ➖, 1↑ 0↓)
- duckdb / vortex-file-compressed (1.038x ➖, 0↑ 2↓)
- duckdb / parquet (0.981x ➖, 0↑ 0↓)

File Size Changes (201 files changed, -42.6% overall, 55↑ 146↓)

Benchmarks: Statistical and Population Genetics. Verdict: No clear signal (low confidence).

- duckdb / vortex-file-compressed (0.930x ➖, 2↑ 0↓)
- duckdb / parquet (0.939x ➖, 0↑ 0↓)

File Size Changes (3 files changed, -32.3% overall, 1↑ 2↓)

Benchmarks: Clickbench on NVME. Verdict: No clear signal (low confidence).

- datafusion / vortex-file-compressed (0.984x ➖, 3↑ 1↓)
- datafusion / parquet (1.009x ➖, 0↑ 2↓)
- duckdb / vortex-file-compressed (0.980x ➖, 1↑ 0↓)
- duckdb / parquet (0.987x ➖, 0↑ 1↓)

File Size Changes (201 files changed, -39.1% overall, 52↑ 149↓)

Benchmarks: TPC-H SF=10 on NVME. Verdict: No clear signal (low confidence).

- datafusion / vortex-file-compressed (1.016x ➖, 0↑ 0↓)
- datafusion / parquet (1.013x ➖, 0↑ 0↓)
- datafusion / arrow (1.037x ➖, 0↑ 0↓)
- duckdb / vortex-file-compressed (1.021x ➖, 0↑ 0↓)
- duckdb / parquet (1.008x ➖, 0↑ 0↓)

File Size Changes (47 files changed, -44.5% overall, 6↑ 41↓)

Benchmarks: TPC-H SF=1 on S3. Verdict: No clear signal (confidence: environment too noisy).

- datafusion / vortex-file-compressed (0.875x ➖, 4↑ 0↓)
- datafusion / parquet (0.741x ➖, 8↑ 0↓)
- duckdb / vortex-file-compressed (0.910x ➖, 0↑ 0↓)
- duckdb / parquet (0.887x ➖, 0↑ 0↓)

Inline review comment on the test:

    # Mixing the len/length/array_length aliases across SELECT and WHERE.
    query I
    SELECT len(a) FROM '$__TEST_DIR__/list-length.vortex'
Let's add EXPLAIN for all queries in the test to verify that the pushdown happened.

Pushes DuckDB's list-length scalar function into the Vortex scan as the `list_length` expression, so lengths are computed from list offsets/sizes without materializing element values.

Pushdowns supported:

- Projection (`SELECT len(list)` / `length(list)` / `array_length(list)`)
- Filter (`WHERE array_length(list) >= k`, also `len` / `length`)

Each maps to `cast(list_length(col), i64)`: DuckDB's `len`/`array_length` return `BIGINT`, while `list_length` returns `u64`.

`len`/`length` are overloaded with strings/bits, so the filter path needs the argument type to disambiguate. Added a small FFI accessor `duckdb_vx_expr_get_return_type` plus `ExpressionRef::return_type()`, and gate `len`/`length`/`array_length` on the bound child being `LIST`/`ARRAY`.

Does not currently support `array_length(expr, dim)`.

Stacked on #8495.
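To illustrate why this pushdown never reads element values, here is a minimal sketch assuming an Arrow-style `List` layout (n + 1 offsets) and a `ListView` layout (per-row offset plus size). `ListArray`, `ListViewArray`, `list_lengths`, `listview_lengths`, `cast_to_i64`, and `filter_len_ge` are hypothetical stand-ins, not Vortex's actual types or API, and nulls are ignored for brevity. Lengths are produced as `u64` (like `list_length`) and then cast to `i64` to match DuckDB's `BIGINT`.

```rust
/// Hypothetical List layout: row i spans elements[offsets[i]..offsets[i + 1]].
struct ListArray {
    offsets: Vec<u64>,
    // Element values are never read when computing lengths.
    #[allow(dead_code)]
    elements: Vec<i32>,
}

/// Hypothetical ListView layout: row i spans elements[offsets[i]..offsets[i] + sizes[i]].
struct ListViewArray {
    #[allow(dead_code)]
    offsets: Vec<u64>,
    sizes: Vec<u64>,
    #[allow(dead_code)]
    elements: Vec<i32>,
}

/// List: the length of each row is the difference of adjacent offsets.
fn list_lengths(arr: &ListArray) -> Vec<u64> {
    arr.offsets.windows(2).map(|w| w[1] - w[0]).collect()
}

/// ListView: the sizes buffer already holds the per-row lengths.
fn listview_lengths(arr: &ListViewArray) -> Vec<u64> {
    arr.sizes.clone()
}

/// Models cast(list_length(col), i64); try_from rejects u64 values above i64::MAX.
fn cast_to_i64(lengths: &[u64]) -> Vec<i64> {
    lengths
        .iter()
        .map(|&n| i64::try_from(n).expect("list length exceeds i64::MAX"))
        .collect()
}

/// Models WHERE array_length(list) >= k as a selection mask over the cast lengths.
fn filter_len_ge(lengths: &[i64], k: i64) -> Vec<bool> {
    lengths.iter().map(|&n| n >= k).collect()
}

fn main() {
    // Both arrays encode the rows [1, 2], [], [3, 4, 5].
    let list = ListArray { offsets: vec![0, 2, 2, 5], elements: vec![1, 2, 3, 4, 5] };
    let view = ListViewArray {
        offsets: vec![0, 2, 2],
        sizes: vec![2, 0, 3],
        elements: vec![1, 2, 3, 4, 5],
    };

    let from_list = cast_to_i64(&list_lengths(&list));
    let from_view = cast_to_i64(&listview_lengths(&view));
    println!("list lengths: {:?}, listview lengths: {:?}", from_list, from_view);
    println!("len >= 2 mask: {:?}", filter_len_ge(&from_list, 2));
}
```

Only the offsets (or sizes) buffer is touched, which is why the cost of the pushed-down expression is independent of element width.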
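The type-based disambiguation can be sketched as a predicate over a bound call. This is a hypothetical model: `LogicalType`, `BoundCall`, and `can_push_down_list_length` are illustrative names, not Vortex or DuckDB APIs. In the PR, the child's type is obtained via `ExpressionRef::return_type()` backed by the `duckdb_vx_expr_get_return_type` FFI accessor. `VARCHAR` and `BIT` arguments are left to DuckDB, and the two-argument `array_length(expr, dim)` form is rejected because it is not yet supported.

```rust
/// Hypothetical subset of DuckDB logical types relevant to the len/length overloads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalType {
    Varchar,
    Bit,
    List,
    Array,
}

/// Hypothetical bound call: function name, the bound child's return type, and arity.
struct BoundCall<'a> {
    name: &'a str,
    child_type: LogicalType,
    num_args: usize,
}

/// True only for a length alias applied to a LIST/ARRAY child with a single argument,
/// i.e. a call that can be rewritten to cast(list_length(col), i64).
fn can_push_down_list_length(call: &BoundCall<'_>) -> bool {
    let is_length_alias = matches!(call.name, "len" | "length" | "array_length");
    let is_list_child = matches!(call.child_type, LogicalType::List | LogicalType::Array);
    // array_length(expr, dim) is not handled, so only single-argument calls qualify.
    is_length_alias && is_list_child && call.num_args == 1
}

fn main() {
    let calls = [
        BoundCall { name: "len", child_type: LogicalType::List, num_args: 1 },
        BoundCall { name: "length", child_type: LogicalType::Varchar, num_args: 1 },
        BoundCall { name: "length", child_type: LogicalType::Bit, num_args: 1 },
        BoundCall { name: "array_length", child_type: LogicalType::Array, num_args: 2 },
    ];
    for c in &calls {
        println!(
            "{}({:?}, args = {}): push down = {}",
            c.name,
            c.child_type,
            c.num_args,
            can_push_down_list_length(c)
        );
    }
}
```

Checking the bound child type, rather than the function name alone, is what keeps string and bit-string calls to `length` on DuckDB's own implementation.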