[python] Support native batch vector search in Lumina reader#8280

Open
XiaoHongbo-Hope wants to merge 2 commits into
apache:masterfrom
XiaoHongbo-Hope:batch_search_fix

@XiaoHongbo-Hope

Contributor

Purpose

Tests

@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review June 21, 2026 12:41
@@ -191,32 +218,33 @@ def read_batch(self, splits):
return [GlobalIndexResult.create_empty() for _ in range(n)]

pre_filter = self._pre_filter(splits)

Contributor

This PR currently conflicts with master, and this part needs a careful rebase rather than just taking the PR-side implementation. The current master split read_batch into indexed and raw paths: indexed splits are searched through _eval, raw splits are still scanned via _read_raw_search, and the two results are merged per query. This new native-batch path should be applied only to the indexed splits, using the per-index-split pre-filters, then still merge each query result with the raw fallback. Otherwise batch vector search will either drop raw-split rows or try to treat RawVectorSearchSplit as an indexed split.
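The routing the reviewer asks for can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `IndexedSplit`, `RawSplit`, `search_indexed`, `search_raw`, and `read_batch_sketch` are hypothetical names, and hits are assumed to be `(distance, row_id)` pairs where a smaller distance is better. The real reader returns `GlobalIndexResult` objects, whose merge logic is not shown in this conversation.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, Union

# Assumption: a hit is (distance, row_id) and smaller distances rank first.
Hit = Tuple[float, int]


@dataclass(frozen=True)
class IndexedSplit:
    """Hypothetical stand-in for a split that has a vector index."""
    name: str


@dataclass(frozen=True)
class RawSplit:
    """Hypothetical stand-in for RawVectorSearchSplit (no index, brute force)."""
    name: str


Split = Union[IndexedSplit, RawSplit]
# A search function returns one hit list per query, in query order.
SearchFn = Callable[[Split], List[List[Hit]]]


def read_batch_sketch(
    splits: Sequence[Split],
    n_queries: int,
    k: int,
    search_indexed: SearchFn,
    search_raw: SearchFn,
) -> List[List[Hit]]:
    """Route each split to its own path, then merge the hits per query."""
    per_query: List[List[Hit]] = [[] for _ in range(n_queries)]
    for split in splits:
        if isinstance(split, RawSplit):
            results = search_raw(split)        # raw fallback scan
        elif isinstance(split, IndexedSplit):
            results = search_indexed(split)    # native batch path goes here
        else:
            raise TypeError(f"unsupported split type: {type(split).__name__}")
        if len(results) != n_queries:
            raise ValueError("search must return one hit list per query")
        for q, hits in enumerate(results):
            per_query[q].extend(hits)
    # Keep the k best hits for each query across both paths.
    return [heapq.nsmallest(k, hits) for hits in per_query]
```

Dispatching on the split type keeps raw splits out of the indexed path, and merging after both paths have run keeps raw-split rows in every query's result.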

Contributor Author

Fixed

@XiaoHongbo-Hope XiaoHongbo-Hope force-pushed the batch_search_fix branch 2 times, most recently from 0f7136f to 2e656e6 Compare June 22, 2026 11:34
Batch vector search looped single searches (one search_list call per
query vector). Route it through Lumina's native batch entry: flatten the
n query vectors into one (n * dim) buffer, call search_list /
search_with_filter_list once, and slice each query's hits from
[q * k, q * k + k). Readers without a native batch path fall back to
per-query fan-out.
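
The flatten-and-slice step described above can be sketched as follows. This is an illustration under stated assumptions, not Lumina's API: `brute_force_batch` is a hypothetical stand-in for the native batch entry, and it is assumed to return flat id and distance arrays of length n * k, with the corpus holding at least k rows.

```python
from typing import List, Sequence, Tuple

Hit = Tuple[int, float]  # (row_id, squared L2 distance)


def flatten_queries(queries: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Concatenate n query vectors into one buffer of length n * dim."""
    flat: List[float] = []
    for vec in queries:
        if len(vec) != dim:
            raise ValueError(f"expected dimension {dim}, got {len(vec)}")
        flat.extend(vec)
    return flat


def slice_hits(
    ids: Sequence[int], dists: Sequence[float], n: int, k: int
) -> List[List[Hit]]:
    """Split flat results per query; query q owns the range [q * k, q * k + k)."""
    if len(ids) != n * k or len(dists) != n * k:
        raise ValueError("flat results must have exactly n * k entries")
    return [
        list(zip(ids[q * k:q * k + k], dists[q * k:q * k + k]))
        for q in range(n)
    ]


def brute_force_batch(
    flat: Sequence[float], n: int, dim: int, k: int,
    corpus: Sequence[Sequence[float]],
) -> Tuple[List[int], List[float]]:
    """Hypothetical stand-in for a native batch call (one call for all queries).

    Assumes len(corpus) >= k so every query yields exactly k hits.
    """
    ids: List[int] = []
    dists: List[float] = []
    for q in range(n):
        query = flat[q * dim:(q + 1) * dim]
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(query, vec)), row_id)
            for row_id, vec in enumerate(corpus)
        )
        for dist, row_id in scored[:k]:
            ids.append(row_id)
            dists.append(dist)
    return ids, dists
```

With a corpus of `[[0, 0], [1, 1], [5, 5]]`, queries `[[0, 0], [5, 5]]`, and k = 2, the buffer is `[0, 0, 5, 5]` and `slice_hits` returns one two-element hit list per query.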

read_batch applies the native batch only to index splits (with each
split's pre-filter), merges results per query across index splits, then
merges each query with the raw (brute-force) search fallback.
BatchVectorSearch normalizes list vectors to float32 to match VectorSearch.
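
The float32 normalization mentioned above might look like the sketch below. It uses the standard-library `array` module so the example is self-contained. The helper name `to_float32` is hypothetical, and the PR's own code (not shown here) may well use NumPy instead, for example `np.asarray(vec, dtype=np.float32)`.

```python
from array import array
from typing import Iterable


def to_float32(vector: Iterable[float]) -> array:
    """Return a copy of `vector` stored as C floats (typecode 'f').

    A C float is 32 bits on common platforms, so values such as 0.1 are
    rounded to the nearest single-precision value.
    """
    return array("f", (float(x) for x in vector))


# A plain list and a double-precision array normalize to the same bytes,
# so both query forms reach the search call in the same representation.
from_list = to_float32([0.1, 2])
from_doubles = to_float32(array("d", [0.1, 2.0]))
same_bytes = from_list.tobytes() == from_doubles.tobytes()
```

Normalizing once at construction means the single-query and batch paths hand the index the same element type, whatever form the caller passed in.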