[python] Support native batch vector search in Lumina reader #8280
XiaoHongbo-Hope wants to merge 2 commits into
@@ -191,32 +218,33 @@ def read_batch(self, splits):
return [GlobalIndexResult.create_empty() for _ in range(n)]

pre_filter = self._pre_filter(splits)
This PR currently conflicts with master, and this part needs a careful rebase rather than just taking the PR-side implementation. The current master split read_batch into indexed and raw paths: indexed splits are searched through _eval, raw splits are still scanned via _read_raw_search, and the two results are merged per query. This new native-batch path should be applied only to the indexed splits, using the per-index-split pre-filters, then still merge each query result with the raw fallback. Otherwise batch vector search will either drop raw-split rows or try to treat RawVectorSearchSplit as an indexed split.
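The per-query merge described above can be sketched roughly as follows. This is only an illustration under stated assumptions: hits are plain `(row_id, distance)` tuples rather than the project's `GlobalIndexResult`, smaller distances rank first, and `merge_top_k` and `merge_batch` are hypothetical helper names, not functions from the codebase.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[int, float]  # (row_id, distance); assumed shape, not the real result type


def merge_top_k(parts: List[List[Hit]], k: int) -> List[Hit]:
    """Merge several hit lists for ONE query and keep the k closest rows."""
    best: Dict[int, float] = {}
    for part in parts:
        for row_id, dist in part:
            # Keep the smallest distance if a row shows up in more than one part.
            if row_id not in best or dist < best[row_id]:
                best[row_id] = dist
    # Sort by distance, breaking ties by row id for a stable result.
    return heapq.nsmallest(k, best.items(), key=lambda h: (h[1], h[0]))


def merge_batch(indexed: List[List[Hit]], raw: List[List[Hit]], k: int) -> List[List[Hit]]:
    """Merge indexed-path and raw-path results query by query."""
    if len(indexed) != len(raw):
        raise ValueError("both paths must return one result per query")
    return [merge_top_k([i, r], k) for i, r in zip(indexed, raw)]
```

With this shape, a query that has no hits in the indexed splits still gets its raw-split rows, which is the case the comment warns about.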
Fixed
0f7136f to 2e656e6
Batch vector search previously looped over single searches (one search_list call per query vector). Route it through Lumina's native batch entry instead: flatten the n query vectors into one buffer of n * dim values, call search_list / search_with_filter_list once, and slice each query's hits from [q * k, q * k + k). Readers without a native batch path fall back to per-query fan-out. read_batch applies the native batch only to index splits (with each split's pre-filter), merges results per query across index splits, then merges each query's result with the raw (brute-force) search fallback. BatchVectorSearch normalizes list vectors to float32 to match VectorSearch.
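The flatten-and-slice layout can be illustrated with a small standalone sketch. It does not call Lumina: `fake_batch_search` is a hypothetical brute-force stand-in for a native batch call, and the convention of padding missing hits with id -1 is an assumption made for this example only.

```python
from array import array
from typing import List, Sequence, Tuple

Hit = Tuple[int, float]  # (row_id, distance)


def flatten_queries(queries: Sequence[Sequence[float]], dim: int) -> array:
    """Pack n query vectors into one float32 buffer of length n * dim."""
    flat = array("f")  # 'f' stores single-precision floats
    for q in queries:
        if len(q) != dim:
            raise ValueError(f"expected dim {dim}, got {len(q)}")
        flat.extend(float(x) for x in q)
    return flat


def fake_batch_search(data, flat, n: int, dim: int, k: int):
    """Hypothetical stand-in for a native batch call: returns flat ids/distances of length n * k."""
    ids: List[int] = []
    dists: List[float] = []
    for q in range(n):
        query = flat[q * dim:(q + 1) * dim]
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(row, query)), rid)
            for rid, row in enumerate(data)
        )[:k]
        ids.extend(rid for _, rid in scored)
        dists.extend(d for d, _ in scored)
        pad = k - len(scored)  # assumed padding convention: id -1
        ids.extend([-1] * pad)
        dists.extend([float("inf")] * pad)
    return ids, dists


def slice_hits(ids, dists, n: int, k: int) -> List[List[Hit]]:
    """Query q owns the half-open range [q * k, q * k + k) of the flat output."""
    out: List[List[Hit]] = []
    for q in range(n):
        lo, hi = q * k, q * k + k
        out.append([(i, d) for i, d in zip(ids[lo:hi], dists[lo:hi]) if i >= 0])
    return out
```

For example, with `data = [[0, 0], [1, 0], [0, 1], [5, 5]]`, two queries and `k = 2`, `flatten_queries` yields a buffer of four floats and `slice_hits` returns two lists of two hits each.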
2e656e6 to fe2d541
Purpose
Tests