feat(vector)!: add approx mode for RaBitQ search #7179
Merged
Important: This PR touches the Lance format specification.
Xuanwo reviewed Jun 9, 2026
Xuanwo approved these changes Jun 10, 2026
Feature
This PR adds a public `approx_mode` setting for vector search with `fast`, `normal`, and `accurate` values. The public API avoids exposing RaBitQ/HACC terminology while still allowing callers to choose the speed/accuracy tradeoff when the backing index supports it.

Implementation
- Adds `ApproxMode` to vector queries and threads it through the Rust scanner, ANN proto serialization, Python query parsing, and FlatIndex distance calculator options.
- `fast`: force 1-bit query-time scoring for RaBitQ.
- `normal`: preserve the existing search path.
- `accurate`: use a 16-bit LUT accumulation path for the binary RaBitQ estimator.
- Keeps the `QueryScratchCapacity::new(...)` API compatible.
- Accepts `approx_mode="fast" | "normal" | "accurate"` in Python and rejects invalid values.

Benchmark
IVF_RQ on dbpedia, `num_bits=5`, no `refine_factor`. The plot shows `approx_mode=fast`, `normal`, and `accurate` as separate curves.

Breaking Change
BREAKING CHANGE: The ANN protobuf schema now includes `VectorApproxMode approx_mode` on vector query serialization. Consumers that regenerate or explicitly match Lance's serialized ANN query proto should update to the new schema.

Validation
- `cargo fmt --all`
- `cargo test -p lance-linalg simd::dist_table`
- `cargo test -p lance-index vector::bq::storage`
- `cargo test -p lance-index vector::storage::tests`
- `cargo test -p lance --features substrait test_query_roundtrip`
- `cargo test -p lance test_knn_approx_mode_defaults_and_setter`
- `cargo check --workspace --tests --benches`
- `cd python && make install`
- `cd python && uv run pytest python/tests/test_vector_index.py::test_vector_index_with_approx_mode python/tests/test_vector_index.py::test_vector_index_invalid_approx_mode`
- `cd python && uv run pytest python/tests/test_vector_index.py::test_create_ivf_rq_multi_bit_searches_l2_and_cosine`
- `git diff --check`
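
For readers who want a feel for what the Python-side tests above exercise, here is a minimal standalone sketch of strict `approx_mode` validation: three accepted strings, everything else rejected. It does not import Lance and is not the code from this PR. The Python `ApproxMode` enum class and the `parse_approx_mode` helper are hypothetical names introduced only for illustration, and the error message wording is an assumption.

```python
from enum import Enum


class ApproxMode(str, Enum):
    """Illustrative enum holding the three public values named in this PR."""

    FAST = "fast"
    NORMAL = "normal"
    ACCURATE = "accurate"


def parse_approx_mode(value: str) -> ApproxMode:
    """Hypothetical helper: map a user-supplied string to an ApproxMode.

    Raises ValueError for anything other than the three documented values.
    """
    try:
        # Enum lookup by value; unknown values raise ValueError.
        return ApproxMode(value)
    except ValueError:
        allowed = ", ".join(repr(m.value) for m in ApproxMode)
        raise ValueError(
            f"invalid approx_mode {value!r}; expected one of {allowed}"
        ) from None
```

Calling `parse_approx_mode("fast")` returns `ApproxMode.FAST`, while a value such as `"exact"` raises `ValueError`. Matching on exact strings, with no case folding or aliases, keeps the accepted set identical to the three values listed in the description.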