feat(lance-select): expose selected rows accessor on NullableRowAddrSet #7164
Merged
Conversation
Add `selected_rows()` returning `&RowAddrTreeMap` so external consumers (e.g. wire-serialization codecs) can read the union of TRUE and NULL rows without paying for the clone+subtract that `true_rows()` performs.

Internal cleanups that ride on the same field:
- `union_all` now unions `&s.selected` directly instead of materializing each `s.true_rows()` first. The `nulls` cleanup uses an explicit `any_true` set (still equivalent to the prior algorithm).
- `PartialEq` compares `(selected, nulls)` field-wise, with no clones.

Adds 9 unit tests covering `union_all` edge cases: TRUE-overrides-NULL conflicts, NULL-only inputs, empty/single/disjoint inputs, and a cross-check against repeated `BitOrAssign`.
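The `union_all` cleanup described above can be pictured with a small model. This is only a sketch, not the code in the PR: `RowSet` (a `BTreeSet<u64>`) is a hypothetical stand-in for `RowAddrTreeMap`, `NullableSet` is a hypothetical stand-in for `NullableRowAddrSet`, and the loop body is an assumption about how an `any_true` set could be used so that TRUE overrides NULL.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-in for RowAddrTreeMap, so the sketch needs only std.
type RowSet = BTreeSet<u64>;

// Hypothetical model of NullableRowAddrSet: `selected` is the raw backing set,
// `nulls` marks rows whose value is NULL.
#[derive(Debug, Clone, Default)]
struct NullableSet {
    selected: RowSet,
    nulls: RowSet,
}

impl NullableSet {
    // Rows that are definitely TRUE: selected minus nulls (allocates a new set).
    fn true_rows(&self) -> RowSet {
        self.selected.difference(&self.nulls).copied().collect()
    }
}

// Union many sets under three-valued OR: a row that is TRUE in any input is
// TRUE in the result, even if another input marks it NULL.
fn union_all(sets: &[NullableSet]) -> NullableSet {
    let mut selected = RowSet::new();
    let mut nulls = RowSet::new();
    let mut any_true = RowSet::new();
    for s in sets {
        // Union the raw `selected` set directly; no per-input true_rows() set.
        selected.extend(s.selected.iter().copied());
        nulls.extend(s.nulls.iter().copied());
        // Track rows that are TRUE in at least one input.
        any_true.extend(s.selected.difference(&s.nulls).copied());
    }
    // TRUE overrides NULL: drop NULL markers for rows some input saw as TRUE.
    nulls.retain(|row| !any_true.contains(row));
    NullableSet { selected, nulls }
}
```

In this model, a row that is NULL in one input and TRUE in another ends up TRUE, while a row that is NULL everywhere it appears stays NULL.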
- Revert `PartialEq` to semantic equality via `true_rows()`; raw-field equality is wrong because a NULL row may be represented either inside or outside the `selected` bitmap. Add regression test exercising the two representations of "row 5 is NULL".
- Tighten `selected_rows()` doc: it returns the raw backing bitmap, not a semantic "TRUE ∪ NULL" set.
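To illustrate why raw-field equality breaks, here is a minimal sketch of the two representations of "row 5 is NULL". The types are hypothetical stand-ins (`BTreeSet<u64>` for `RowAddrTreeMap`, `NullableSet` for `NullableRowAddrSet`), and the `eq` body is an assumption about what "semantic equality via `true_rows()`" could look like, not a copy of the merged code.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-in for RowAddrTreeMap.
type RowSet = BTreeSet<u64>;

// Hypothetical model of NullableRowAddrSet.
#[derive(Debug, Clone, Default)]
struct NullableSet {
    selected: RowSet,
    nulls: RowSet,
}

impl NullableSet {
    // TRUE rows: selected minus nulls.
    fn true_rows(&self) -> RowSet {
        self.selected.difference(&self.nulls).copied().collect()
    }

    // Field-wise comparison, shown only to demonstrate why it is wrong.
    fn raw_eq(&self, other: &Self) -> bool {
        self.selected == other.selected && self.nulls == other.nulls
    }
}

// Semantic equality (assumed form): same TRUE rows and same NULL rows,
// regardless of whether a NULL row also appears in `selected`.
impl PartialEq for NullableSet {
    fn eq(&self, other: &Self) -> bool {
        self.true_rows() == other.true_rows() && self.nulls == other.nulls
    }
}

// Row 5 is NULL and is also present in the raw `selected` set.
fn null_inside_selected() -> NullableSet {
    NullableSet { selected: RowSet::from([5]), nulls: RowSet::from([5]) }
}

// Row 5 is NULL and is absent from the raw `selected` set.
fn null_outside_selected() -> NullableSet {
    NullableSet { selected: RowSet::new(), nulls: RowSet::from([5]) }
}
```

Both constructors describe the same logical state, so `==` treats them as equal, while `raw_eq` reports a difference that callers should never observe.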
westonpace
approved these changes
Jun 9, 2026
westonpace
left a comment
Member
I'm fine merging this but the PR description and title are a little confusing. Is the goal to speed up wire transfers? Or is the goal to speed up the `union_all` method? I think this does both?
Contributor
Author
@westonpace yep it does both
`true_rows()` clones both `selected` and `nulls` to compute their difference. For wire serialization (e.g. distributed scalar index execution), we want to keep the original `selected` and `nulls` as intermediate results and send them across nodes.

Expose `selected_rows() -> &RowAddrTreeMap` for a zero-copy view of the raw backing bitmap. Profiles on distributed scalar index queries show `RoaringBitmap::clone` is a measurable hot spot, and this change improves performance by reducing the need to clone these bitmaps.
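As a rough illustration of the zero-copy idea, the sketch below borrows the raw set for encoding instead of building a new one. Everything here is an assumption for illustration: `RowSet` (`BTreeSet<u64>`) stands in for `RowAddrTreeMap`, `NullableSet` stands in for `NullableRowAddrSet`, and `write_set`/`encode` are hypothetical helpers with a made-up length-prefixed layout, not Lance's wire format.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-in for RowAddrTreeMap.
type RowSet = BTreeSet<u64>;

// Hypothetical model of NullableRowAddrSet.
struct NullableSet {
    selected: RowSet,
    nulls: RowSet,
}

impl NullableSet {
    // Borrow the raw backing set: no allocation and no subtraction.
    // Note this is the raw set, not a semantic "TRUE ∪ NULL" view.
    fn selected_rows(&self) -> &RowSet {
        &self.selected
    }

    // Semantic TRUE rows: builds a brand-new set on every call.
    fn true_rows(&self) -> RowSet {
        self.selected.difference(&self.nulls).copied().collect()
    }
}

// Hypothetical layout: element count as u64 LE, then each row as u64 LE.
fn write_set(out: &mut Vec<u8>, set: &RowSet) {
    out.extend_from_slice(&(set.len() as u64).to_le_bytes());
    for row in set {
        out.extend_from_slice(&row.to_le_bytes());
    }
}

// Encode both raw sets by reference; the receiver can derive TRUE rows later.
fn encode(set: &NullableSet) -> Vec<u8> {
    let mut out = Vec::new();
    write_set(&mut out, set.selected_rows());
    write_set(&mut out, &set.nulls);
    out
}
```

Shipping `selected` and `nulls` as they are defers the subtraction to whichever node actually needs the TRUE rows, which is the trade-off the description above argues for.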