Skip to content

feat(lance-select): expose selected rows accessor on NullableRowAddrSet#7164

Merged
LuQQiu merged 3 commits into
lance-format:mainfrom
LuQQiu:exposeNullable
Jun 9, 2026
Merged

feat(lance-select): expose selected rows accessor on NullableRowAddrSet#7164
LuQQiu merged 3 commits into
lance-format:mainfrom
LuQQiu:exposeNullable

Conversation

@LuQQiu

@LuQQiu LuQQiu commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

true_rows() clones both selected and nulls to compute their difference. For wire serialization (e.g. distributed scalar index execution), we want to keep the original selected and nulls as intermediate result and sent them across nodes.

Expose selected_rows() -> &RowAddrTreeMap for a zero-copy view of the raw backing bitmap. Profiles on distributed scalar index queries show RoaringBitmap::clone is a measurable hot spot and this improves performance by reducing the need for cloning them.

Add `selected_rows()` returning `&RowAddrTreeMap` so external consumers
(e.g. wire-serialization codecs) can read the union of TRUE and NULL
rows without paying for the clone+subtract that `true_rows()` performs.

Internal cleanups that ride on the same field:
- `union_all` now unions `&s.selected` directly instead of materializing
  each `s.true_rows()` first. The `nulls` cleanup uses an explicit
  `any_true` set (still equivalent to the prior algorithm).
- `PartialEq` compares `(selected, nulls)` field-wise — no clones.

Adds 9 unit tests covering union_all edge cases: TRUE-overrides-NULL
conflicts, NULL-only inputs, empty/single/disjoint inputs, and a
cross-check against repeated `BitOrAssign`.
@github-actions github-actions Bot added the enhancement New feature or request label Jun 8, 2026
@LuQQiu LuQQiu closed this Jun 8, 2026
@LuQQiu LuQQiu reopened this Jun 8, 2026
- Revert PartialEq to semantic equality via true_rows(); raw-field equality
  is wrong because a NULL row may be represented either inside or outside
  the `selected` bitmap. Add regression test exercising the two
  representations of "row 5 is NULL".
- Tighten selected_rows() doc: it returns the raw backing bitmap, not a
  semantic "TRUE ∪ NULL" set.
@codecov

codecov Bot commented Jun 8, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.70330% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-select/src/mask/nullable.rs 96.70% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

@LuQQiu LuQQiu requested a review from wjones127 June 8, 2026 22:08

@westonpace westonpace left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine merging this but the PR description and title are a little confusing. Is the goal to speedup wire transfers? Or is the goal to speedup the union_all method? I think this does both?

@LuQQiu

LuQQiu commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

@westonpace yep it does both
to remove as many clone as possible is the workload

@LuQQiu LuQQiu merged commit 8a12c1a into lance-format:main Jun 9, 2026
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants