Cursor-based sequential iteration for Index trait#105

Open
antiguru wants to merge 22 commits into frankmcsherry:master from antiguru:feature/cursor-index

antiguru commented Apr 16, 2026

Problem

Repeats<TC>::get(index) calls RankSelect::rank(index) on every access — a popcount over multiple u64 words.
For sequential iteration this recomputes bit counts from scratch per element.
The same overhead hits Lookbacks.
For composite types (tuples, #[derive(Columnar)] structs) containing these, the cost multiplies.
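
To make the cost concrete, here is a minimal sketch of the access pattern described above. `Bits` is a hypothetical, simplified stand-in for a rank-capable bitvector, not the crate's `RankSelect` (which may bound the popcount work with precomputed counts). It contrasts recounting at every position with a running count that does one bit test per position.

```rust
/// Simplified stand-in for a rank-capable bitvector (assumption: no sampled counts).
pub struct Bits {
    words: Vec<u64>,
    len: usize,
}

impl Bits {
    pub fn from_bools(bits: &[bool]) -> Self {
        let mut words = vec![0u64; (bits.len() + 63) / 64];
        for (i, &b) in bits.iter().enumerate() {
            if b {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        Bits { words, len: bits.len() }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn get(&self, index: usize) -> bool {
        (self.words[index / 64] >> (index % 64)) & 1 == 1
    }

    /// Counts set bits strictly before `index`, starting over on every call.
    pub fn rank(&self, index: usize) -> usize {
        let full = index / 64;
        let mut count: usize = self.words[..full]
            .iter()
            .map(|w| w.count_ones() as usize)
            .sum();
        let rem = index % 64;
        if rem > 0 {
            let mask = (1u64 << rem) - 1;
            count += (self.words[full] & mask).count_ones() as usize;
        }
        count
    }
}

/// Sequential pass that pays for a fresh `rank()` at every position.
pub fn ranks_by_recount(bits: &Bits) -> Vec<usize> {
    (0..bits.len()).map(|i| bits.rank(i)).collect()
}

/// Same result from a running count: one bit test per position.
pub fn ranks_by_running_count(bits: &Bits) -> Vec<usize> {
    let mut running = 0;
    let mut out = Vec::with_capacity(bits.len());
    for i in 0..bits.len() {
        out.push(running);
        if bits.get(i) {
            running += 1;
        }
    }
    out
}
```

Both functions return the same vector; only the second avoids redoing the popcount work per element, which is the idea the cursor design builds on.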

Design

Extend Index with a GAT cursor:

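```rust
use core::ops::Range;

pub trait Index {
    /// Item produced by `get()` and by the cursor.
    type Ref;
    /// Sequential iterator; a GAT because it borrows from `&self`.
    type Cursor<'a>: Iterator<Item = Self::Ref> where Self: 'a;
    fn get(&self, index: usize) -> Self::Ref;
    /// Cursor over the elements in `range`.
    fn cursor(&self, range: Range<usize>) -> Self::Cursor<'_>;
    /// Cursor over the whole container (`Len` is the crate's length trait).
    fn index_iter(&self) -> Self::Cursor<'_> where Self: Len { self.cursor(0..self.len()) }
}
```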
  • Each type provides a specialized cursor where worthwhile; others fall back to DefaultCursor (wraps get()).
  • Slices use core::slice::Iter for pointer-based iteration.
  • Tuples and derived structs generate composed cursors — TupleCursorN<A::Cursor, B::Cursor, ...> / FooContainerCursor<'a, ...> — that zip field cursors. Specialized cursors propagate through composites automatically.
  • RepeatsCursor / LookbacksCursor seed somes_cursor / oks_cursor with one rank() at range start, then maintain incrementally: bit test + conditional increment per element.
  • Bit fetches cache the current bitvector word; only re-fetch on word boundary (every 64 elements), avoiding the per-element bounds check into &[u64].
  • New CursorOf<'a, C> type alias (<C as Index>::Cursor<'a>) for naming iterator types in trait associated-type slots.
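
The fallback and the slice specialization from the list above can be sketched as follows. This is a cut-down illustration under stated assumptions, not the PR's code: the trait omits `index_iter` and `Len`, `Squares` is a hypothetical container invented for the example, and the field layout of `DefaultCursor` is a guess.

```rust
use core::ops::Range;

pub trait Index {
    type Ref;
    type Cursor<'a>: Iterator<Item = Self::Ref> where Self: 'a;
    fn get(&self, index: usize) -> Self::Ref;
    fn cursor(&self, range: Range<usize>) -> Self::Cursor<'_>;
}

/// Fallback cursor: walks the range and calls `get()` per element.
pub struct DefaultCursor<'a, C: Index> {
    container: &'a C,
    range: Range<usize>,
}

impl<'a, C: Index> Iterator for DefaultCursor<'a, C> {
    type Item = C::Ref;
    fn next(&mut self) -> Option<C::Ref> {
        let index = self.range.next()?;
        Some(self.container.get(index))
    }
}

/// Hypothetical computed container with nothing worth specializing.
pub struct Squares;

impl Index for Squares {
    type Ref = usize;
    type Cursor<'a> = DefaultCursor<'a, Self> where Self: 'a;
    fn get(&self, index: usize) -> usize {
        index * index
    }
    fn cursor(&self, range: Range<usize>) -> Self::Cursor<'_> {
        DefaultCursor { container: self, range }
    }
}

/// Borrowed slices hand out `core::slice::Iter`, so iteration is pointer-based.
impl<'s, T> Index for &'s [T] {
    type Ref = &'s T;
    type Cursor<'a> = core::slice::Iter<'s, T> where Self: 'a;
    fn get(&self, index: usize) -> &'s T {
        let slice: &'s [T] = *self;
        &slice[index]
    }
    fn cursor(&self, range: Range<usize>) -> Self::Cursor<'_> {
        let slice: &'s [T] = *self;
        // One bounds check for the whole range, none per element.
        slice[range].iter()
    }
}
```

For example, `Squares.cursor(1..4)` yields 1, 4, 9 through `get()`, while `Index::cursor(&slice, 1..3)` on a `&[u32]` returns a plain slice iterator over the two selected elements.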

Trade-offs

  • Ref-form impls (&'a Repeats, &'a Lookbacks, &'a tuple, &'a Slice, derive ref form) use DefaultCursor. Composed cursors need lifetime gymnastics around intermediate references that don't work through a generic impl. The hot path — container.borrow().index_iter() — hits the owned-form specialized cursor.
  • index_iter return type changes from IterOwn<&Self> to Self::Cursor<'_> — semver-major. Callers iterating with for x in c.index_iter() keep working; callers naming IterOwn<&Self> break. Item types can shift from &T to T depending on method dispatch (reflected in updated sum tests).
  • The container.borrow().index_iter() pattern has a lifetime limitation: the cursor borrows from the Borrowed shell, which is a temporary in that expression. Callers who need to return an iterator from a method where the Borrowed can't be stashed (e.g. fn drain(&mut self) -> Self::DrainIter<'_>) should use into_index_iter instead. It consumes the Copy Borrowed shell and returns an IterOwn<Borrowed<'a>> whose lifetime is tied to the inner refs. This is the slow path, but it is semantically correct. A cleaner fix (cursors holding Copy inner refs directly) would require either specialization or dropping owned-form cursors, and is out of scope for this PR.
  • Enums keep DefaultCursor. Composed dispatch across variants with per-element discriminants is non-trivial; not worth the surface-area bump for this PR.
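
The lifetime limitation in the third bullet can be reproduced in a small model. Everything here is hypothetical scaffolding (`Borrowed`, `ShellCursor`, `Column`, and this `IterOwn` are simplified stand-ins); it models only the lifetimes, not the speed difference between the two paths.

```rust
use core::ops::Range;

/// Hypothetical `Copy` shell of borrowed columns.
#[derive(Clone, Copy)]
pub struct Borrowed<'a> {
    values: &'a [u32],
}

/// Cursor in the GAT style: tied to the `&self` borrow of the shell.
pub struct ShellCursor<'s, 'a> {
    shell: &'s Borrowed<'a>,
    range: Range<usize>,
}

impl Iterator for ShellCursor<'_, '_> {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        let index = self.range.next()?;
        Some(self.shell.values[index])
    }
}

/// Owning iterator: holds the shell by value, so only `'a` constrains it.
pub struct IterOwn<C> {
    container: C,
    range: Range<usize>,
}

impl Iterator for IterOwn<Borrowed<'_>> {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        let index = self.range.next()?;
        Some(self.container.values[index])
    }
}

impl<'a> Borrowed<'a> {
    pub fn index_iter(&self) -> ShellCursor<'_, 'a> {
        ShellCursor { shell: self, range: 0..self.values.len() }
    }
    pub fn into_index_iter(self) -> IterOwn<Self> {
        IterOwn { range: 0..self.values.len(), container: self }
    }
}

pub struct Column {
    data: Vec<u32>,
}

impl Column {
    pub fn new(data: Vec<u32>) -> Self {
        Column { data }
    }

    pub fn borrow(&self) -> Borrowed<'_> {
        Borrowed { values: &self.data }
    }

    // Rejected by the borrow checker (E0515): the cursor would reference the
    // temporary `Borrowed` created inside this function.
    // pub fn iter_cursor(&self) -> ShellCursor<'_, '_> {
    //     self.borrow().index_iter()
    // }

    /// Compiles: the `Copy` shell moves into the iterator.
    pub fn iter(&self) -> IterOwn<Borrowed<'_>> {
        self.borrow().into_index_iter()
    }
}
```

When the caller can keep the shell in a local (`let shell = column.borrow();`), `shell.index_iter()` works as usual; the restriction only applies when the shell cannot outlive the expression that creates it.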

Performance

All numbers from benches in this PR. Machine-dependent; relative improvements are the signal.

Repeats / Lookbacks direct iteration (100k elements, benches/repeats_cursor.rs):

| Method | Time | Throughput | vs baseline |
|---|---|---|---|
| repeats_get | 1094 µs | 730 MB/s | baseline |
| repeats_cursor | 92 µs | 8647 MB/s | 11.8x |
| lookbacks_get | 1123 µs | 712 MB/s | baseline |
| lookbacks_cursor | 121 µs | 6577 MB/s | 9.2x |

Both types eliminate per-element rank(); remaining cost is bit shift + one bounds-checked array access into somes / oks.
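
A minimal sketch of the incremental scheme, assuming a simplified layout: one bit per element marks "a new value starts here", and `somes` stores each distinct run value once. The `Repeats` type below is a hypothetical model written for this example, not the crate's type, and it does not include word caching.

```rust
use core::ops::Range;

/// Simplified model: bit i is set when element i starts a new run.
pub struct Repeats<T> {
    words: Vec<u64>,
    somes: Vec<T>,
}

impl<T: Copy + PartialEq> Repeats<T> {
    pub fn from_slice(items: &[T]) -> Self {
        let mut words = vec![0u64; (items.len() + 63) / 64];
        let mut somes: Vec<T> = Vec::new();
        for (i, &item) in items.iter().enumerate() {
            // The first element always starts a run, so bit 0 is always set.
            if somes.last() != Some(&item) {
                words[i / 64] |= 1u64 << (i % 64);
                somes.push(item);
            }
        }
        Repeats { words, somes }
    }

    fn bit(&self, index: usize) -> bool {
        (self.words[index / 64] >> (index % 64)) & 1 == 1
    }

    /// Set bits strictly before `index`, recomputed on every call.
    fn rank(&self, index: usize) -> usize {
        let full = index / 64;
        let mut count: usize = self.words[..full]
            .iter()
            .map(|w| w.count_ones() as usize)
            .sum();
        let rem = index % 64;
        if rem > 0 {
            count += (self.words[full] & ((1u64 << rem) - 1)).count_ones() as usize;
        }
        count
    }

    /// Random access: one `rank()` per call.
    pub fn get(&self, index: usize) -> T {
        self.somes[self.rank(index + 1) - 1]
    }

    /// Sequential access: one `rank()` to seed, then bit tests only.
    pub fn cursor(&self, range: Range<usize>) -> RepeatsCursor<'_, T> {
        RepeatsCursor { repeats: self, somes_cursor: self.rank(range.start), range }
    }
}

pub struct RepeatsCursor<'a, T> {
    repeats: &'a Repeats<T>,
    somes_cursor: usize,
    range: Range<usize>,
}

impl<T: Copy + PartialEq> Iterator for RepeatsCursor<'_, T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        let index = self.range.next()?;
        if self.repeats.bit(index) {
            self.somes_cursor += 1;
        }
        // Relies on the invariant that the first element is always stored.
        debug_assert!(self.somes_cursor > 0);
        Some(self.repeats.somes[self.somes_cursor - 1])
    }
}
```

For any range, `repeats.cursor(start..end)` produces the same values as calling `repeats.get(i)` for each `i`, including ranges that start mid-word or are empty.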

Derived struct with a Repeats field (100k rows, benches/derived_cursor.rs):

| Method | Time | Throughput | vs baseline |
|---|---|---|---|
| derived_get (per-elem) | 1228 µs | 1303 MB/s | baseline |
| derived_cursor (composed) | 107 µs | 15010 MB/s | 11.5x |
| tuple_cursor (reference) | 110 µs | 14511 MB/s | 11.2x |

derived_cursor matches tuple_cursor — the composed cursor lets derived structs benefit identically to hand-written tuples.
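
The composition itself is small. The following is a two-field sketch under assumptions: the trait is cut down to `cursor()`, and `TupleCursor2` is modeled on the `TupleCursorN` naming from this PR rather than copied from it. A derived struct's cursor would follow the same shape with named fields.

```rust
use core::ops::Range;

pub trait Index {
    type Ref;
    type Cursor<'a>: Iterator<Item = Self::Ref> where Self: 'a;
    fn cursor(&self, range: Range<usize>) -> Self::Cursor<'_>;
}

impl<'s, T> Index for &'s [T] {
    type Ref = &'s T;
    type Cursor<'a> = core::slice::Iter<'s, T> where Self: 'a;
    fn cursor(&self, range: Range<usize>) -> Self::Cursor<'_> {
        let slice: &'s [T] = *self;
        slice[range].iter()
    }
}

/// Zips two field cursors; each field keeps its own specialization.
pub struct TupleCursor2<A, B> {
    a: A,
    b: B,
}

impl<A: Iterator, B: Iterator> Iterator for TupleCursor2<A, B> {
    type Item = (A::Item, B::Item);
    fn next(&mut self) -> Option<Self::Item> {
        let a = self.a.next()?;
        let b = self.b.next()?;
        Some((a, b))
    }
}

impl<A: Index, B: Index> Index for (A, B) {
    type Ref = (A::Ref, B::Ref);
    // The composite cursor is named entirely in terms of the field cursors.
    type Cursor<'a> = TupleCursor2<A::Cursor<'a>, B::Cursor<'a>> where Self: 'a;
    fn cursor(&self, range: Range<usize>) -> Self::Cursor<'_> {
        TupleCursor2 {
            a: self.0.cursor(range.clone()),
            b: self.1.cursor(range),
        }
    }
}
```

Because the tuple cursor only forwards to `A::Cursor` and `B::Cursor`, swapping a field for a container with a faster cursor speeds up the composite without touching this code.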

Options/Results (100k elements, benches/options_cursor.rs):

| Method | Time | vs baseline |
|---|---|---|
| options_get | 852 µs | baseline |
| options_cursor | 851 µs | ~same |
| results_get | 1073 µs | baseline |
| results_cursor | 1148 µs | ~same |

Options/Results see no change when iterated directly: get() cost is dominated by the somes/oks array access, not rank().
They still benefit indirectly when they appear inside composites — the Repeats / Lookbacks fields get specialized iteration via the composed cursor.

Profile after word caching (Materialize workload):

  • One rank() per cursor construction, zero per element.
  • IPC ~2.9, branch miss rate 0.5%.
  • Remaining per-element cost: bit shift + somes.get() bounds-checked array access.
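
Word caching can be sketched in isolation. This assumes the bits live in one flat `&[u64]`; the crate's bitvector layout may differ (the commit messages mention a separate values/tail split), and `BitCursor` and `bit_at` are hypothetical names used only for this example.

```rust
use core::ops::Range;

/// Uncached baseline: indexes into `words` on every call.
pub fn bit_at(words: &[u64], index: usize) -> bool {
    (words[index / 64] >> (index % 64)) & 1 == 1
}

/// Yields bits over a range, touching `words` once per 64 elements.
pub struct BitCursor<'a> {
    words: &'a [u64],
    pos: usize,
    end: usize,
    /// Cached copy of the word that contains `pos`.
    word: u64,
}

impl<'a> BitCursor<'a> {
    pub fn new(words: &'a [u64], range: Range<usize>) -> Self {
        // Preload the starting word so a mid-word start needs no special case.
        let word = if range.start < range.end {
            words[range.start / 64]
        } else {
            0
        };
        BitCursor { words, pos: range.start, end: range.end, word }
    }
}

impl Iterator for BitCursor<'_> {
    type Item = bool;
    fn next(&mut self) -> Option<bool> {
        if self.pos >= self.end {
            return None;
        }
        let offset = self.pos % 64;
        if offset == 0 {
            // Word boundary: the only place that indexes into the slice.
            self.word = self.words[self.pos / 64];
        }
        let bit = (self.word >> offset) & 1 == 1;
        self.pos += 1;
        Some(bit)
    }
}
```

The boundary branch is taken once per 64 elements, so it predicts well; in between, each step is a shift and a mask on a register-resident word.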

What changed

  • Extended Index trait: Cursor GAT, cursor(), index_iter unified to use it.
  • DefaultCursor + impl_default_cursor! macro; ~35 container impls adopt default cursor.
  • RepeatsCursor + LookbacksCursor with word caching.
  • Tuple macro generates TupleCursorN structs.
  • columnar_derive generates FooContainerCursor<'a, ...> for structs (enums keep default).
  • CursorOf<'a, C> type alias.
  • into_index_iter kept as slow-path escape hatch for consuming iteration.
  • New benches, tests covering word boundaries and empty ranges.

🤖 Generated with Claude Code

antiguru and others added 9 commits April 16, 2026 13:39
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add type Cursor<'a>: Iterator and fn cursor(Range) to the Index trait.
Provide DefaultCursor fallback and impl_default_cursor! macro.
Primitive slice/vec impls use core::slice::Iter for bounds-check-free
iteration. Slice delegates to inner container's cursor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generate TupleCursorN structs that zip field cursors. Each tuple's
cursor() composes field cursors, enabling specialized iteration
(e.g. rank-free Repeats) through composite types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RepeatsCursor maintains somes_cursor incrementally — one bit test per
element, no rank() calls after the initial seed. LookbacksCursor does
the same for oks_cursor, resolving Err back-references without rank().

Reference-to-owned impls (&'a Repeats, &'a Lookbacks, &'a tuple, &'a Slice)
use DefaultCursor since their cursor can't outlive intermediate borrows.
The hot path (borrowed forms) uses the owned-generic impls with specialized
cursors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cache the current u64 word from the bitvector so only one fetch
per 64 elements instead of per element. Eliminates 1 bounds check +
1 branch (values vs tail) per next() call, replacing them with a
branch predicted taken 63/64 times.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
antiguru added a commit to antiguru/materialize that referenced this pull request Apr 16, 2026
Picks up Index::cursor support (frankmcsherry/columnar#105) needed for
efficient Repeats iteration in factorized columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
antiguru added a commit to antiguru/materialize that referenced this pull request Apr 17, 2026
Picks up Index::cursor support (frankmcsherry/columnar#105) needed for
efficient Repeats iteration in factorized columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
antiguru added a commit to antiguru/materialize that referenced this pull request Apr 20, 2026
Picks up Index::cursor support (frankmcsherry/columnar#105) needed for
efficient Repeats iteration in factorized columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
antiguru added a commit to antiguru/materialize that referenced this pull request Apr 22, 2026
Picks up Index::cursor support (frankmcsherry/columnar#105) needed for
efficient Repeats iteration in factorized columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
antiguru and others added 13 commits April 22, 2026 15:13
Generate a per-type cursor struct that zips field cursors, matching
the tuple macro pattern. Struct fields with specialized cursors (e.g.
Repeats) are now reached through derived containers.

Bench (100k rows, Repeats key + Vec val):
- derived get():          1,210 µs  (unchanged, baseline)
- derived cursor (before):  1,146 µs  (DefaultCursor — same cost)
- derived cursor (after):      99 µs  (11.6x faster)
- tuple cursor:              110 µs  (reference point)

Ref form (&'columnar #c_ident) keeps DefaultCursor — same borrow issue
as &'a tuple. Enums still use DefaultCursor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
index_iter() now returns Self::Cursor<'_> instead of IterOwn<&Self>.
Callers get the specialized cursor path automatically — Repeats /
Lookbacks / derived structs / tuples compose rank-free or
pointer-based iteration without changing call sites.

cursor_iter() removed as redundant; migrate to index_iter().

Options round-trip tests adjusted: method resolution now picks
Options::index_iter (owned form) via auto-deref, yielding owned
Option<T> items instead of Option<&T> via the &Options impl. Test
comparisons updated accordingly.

into_index_iter(self) / IterOwn<Self> kept unchanged — owned
iteration still uses the slow get() path; uncommon hot path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Derived cursor gains size_hint and ExactSizeIterator (matching the
  tuple cursor for consistent use with .len() / .zip() callers).
- Word-boundary test coverage: cursor_matches_get now probes indices
  near 64, 128, 192, 256 to catch regressions in word caching.
- Debug asserts guard against underflow in the None/Err branches
  (relies on container invariants: Repeats starts with Some,
  Lookbacks starts with Ok).

Known non-fixes documented in PR review:
- &'a Repeats / &'a Lookbacks / &'a tuple / &'a derived container
  still use DefaultCursor: lifetime gymnastics in the ref form are
  deferred. Hot path goes through container.borrow().index_iter()
  which hits the owned-form specialized cursor.
- into_index_iter(self) keeps IterOwn slow path (owned consumption
  is uncommon).
- index_iter() return type change is a breaking API change; will
  need a version bump when released.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
columnar 0.12's index_iter signature (IterOwn<&Self>) required
&&[T]: Index for the borrowed-tuple case, which wasn't implemented.
Callers had to use into_index_iter() consuming the Copy Borrowed.

With cursor-based index_iter, .borrow().index_iter() compiles and
uses the composed cursor path — no &&[T]: Index bound needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Owned iteration had no specialization path — returned IterOwn<Self>
which always used the slow get()-based path. Callers consumed self
unnecessarily; auto-ref to ref-form gave the same slow result as
index_iter(). Removing the method forces callers to either borrow
(container.borrow().index_iter() — fast composed cursor) or keep
using IterOwn directly via Slice::into_iter for the one in-tree
use case (Vec<T>::Columnar impls).

IterOwn is preserved as an implementation detail of Slice::into_iter,
which is widely used by derived Columnar impls.

Migration:
- columns.into_index_iter() -> columns.borrow().index_iter()
- Where Container: Index bounds don't hold (e.g. non-Copy field
  containers), borrow() produces a Container with all-Copy inner
  refs that satisfies Index.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
100k elements, iter-then-sum:
- repeats_get     1094µs
- repeats_cursor    92µs   (11.8x)
- lookbacks_get   1123µs
- lookbacks_cursor 121µs   (9.2x)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Explain why Ref is deliberately not a GAT (borrowed containers return
references at their own lifetime, outliving the &self borrow) while
Cursor<'a> is a GAT (iteration state is intrinsically &self-bound).
Note the composition principle behind tuple / derived-struct cursors,
and the ref-form DefaultCursor fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
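
The reason `Ref` can stay a plain associated type shows up in a small example. This sketch assumes a cut-down trait with only `get()`, and `pick` is a hypothetical helper: the returned reference carries the slice's own lifetime, so it survives after the short `&self` borrow of the local shell ends.

```rust
pub trait Index {
    /// Not a GAT: the type does not mention the `&self` lifetime.
    type Ref;
    fn get(&self, index: usize) -> Self::Ref;
}

impl<'a, T> Index for &'a [T] {
    type Ref = &'a T;
    fn get(&self, index: usize) -> &'a T {
        let slice: &'a [T] = *self;
        &slice[index]
    }
}

/// Returns an element through a local shell that is dropped on return.
pub fn pick<'a, T>(data: &'a [T], index: usize) -> &'a T {
    let shell: &'a [T] = data;
    // `&shell` is a short borrow of a local, yet the result lives for `'a`.
    Index::get(&shell, index)
}
```

A cursor cannot be handled the same way in general, because its iteration state refers back to the container it was created from.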
- bytes.rs: unresolved intra-doc link to `indexed` (needs full path)
- adts/art.rs: bare URL needs angle brackets

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parallel to ContainerOf<T>, BorrowedOf<'a, T>, Ref<'a, T>.
Resolves to the cursor type of the borrowed container:
<BorrowedOf<'a, T> as Index>::Cursor<'a>.

Useful for naming iterator types in trait associated-type slots,
e.g. timely's DrainContainer::DrainIter<'a>.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`///` doc comments inside `quote!` are literal — `#names` did not
substitute, so the generated rustdoc read literally "Container for
#names." Switch to `#[doc = ...]` with pre-formatted strings.

Fixes container_struct, reference_struct, and cursor_struct.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Callers with only the container in scope (no T: Columnar) can't use the
T-based alias. Index is the fundamental abstraction; parameterize the
alias on any C: Index. Columnar users compose with BorrowedOf:
CursorOf<'a, BorrowedOf<'a, Foo>>.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
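
A sketch of how such an alias reads in an associated-type slot. The `Scan` trait and `Column` type are hypothetical and exist only for this example; the alias definition mirrors the one described in the PR, and the trait is cut down to `cursor()`.

```rust
use core::ops::Range;

pub trait Index {
    type Ref;
    type Cursor<'a>: Iterator<Item = Self::Ref> where Self: 'a;
    fn cursor(&self, range: Range<usize>) -> Self::Cursor<'_>;
}

impl<'s, T> Index for &'s [T] {
    type Ref = &'s T;
    type Cursor<'a> = core::slice::Iter<'s, T> where Self: 'a;
    fn cursor(&self, range: Range<usize>) -> Self::Cursor<'_> {
        let slice: &'s [T] = *self;
        slice[range].iter()
    }
}

/// Names the cursor type of any `C: Index` without spelling the projection.
pub type CursorOf<'a, C> = <C as Index>::Cursor<'a>;

/// Hypothetical trait with an iterator-typed associated slot.
pub trait Scan {
    type Iter<'a>: Iterator where Self: 'a;
    fn scan(&self) -> Self::Iter<'_>;
}

pub struct Column(pub Vec<u32>);

impl Scan for Column {
    // The alias fills the slot; here it resolves to `core::slice::Iter<'a, u32>`.
    type Iter<'a> = CursorOf<'a, &'a [u32]> where Self: 'a;
    fn scan(&self) -> Self::Iter<'_> {
        let slice: &[u32] = &self.0;
        Index::cursor(&slice, 0..slice.len())
    }
}
```

This compiles because the slice cursor does not actually borrow from the local shell; for cursors that do, the lifetime limitation described under Trade-offs applies.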
Cursor's lifetime is bound to the container's shell, so the common
pattern `self.borrow().index_iter()` creates a cursor borrowing from
the temporary Borrowed shell. For callers that need to return an
iterator whose lifetime is tied to the borrowed value's inner refs
(e.g. timely's DrainContainer), into_index_iter + IterOwn gives them
an owned iterator that consumes the Copy Borrowed shell — slow path
but semantically correct.

Revert callers in src/main.rs and benches/ops.rs to original form.
Update CHANGELOG accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>