[VOTE] Add index segment_seq to the Lance table format #7044
Replies: 1 comment 3 replies
@jackye1995 Is there a design doc or prototype for discussion anywhere? I couldn't find one.
Why can't they see this now? They should be able to just compare the table versions a segment is present in and tell immediately from that, right? Could you give a more concrete example? I think that would help me understand the use case better.
Proposal
Add optional `segment_seq` metadata to physical index segments, plus a per-logical-index high-water mark, to the Lance table format. `segment_seq` is scoped to the index name, starts at 1 for each index name, and increases monotonically each time a new physical segment for that same index name is committed. Different index names have independent sequence domains, so `vector_idx` and `scalar_idx` can both have segment sequence 1.

PR: #7048

This thread is both the design discussion and the vote thread. If the design sounds good, a `+1` here is enough; we do not need to split this into two separate discussions.

Motivation
Index segments can be added, replaced, and removed over time. Today, clients cannot reliably determine segment chronology from the latest manifest alone. This makes deterministic cache placement difficult.
A concrete example:
- Dataset: `7a42d66e-6b2c-4dd7-9f6a-2c9c2b29e147`
- Index name: `vector_idx`
- Nodes: `node-a`, `node-b`, `node-c`, `node-d`
- Segments of `vector_idx`:
  - `s1 = 3af7a8fb-6c5a-4e8d-8c9d-54274d742f20`, `segment_seq = 1`
  - `s2 = b7a6f02b-9c1b-47fd-a3d3-943ff9b0cf61`, `segment_seq = 2`
  - `s3 = 0d3222d0-4fb6-4b97-9f58-b1d4ac1e1731`, `segment_seq = 3`

We want to assign segment caches deterministically to nodes. There are usually not many index segments, so consistent hashing does not work well here: with only three segments and four nodes, it is common for several segments to land on the same node while other nodes get none.
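To put a rough number on that claim, here is a small enumeration under an idealized model in which each segment is hashed independently and uniformly onto one of the nodes. That model is an assumption for illustration, not the behavior of any particular consistent-hashing implementation, and the function name is made up for this sketch.

```python
from fractions import Fraction
from itertools import product


def distinct_placement_probability(num_segments: int, num_nodes: int) -> Fraction:
    """Probability that independent, uniform placement gives every segment its own node."""
    # Enumerate every possible (segment -> node) placement.
    placements = list(product(range(num_nodes), repeat=num_segments))
    # Count the placements in which no two segments share a node.
    distinct = sum(1 for p in placements if len(set(p)) == num_segments)
    return Fraction(distinct, len(placements))


p = distinct_placement_probability(3, 4)
print(p)      # 3/8
print(1 - p)  # 5/8
```

Under this model, at least two of the three segments share a node in 5 out of 8 cases.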
Instead, we want anchored round robin: hash stable dataset/index information, such as `7a42d66e-6b2c-4dd7-9f6a-2c9c2b29e147:vector_idx`, choose an anchor node, and then assign by `segment_seq`. If the anchor is `node-b`, the assignment is:

- `segment_seq = 1` -> `node-b`
- `segment_seq = 2` -> `node-c`
- `segment_seq = 3` -> `node-d`
- `segment_seq = 4` -> `node-a`

If `s2` has many deletes, we may rebuild `s2` plus newly unindexed fragments into `s4 = c1e7ed27-739d-4799-9d4c-37dce0c9ec2a`. The latest segment list should show `s1`, `s3`, and `s4`, with `s4.segment_seq = 4` and `s2` removed. Then `s3` remains assigned to `node-d`, and only the new `s4` cache goes to `node-a`.

Without `segment_seq`, a client that round-robins over the latest segment list `[s1, s3, s4]` could move `s3` into `s2`'s old position on `node-c`, causing unnecessary cache invalidation.

Format Change
Add an optional `segment_seq` field to `lance.table.IndexMetadata`. A missing value means the segment was written before this feature or is uncommitted metadata that has not reached the commit layer yet.
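As a sketch of how a client might consume `segment_seq`, the anchored round robin from the motivating example could look roughly like this. The hash function (SHA-256), the function names, and the choice to skip segments that have no `segment_seq` are all assumptions made for illustration; the proposal does not prescribe any of them.

```python
import hashlib
from typing import Dict, List, Optional


def anchor_index(dataset: str, index_name: str, num_nodes: int) -> int:
    """Pick a stable anchor position from dataset/index information (hypothetical helper)."""
    key = f"{dataset}:{index_name}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def assign_segments(
    segment_seqs: Dict[str, Optional[int]],
    nodes: List[str],
    anchor: int,
) -> Dict[str, str]:
    """Map each segment that has a segment_seq to a node.

    segment_seq starts at 1, so sequence 1 lands on the anchor node.
    Segments without a segment_seq are skipped; placing them is left open.
    """
    placement = {}
    for segment, seq in segment_seqs.items():
        if seq is None:
            continue
        placement[segment] = nodes[(anchor + seq - 1) % len(nodes)]
    return placement


nodes = ["node-a", "node-b", "node-c", "node-d"]
# Fixed to node-b to match the example; a real client would call anchor_index(...).
anchor = nodes.index("node-b")

before = assign_segments({"s1": 1, "s2": 2, "s3": 3}, nodes, anchor)
after = assign_segments({"s1": 1, "s3": 3, "s4": 4}, nodes, anchor)
print(before)  # {'s1': 'node-b', 's2': 'node-c', 's3': 'node-d'}
print(after)   # {'s1': 'node-b', 's3': 'node-d', 's4': 'node-a'}
```

Because placement depends only on the anchor and each segment's own `segment_seq`, removing `s2` does not move `s1` or `s3`.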
Add logical index metadata, `logical_indexes`, to `lance.table.IndexSection`. `LogicalIndexMetadata.max_segment_seq` is the highest `segment_seq` that has ever been assigned for that index name. It is preserved even when no physical segment for that index name is currently present.

This is needed so sequence numbers are not reused after deleting the highest-numbered segment. For example, if `vector_idx` has segments with sequence numbers 1, 2, and 3, and segment 3 is removed, `logical_indexes` keeps `max_segment_seq = 3`; the next `vector_idx` segment receives 4, not 3.

Backwards and Forward Compatibility
Readers are not required to understand the new fields. Older readers may ignore `IndexMetadata.segment_seq` and `LogicalIndexMetadata.max_segment_seq` because these fields are only needed for deterministic segment chronology and future writes, not for reading indexed data.

Writers must preserve and continue `segment_seq` assignment, so this change adds the writer-only feature flag `FLAG_INDEX_SEGMENT_SEQ` and does not add a reader feature flag. This single writer flag covers both physical `IndexMetadata.segment_seq` and logical `LogicalIndexMetadata.max_segment_seq`.

Backward compatibility for existing datasets:

- Existing index segments may not have `segment_seq`, and existing index sections may not have `logical_indexes`.
- The writer assigns `segment_seq` values for existing physical segments, scoped by index name, using deterministic current manifest order.
- The writer derives `LogicalIndexMetadata.max_segment_seq` from the assigned physical segment sequence numbers and from any existing logical high-water marks.
- If physical segments have `segment_seq` but `logical_indexes` is missing, the writer still reconstructs `max_segment_seq` from the maximum physical `segment_seq` for each index name.

Forward compatibility after the feature is enabled:

- When a manifest carries physical `segment_seq` or logical `max_segment_seq`, the writer sets `FLAG_INDEX_SEGMENT_SEQ` in `writer_feature_flags`.
- The flag keeps writers that do not understand the feature from dropping `segment_seq` or `logical_indexes` and reverting the dataset to a state where sequence numbers can be reused.
- Writers preserve the full `IndexSection`, including `logical_indexes`, on commit. If a segment is removed, the logical high-water mark remains so the next segment for that index name receives a larger sequence number.
- On rebase, the conflict resolver reads the latest `IndexSection`, including logical high-water marks, and recomputes name-scoped sequence numbers before the final manifest write.

Implementation Summary
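The sequence assignment rules described above (backfill for existing datasets, reconstruction of the high-water mark, and no reuse after removal) could be sketched as follows. The `Segment` class and `assign_segment_seqs` function are hypothetical and do not mirror the actual Lance commit code; conflict resolution, feature flags, and manifest I/O are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    """Simplified, hypothetical view of one physical index segment."""
    name: str  # index name, e.g. "vector_idx"
    uuid: str  # placeholder identifier for this sketch
    segment_seq: Optional[int] = None


def assign_segment_seqs(
    segments: List[Segment],
    max_segment_seq: Dict[str, int],
) -> Dict[str, int]:
    """Fill in missing segment_seq values and return updated high-water marks.

    `segments` is taken in manifest order. `max_segment_seq` maps index name
    to the stored high-water mark and may be empty for older datasets.
    """
    marks = dict(max_segment_seq)
    # Reconstruct marks from sequence numbers that are already present.
    for seg in segments:
        if seg.segment_seq is not None:
            marks[seg.name] = max(marks.get(seg.name, 0), seg.segment_seq)
    # Assign the next number, per index name, to segments that lack one.
    for seg in segments:
        if seg.segment_seq is None:
            marks[seg.name] = marks.get(seg.name, 0) + 1
            seg.segment_seq = marks[seg.name]
    return marks


# Backfill: an older dataset with no sequence numbers at all.
legacy = [Segment("vector_idx", "s1"), Segment("vector_idx", "s2"), Segment("scalar_idx", "t1")]
legacy_marks = assign_segment_seqs(legacy, {})
print(legacy_marks)  # {'vector_idx': 2, 'scalar_idx': 1}

# No reuse: segment 3 was removed, but the stored mark of 3 is kept,
# so the newly committed segment receives 4.
current = [Segment("vector_idx", "s1", 1), Segment("vector_idx", "s2", 2), Segment("vector_idx", "s4")]
current_marks = assign_segment_seqs(current, {"vector_idx": 3})
print(current[-1].segment_seq, current_marks)  # 4 {'vector_idx': 4}
```

Keeping the marks in a map separate from the segment list is what lets the counter survive when the highest-numbered segment, or every segment for an index name, is removed.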
`segment_seq` values are assigned in the commit layer. New index build APIs do not require callers to provide sequence numbers. Concurrent `CreateIndex` commits are rebaseable, and the conflict resolver recomputes name-scoped `segment_seq` values against the latest manifest before committing.

The commit path reads and writes the full `IndexSection`, including `logical_indexes`, so high-water marks survive segment removals and normal manifest rewrites.

Vote
Please vote:
- `+1`: approve
- `0`: abstain
- `-1`: reject, with justification