[core] Unify single-column global index writer by JingsongLi · Pull Request #8275 · apache/paimon

JingsongLi · 2026-06-18T02:58:20Z

Summary

This PR merges the previous singleton and parallel single-column global index writer APIs into one GlobalIndexSingleColumnWriter interface. Single-column index writers now receive the caller-provided shard-relative row id through write(@Nullable Object key, long relativeRowId).
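To make the call shape concrete, here is a minimal sketch. Only the `write(key, relativeRowId)` signature is taken from the description above. The names `SingleColumnWriterSketch`, `MapBackedWriterSketch`, and `UnifiedWriterDemo` are hypothetical, the `@Nullable` annotation is left out so the snippet compiles without extra dependencies, and none of this is Paimon's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in; only the write(key, relativeRowId) shape comes from the PR description. */
interface SingleColumnWriterSketch {
    /** key may be null (the PR marks it @Nullable); relativeRowId is chosen by the caller. */
    void write(Object key, long relativeRowId);
}

/** Hypothetical writer that stores each key under the row id it was handed. */
class MapBackedWriterSketch implements SingleColumnWriterSketch {
    final Map<Long, Object> keysByRowId = new LinkedHashMap<>();

    @Override
    public void write(Object key, long relativeRowId) {
        // The writer does not invent ids; it records exactly what the caller passed.
        keysByRowId.put(relativeRowId, key);
    }
}

class UnifiedWriterDemo {
    static Map<Long, Object> demo() {
        MapBackedWriterSketch writer = new MapBackedWriterSketch();
        writer.write("apple", 0L);
        writer.write(null, 5L);     // null key, still addressed by an explicit id
        writer.write("cherry", 9L); // ids need not be contiguous
        return writer.keysByRowId;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is that the row id is an input to the writer rather than something it derives from call order.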

Changes

Replace GlobalIndexSingletonWriter and GlobalIndexParallelWriter with GlobalIndexSingleColumnWriter.
Update BTree, Vector, Lumina, and Tantivy global index writers to implement the unified single-column writer API.
Pass explicit shard-relative row ids from BTree, Flink, and Spark index build paths.
Keep vector/full-text row counts as logical row counts while persisting caller-provided row ids for non-null indexed entries.
Update test helper index formats and affected tests to read/write explicit relative row ids.
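As an illustration of what passing explicit shard-relative row ids could look like in a build path, the sketch below assumes that a relative id is the global row id minus the shard's first row id. That definition, and the names `ShardRelativeIdDemo`, `toRelative`, and `relativeIds`, are assumptions made for this example and are not taken from the Paimon code.

```java
import java.util.ArrayList;
import java.util.List;

class ShardRelativeIdDemo {

    /** Assumed definition: relative id = global row id minus the shard's first row id. */
    static long toRelative(long globalRowId, long shardStart) {
        if (globalRowId < shardStart) {
            throw new IllegalArgumentException(
                    "row " + globalRowId + " is before shard start " + shardStart);
        }
        return globalRowId - shardStart;
    }

    /** Converts the global ids of the rows a build task sees into shard-relative ids. */
    static List<Long> relativeIds(long shardStart, long[] globalRowIds) {
        List<Long> result = new ArrayList<>();
        for (long globalRowId : globalRowIds) {
            result.add(toRelative(globalRowId, shardStart));
        }
        return result;
    }

    public static void main(String[] args) {
        // A shard starting at global row 1000 that only contains some of the rows.
        System.out.println(relativeIds(1000L, new long[] {1000L, 1003L, 1007L}));
    }
}
```

Under this assumption, each id handed to `write` would be such an offset, so gaps in the shard's rows show up as gaps in the relative ids.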

Testing

mvn -pl paimon-vector/paimon-vector-index -am -Pfast-build -DskipTests test-compile
mvn -pl paimon-lumina -am -Pfast-build -DskipTests test-compile
mvn -pl paimon-tantivy/paimon-tantivy-index -am -Pfast-build -DskipTests test-compile
mvn -pl paimon-flink/paimon-flink-common -am -Pfast-build -DskipTests test-compile
mvn -pl paimon-spark/paimon-spark-common -am -Pfast-build -DskipTests compile
git diff --check

leaves12138

Thanks for the update. I re-reviewed the latest revision and the single-column writer API now looks consistent across BTree, vector, Lumina, Tantivy, Flink and Spark build paths.

The important semantic change is also correct: vector/full-text writers now persist the caller-provided relative row id instead of relying on an implicit dense sequence, while rowCount still represents the logical rows processed for index metadata. This matches sparse/sharded row ranges, and the added multiple-index-file tests cover the regression case.
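The difference can be sketched with a small comparison. The implicit scheme shown here (numbering rows by arrival position) is a hypothetical reconstruction used only for contrast, and `RowIdSemanticsDemo` with its methods is an illustrative name, not part of Paimon.

```java
import java.util.ArrayList;
import java.util.List;

class RowIdSemanticsDemo {

    /** Hypothetical implicit scheme: a row's id is its arrival position. */
    static List<Long> implicitIds(Object[] keys) {
        List<Long> ids = new ArrayList<>();
        long next = 0;
        for (Object key : keys) {
            if (key != null) {
                ids.add(next);
            }
            next++;
        }
        return ids;
    }

    /** Explicit scheme: persist the caller-provided id for every non-null key. */
    static List<Long> explicitIds(Object[] keys, long[] relativeRowIds) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] != null) {
                ids.add(relativeRowIds[i]);
            }
        }
        return ids;
    }

    /** Logical row count: every processed row counts, including null keys. */
    static long rowCount(Object[] keys) {
        return keys.length;
    }

    public static void main(String[] args) {
        Object[] keys = {"x", null, "y"};
        long[] relativeRowIds = {0L, 1L, 4L}; // sparse: rows 2 and 3 are not in this file
        System.out.println("implicit: " + implicitIds(keys));
        System.out.println("explicit: " + explicitIds(keys, relativeRowIds));
        System.out.println("rowCount: " + rowCount(keys));
    }
}
```

In this sketch the two schemes agree only when the caller's ids happen to be dense; once the row range is sparse, only the explicit ids still point at the right rows, while the row count is the same either way.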

I also ran the focused checks locally:

mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest='VectorSearchBuilderTest#testVectorSearchWithMultipleIndexFiles,FullTextSearchBuilderTest#testFullTextSearchWithMultipleIndexFiles' test
mvn -pl paimon-lumina -am -Pfast-build -DfailIfNoTests=false -Dtest='LuminaVectorGlobalIndexWriterTest' test

Both passed. +1 from my side.

JingsongLi added 3 commits June 18, 2026 10:56

[core] Unify single-column global index writer

c7d62ed

checkstyle

775d63a

checkstyle

fd0e495

JingsongLi force-pushed the codex/global-index-single-column-writer branch from 3d2839a to fd0e495 on June 18, 2026 03:24

leaves12138 approved these changes Jun 18, 2026

JingsongLi merged commit 3bee09f into apache:master Jun 18, 2026
14 checks passed
