
feat(python): expose zonemap segment builds #7177

Open
everySympathy wants to merge 1 commit into lance-format:main from everySympathy:codex/zonemap-uncommitted-python

Conversation

everySympathy commented Jun 9, 2026

Summary

  • allow Python create_index_uncommitted(..., index_type="ZONEMAP", fragment_ids=...) to use the scalar segment build path
  • reject create_scalar_index(..., index_type="ZONEMAP", fragment_ids=...) with the same migration guidance used for BTREE/BITMAP segment-native scalar builds
  • add Python regression coverage for ZoneMap fragment-id validation and for staging, merging, committing, and querying ZoneMap segments

Context

ZoneMap segment merge support landed in #7128, but the Python public create_index_uncommitted helper still only routed BTREE/BITMAP/INVERTED scalar requests through the uncommitted scalar segment path. As a result, ZONEMAP fell through to vector validation and failed on scalar columns.
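The fallthrough described above can be illustrated with a minimal, self-contained sketch. The function `route_uncommitted_request`, the two set names, and the path labels are hypothetical stand-ins, not the actual code in python/lance/dataset.py; only the index type names come from this PR.

```python
# Hypothetical sketch of the dispatch; every name here is illustrative only.
SCALAR_SEGMENT_TYPES_BEFORE = frozenset({"BTREE", "BITMAP", "INVERTED"})
SCALAR_SEGMENT_TYPES_AFTER = SCALAR_SEGMENT_TYPES_BEFORE | {"ZONEMAP"}


def route_uncommitted_request(index_type: str, scalar_types: frozenset) -> str:
    """Return which build path a request would take in this sketch."""
    if index_type.upper() in scalar_types:
        return "scalar_segment"
    # Anything not recognized as scalar is treated as a vector index request,
    # which is why ZONEMAP on a scalar column used to fail validation.
    return "vector"


before = route_uncommitted_request("ZONEMAP", SCALAR_SEGMENT_TYPES_BEFORE)
after = route_uncommitted_request("ZONEMAP", SCALAR_SEGMENT_TYPES_AFTER)
```

In this sketch `before` is the vector path and `after` is the scalar segment path, which is the behavioral change the PR describes.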

I searched for open duplicate PRs using the queries "ZONEMAP create_index_uncommitted Python distributed" and "zonemap uncommitted python"; no open matches were found.

Validation

  • UV_PYTHON=/usr/bin/python3.11 uv run pytest python/tests/test_scalar_index.py::test_fragment_ids_parameter_validation python/tests/test_scalar_index.py::test_segment_fts python/tests/test_scalar_index.py::test_zonemap_fragment_ids_parameter_validation python/tests/test_scalar_index.py::test_zonemap_segment_merge_and_commit_from_python python/tests/test_scalar_index.py::test_bitmap_uncommitted_segments_can_be_committed_from_python python/tests/test_scalar_index.py::test_btree_fragment_ids_parameter_validation passed
  • UV_PYTHON=/usr/bin/python3.11 uv run ruff format --check --diff python/lance/dataset.py python/tests/test_scalar_index.py passed
  • UV_PYTHON=/usr/bin/python3.11 uv run ruff check python/lance/dataset.py python/tests/test_scalar_index.py passed
  • UV_PYTHON=/usr/bin/python3.11 uv run make lint passed

@github-actions github-actions bot added the A-python (Python bindings) and enhancement (New feature or request) labels Jun 9, 2026
@everySympathy everySympathy marked this pull request as ready for review June 9, 2026 11:08
@everySympathy everySympathy force-pushed the codex/zonemap-uncommitted-python branch from 9b63bd7 to 71532a2 Compare June 9, 2026 12:38
@everySympathy everySympathy marked this pull request as draft June 9, 2026 12:38
@everySympathy everySympathy force-pushed the codex/zonemap-uncommitted-python branch 3 times, most recently from d7af83b to bc4063e Compare June 9, 2026 12:49
@everySympathy everySympathy marked this pull request as ready for review June 9, 2026 13:15
@everySympathy everySympathy marked this pull request as draft June 9, 2026 13:20
@everySympathy everySympathy marked this pull request as ready for review June 9, 2026 13:20
)


def test_zonemap_uncommitted_segments_can_be_merged_and_committed_from_python(tmp_path):

Contributor

Minor: Simplify the names of the test cases.

@everySympathy everySympathy Jun 9, 2026

Author

Simplified to test_zonemap_segment_merge_and_commit_from_python.

Comment thread python/python/lance/dataset.py Outdated
)

- if fragment_ids is not None and logical_index_type in {"BTREE", "BITMAP"}:
+ if fragment_ids is not None and logical_index_type in {

Contributor

Minor: {"BTREE", "BITMAP", "ZONEMAP"} appears multiple times. Can we abstract it into a method?

Author

Extracted this check into a new function: _is_segment_native_scalar_index_type.

@everySympathy everySympathy force-pushed the codex/zonemap-uncommitted-python branch 3 times, most recently from 3ef12ba to 1066074 Compare June 9, 2026 14:16
yanghua commented Jun 10, 2026

Collaborator

@claude review

@yanghua yanghua left a comment

Collaborator

Left two comments.


def test_zonemap_segment_merge_and_commit_from_python(tmp_path):
    ds = generate_multi_fragment_dataset(
        tmp_path, num_fragments=4, rows_per_fragment=40

Collaborator

Can we increase rows_per_fragment to more than 8192 (e.g. > 2 * 8192), so that each fragment contains at least two zones? The default value of rows_per_zone is 8192.

@everySympathy everySympathy Jun 10, 2026

Author

Adjusted the merge/commit test to use rows_per_fragment = 20_000, so each fragment contains multiple ZoneMap zones. The query now targets rows in the second zone.
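The zone counts behind this change can be checked with a little arithmetic. The sketch below assumes that zones are fixed-size, contiguous row ranges counted separately within each fragment, with a partial last zone; the helper names `zones_per_fragment` and `zone_of_row` are hypothetical and are not part of the Lance API.

```python
import math

ROWS_PER_ZONE = 8192  # default rows_per_zone, as stated in the review comment


def zones_per_fragment(rows_per_fragment: int, rows_per_zone: int = ROWS_PER_ZONE) -> int:
    """Number of zones needed to cover one fragment (last zone may be partial)."""
    return math.ceil(rows_per_fragment / rows_per_zone)


def zone_of_row(row_offset: int, rows_per_zone: int = ROWS_PER_ZONE) -> int:
    """Zero-based zone index for a row offset within its fragment."""
    return row_offset // rows_per_zone


old_zones = zones_per_fragment(40)      # 1 zone: the original test size
new_zones = zones_per_fragment(20_000)  # 3 zones: two full, one partial

# Under the assumption above, the second zone covers offsets 8192..16383.
second_zone_rows = range(ROWS_PER_ZONE, 2 * ROWS_PER_ZONE)
```

With 40 rows per fragment, every query stays inside a single zone, so zone boundaries are never exercised; 20_000 rows per fragment gives the test a second zone to target.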

Comment on lines +4018 to +4023
with pytest.raises(ValueError, match="create_index_uncommitted"):
    ds.create_scalar_index(
        column="id",
        index_type="ZONEMAP",
        fragment_ids=[fragment_ids[0]],
    )

Collaborator

IMO, it would be better to move this into a separate test case, e.g. test_zonemap_fragment_ids_parameter_validation.

Author

Split the create_scalar_index(..., fragment_ids=...) validation into test_zonemap_fragment_ids_parameter_validation, leaving the merge/commit test focused on the segment lifecycle.
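The shape of that validation check can be sketched without a real dataset. Everything below is a stand-in: `create_scalar_index_stub`, `rejection_message`, and the error wording are invented for illustration, and the only property taken from this PR is that the ValueError message mentions create_index_uncommitted.

```python
import re

# Assumed set, taken from the review discussion of this PR.
SEGMENT_NATIVE_SCALAR_TYPES = frozenset({"BTREE", "BITMAP", "ZONEMAP"})


def create_scalar_index_stub(column, index_type, fragment_ids=None):
    """Hypothetical stand-in for the fragment_ids guard; builds nothing."""
    if fragment_ids is not None and index_type.upper() in SEGMENT_NATIVE_SCALAR_TYPES:
        # The wording is invented; the test in this PR only requires that
        # the message mentions create_index_uncommitted.
        raise ValueError(
            f"fragment_ids is not supported for {index_type} here; "
            "use create_index_uncommitted instead"
        )
    return "ok"


def rejection_message(index_type, fragment_ids):
    """Return the ValueError message, or None if the call is accepted."""
    try:
        create_scalar_index_stub("id", index_type, fragment_ids=fragment_ids)
    except ValueError as exc:
        return str(exc)
    return None


message = rejection_message("ZONEMAP", [0])
# pytest.raises(..., match=...) applies re.search to the message in the same way.
matches = message is not None and re.search("create_index_uncommitted", message) is not None
```

Calling the stub without fragment_ids is accepted, which mirrors the idea that only fragment-scoped builds are redirected to the uncommitted API.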

@everySympathy everySympathy force-pushed the codex/zonemap-uncommitted-python branch 2 times, most recently from e2255e9 to d928901 Compare June 10, 2026 12:04
@everySympathy everySympathy force-pushed the codex/zonemap-uncommitted-python branch from d928901 to 01d7152 Compare June 10, 2026 12:13
@everySympathy everySympathy requested a review from yanghua June 10, 2026 12:15

@yanghua yanghua left a comment

Collaborator

LGTM. Thanks for your contribution!


Labels

A-python Python bindings enhancement New feature or request


3 participants