feat(python): expose zonemap segment builds#7177
Conversation
9b63bd7 to
71532a2
Compare
d7af83b to
bc4063e
Compare
| ) | ||
|
|
||
|
|
||
| def test_zonemap_uncommitted_segments_can_be_merged_and_committed_from_python(tmp_path): |
There was a problem hiding this comment.
Minor: Simplify the names of the test cases.
There was a problem hiding this comment.
simplified to test_zonemap_segment_merge_and_commit_from_python
| ) | ||
|
|
||
| if fragment_ids is not None and logical_index_type in {"BTREE", "BITMAP"}: | ||
| if fragment_ids is not None and logical_index_type in { |
There was a problem hiding this comment.
Minor: {"BTREE", "BITMAP", "ZONEMAP"} appears multiple times. Can we abstract it into a method?
There was a problem hiding this comment.
extract these code into a new function: _is_segment_native_scalar_index_type
3ef12ba to
1066074
Compare
|
@claude review |
|
|
||
| def test_zonemap_segment_merge_and_commit_from_python(tmp_path): | ||
| ds = generate_multi_fragment_dataset( | ||
| tmp_path, num_fragments=4, rows_per_fragment=40 |
There was a problem hiding this comment.
Can we add the number about rows_per_fragment to be larger than 8192(e.g. > 2 * 8192), so that we can have two zones in one fragment? Because the default value of rows_per_zone is 8192.
There was a problem hiding this comment.
Adjusted the merge/commit test to use rows_per_fragment = 20_000, so each fragment contains multiple ZoneMap zones. The query now targets rows in the second zone.
| with pytest.raises(ValueError, match="create_index_uncommitted"): | ||
| ds.create_scalar_index( | ||
| column="id", | ||
| index_type="ZONEMAP", | ||
| fragment_ids=[fragment_ids[0]], | ||
| ) |
There was a problem hiding this comment.
IMO, it would be better to add a test case named .e.g test_zonemap_fragment_ids_parameter_validation?
There was a problem hiding this comment.
Split the create_scalar_index(..., fragment_ids=...) validation into test_zonemap_fragment_ids_parameter_validation, leaving the merge/commit test focused on the segment lifecycle.
e2255e9 to
d928901
Compare
d928901 to
01d7152
Compare
yanghua
left a comment
There was a problem hiding this comment.
LGTM. Thanks for your contribution!
Summary
create_index_uncommitted(..., index_type="ZONEMAP", fragment_ids=...)to use the scalar segment build pathcreate_scalar_index(..., index_type="ZONEMAP", fragment_ids=...)with the same migration guidance used for BTREE/BITMAP segment-native scalar buildsContext
ZoneMap segment merge support landed in #7128, but the Python public
create_index_uncommittedhelper still only routed BTREE/BITMAP/INVERTED scalar requests through the uncommitted scalar segment path. As a result, ZONEMAP fell through to vector validation and failed on scalar columns.I searched for open duplicate PRs with
ZONEMAP create_index_uncommitted Python distributedandzonemap uncommitted python; no open matches were found.Validation
UV_PYTHON=/usr/bin/python3.11 uv run pytest python/tests/test_scalar_index.py::test_fragment_ids_parameter_validation python/tests/test_scalar_index.py::test_segment_fts python/tests/test_scalar_index.py::test_zonemap_fragment_ids_parameter_validation python/tests/test_scalar_index.py::test_zonemap_segment_merge_and_commit_from_python python/tests/test_scalar_index.py::test_bitmap_uncommitted_segments_can_be_committed_from_python python/tests/test_scalar_index.py::test_btree_fragment_ids_parameter_validationpassedUV_PYTHON=/usr/bin/python3.11 uv run ruff format --check --diff python/lance/dataset.py python/tests/test_scalar_index.pypassedUV_PYTHON=/usr/bin/python3.11 uv run ruff check python/lance/dataset.py python/tests/test_scalar_index.pypassedUV_PYTHON=/usr/bin/python3.11 uv run make lintpassed