Problem
Dataset.add_columns(..., read_columns=["blob"]) fails when blob is an existing Blob v2 column.
This affects UDF-based schema evolution where the new column is derived from an existing Blob v2 descriptor column.
Reproduction
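```python
import tempfile
from pathlib import Path

import pyarrow as pa
import lance

# Placeholder scratch locations for the dataset and the external base.
tmp = Path(tempfile.mkdtemp())
uri = str(tmp / "ds.lance")
external_base = tmp / "external"
external_base.mkdir()
external_blob = external_base / "blob.bin"
external_blob.write_bytes(b"external payload")

# Values intended to exercise the inline, packed, dedicated, and external paths.
values = [
    b"inline",
    b"p" * (64 * 1024 + 1024),
    b"d" * (4 * 1024 * 1024 + 1024),
    external_blob.as_uri(),
]
ds = lance.write_dataset(
    pa.table({"id": range(4), "blob": lance.blob_array(values)}),
    uri,
    data_storage_version="2.2",
    initial_bases=[
        lance.DatasetBasePath(external_base.as_uri(), name="external", id=1)
    ],
)

@lance.batch_udf(output_schema=pa.schema([pa.field("blob_kind", pa.int32())]))
def blob_kind(batch):
    # Derive a new column from the existing Blob v2 descriptor struct.
    return pa.record_batch([batch["blob"].field("kind")], ["blob_kind"])

# Raises the OSError shown under "Error".
ds.add_columns(blob_kind, read_columns=["blob"])
```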
Error
```
OSError: Invalid user input: there were more fields in the schema than provided column indices / infos,
rust/lance-encoding/src/decoder.rs:454:13
```
Expected behavior
add_columns should be able to read an existing Blob v2 column as a descriptor struct when it is listed in read_columns.
The UDF should receive the Blob v2 descriptor batch, and derived columns should be written successfully.
Notes
This is separate from writing new Blob v2 columns through add_columns.
The existing Blob v2 add_columns tests cover writing new Blob v2 values through RecordBatchReader and BatchUDF, covering the inline, packed, dedicated, and external storage kinds. This issue is about reading an existing Blob v2 column during the UDF input scan.