add_columns should support reading existing Blob v2 columns in UDFs #7168

## Problem

`Dataset.add_columns(..., read_columns=["blob"])` fails with an `OSError` when `blob` is an existing Blob v2 column.

This affects UDF-based schema evolution where the new column is derived from an existing Blob v2 descriptor column.

## Reproduction

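```python
import tempfile
from pathlib import Path

import pyarrow as pa
import lance

# Scratch locations for the repro (assumed setup; any dataset URI and
# external base directory should behave the same way).
root = Path(tempfile.mkdtemp())
uri = str(root / "blob_v2.lance")
external_base = root / "external"
external_base.mkdir()
external_blob = external_base / "blob.bin"
external_blob.write_bytes(b"external")

# One value intended for each Blob v2 kind listed in the Notes.
values = [
    b"inline",                        # inline
    b"p" * (64 * 1024 + 1024),        # packed (just over 64 KiB)
    b"d" * (4 * 1024 * 1024 + 1024),  # dedicated (just over 4 MiB)
    external_blob.as_uri(),           # external
]

ds = lance.write_dataset(
    pa.table({"id": range(4), "blob": lance.blob_array(values)}),
    uri,
    data_storage_version="2.2",
    initial_bases=[
        lance.DatasetBasePath(external_base.as_uri(), name="external", id=1)
    ],
)

# Derive a new column from the "kind" field of the existing descriptor struct.
@lance.batch_udf(output_schema=pa.schema([pa.field("blob_kind", pa.int32())]))
def blob_kind(batch):
    return pa.record_batch([batch["blob"].field("kind")], ["blob_kind"])

# Fails while scanning the existing Blob v2 column as UDF input.
ds.add_columns(blob_kind, read_columns=["blob"])
```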

## Error

```text
OSError: Invalid user input: there were more fields in the schema than provided column indices / infos,
rust/lance-encoding/src/decoder.rs:454:13
```

## Expected behavior

`add_columns` should be able to read an existing Blob v2 column as a descriptor struct when it is listed in `read_columns`.

The UDF should receive the Blob v2 descriptor batch, and derived columns should be written successfully.

## Notes

This is separate from writing new Blob v2 columns through `add_columns`.

The existing Blob v2 `add_columns` tests cover writing new Blob v2 values through `RecordBatchReader` and `BatchUDF`, including the `inline`, `packed`, `dedicated`, and `external` kinds. This issue concerns reading an existing Blob v2 column during the UDF input scan.
