Operation::Update storing matched row offsets as flat repeated uint32 makes dense full-table column rewrites prohibitively large #7080

@pengw0048

Context

PR #6650 added Operation::Update.updated_fragment_offsets so partial column-rewrite commits can refresh _row_last_updated_at_version only for matched rows. PR #6748 then exposed this in the Java API.

In Rust the offsets are stored as HashMap<u64, RoaringBitmap>, which is space-efficient. The wire/proto encoding however is currently:

// transaction.proto
message Update {
  // ...
  map<uint64, UInt32List> updated_fragment_offsets = 9;
}

message UInt32List {
  // sorted distinct local physical row offsets within the fragment (0-based)
  repeated uint32 values = 1;
}

i.e. the in-memory bitmap is materialized as a flat Vec<u32> (and on the Java side a long[]) for transport.

Problem

For dense full-fragment column rewrites (the common case for column-level migrations or any whole-table update_columns_with_offsets invocation), the wire size grows linearly with the number of matched rows, even when every row in the fragment matches.

Concrete example from a dataset (1.6B rows, ~18,880 fragments, avg ~86k rows/fragment, all rows in every fragment match):

| encoding | bytes / fragment | bytes total |
|---|---|---|
| repeated uint32, packed varint (status quo) | ~242 KB | ~4.5 GB |
| serialized RoaringBitmap, portable format | ~30-300 B (single dense run container) | ~5 MB |
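The status-quo figure can be reproduced with a little arithmetic. The sketch below assumes offsets 0..86,000 in every fragment (the approximate average quoted above) and counts only the varint payload of the packed field, ignoring tag and length prefixes; the function names are illustrative and not part of Lance.

```rust
/// Number of bytes protobuf needs to varint-encode `v` (7 payload bits per byte).
fn varint_len(v: u32) -> usize {
    match v {
        0..=0x7F => 1,
        0x80..=0x3FFF => 2,
        0x4000..=0x1F_FFFF => 3,
        0x20_0000..=0xFFF_FFFF => 4,
        _ => 5,
    }
}

/// Payload size of a packed `repeated uint32` holding the offsets 0..rows.
fn packed_offsets_bytes(rows: u32) -> usize {
    (0..rows).map(varint_len).sum()
}

fn main() {
    let rows_per_fragment = 86_000u32; // assumed: approximate average from the example
    let fragments = 18_880usize;

    let per_fragment = packed_offsets_bytes(rows_per_fragment);
    let total = per_fragment * fragments;

    println!("per fragment: {} bytes", per_fragment); // 241488
    println!("total: {:.2} GB", total as f64 / 1e9); // 4.56
}
```

Most offsets in a fragment of this size are above 16,383 and therefore cost 3 bytes each, which is where the ~242 KB per fragment comes from.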

A ~4.5 GB transaction manifest is impractical in several places:

  • The Lance commit lock has a finite hold time; writing and reading multi-GB manifests pushes us against it.
  • pylance / lance-jni readers materialize the full Vec<u32> / long[] in memory before constructing the HashMap<u64, RoaringBitmap> — peak memory is O(total_matched_rows × 8 bytes), not O(serialized_bitmap_bytes).
  • Sparse cases (e.g. an operation that touched 10M rows out of 1.6B) work fine, so the bug is invisible until someone hits a dense-rewrite use case.

Proposed change

Drop the flat UInt32List encoding and transmit only the serialized RoaringBitmap on the wire. The existing Rust in-memory representation (HashMap<u64, RoaringBitmap>) already maps 1:1 to this; current code paths spend the round-trip CPU/memory expanding it to/from Vec<u32> for no functional benefit.

message Update {
  // ...
  // Deprecated: writers stop populating; readers fall through to field 10.
  map<uint64, UInt32List> updated_fragment_offsets = 9 [deprecated = true];

  // Preferred. Value is the portable RoaringBitmap serialisation
  // (https://github.com/RoaringBitmap/CRoaring#format-specification),
  // i.e. `RoaringBitmap::serialize_into` / `deserialize_from`.
  map<uint64, bytes> updated_fragment_offsets_roaring = 10;
}

Rust write path:

let bytes: Vec<u8> = {
    // The `roaring` crate exposes the encoded length as `serialized_size()`.
    let mut buf = Vec::with_capacity(bitmap.serialized_size());
    bitmap.serialize_into(&mut buf)?;
    buf
};
proto.updated_fragment_offsets_roaring.insert(frag_id, bytes);

Read path:

RoaringBitmap::deserialize_from(&bytes[..])?

RoaringBitmap is already a dependency of the crate after PR #6650 — no new third-party additions.
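The deprecation comment on field 9 leaves the exact reader behavior open. One plausible reading, assumed here, is that a reader prefers field 10 when it is populated and otherwise falls back to the legacy list in field 9. The sketch below illustrates that rule with hypothetical stand-ins: `UpdateProto` models only the two offset fields, `BTreeSet<u32>` stands in for `RoaringBitmap`, and `toy_decode` is a placeholder for `RoaringBitmap::deserialize_from` (it does not implement the Roaring format).

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the generated `Update` message.
#[derive(Default)]
struct UpdateProto {
    updated_fragment_offsets: HashMap<u64, Vec<u32>>,        // field 9 (legacy)
    updated_fragment_offsets_roaring: HashMap<u64, Vec<u8>>, // field 10
}

/// Resolve per-fragment offsets: use field 10 if populated, else field 9.
/// `decode` stands in for the real bitmap deserializer.
fn resolve_offsets<E>(
    proto: &UpdateProto,
    decode: impl Fn(&[u8]) -> Result<BTreeSet<u32>, E>,
) -> Result<HashMap<u64, BTreeSet<u32>>, E> {
    if !proto.updated_fragment_offsets_roaring.is_empty() {
        return proto
            .updated_fragment_offsets_roaring
            .iter()
            .map(|(id, bytes)| decode(bytes.as_slice()).map(|set| (*id, set)))
            .collect();
    }
    // Legacy path: expand the flat list written by older writers.
    Ok(proto
        .updated_fragment_offsets
        .iter()
        .map(|(id, values)| (*id, values.iter().copied().collect::<BTreeSet<u32>>()))
        .collect())
}

/// Toy decoder for illustration only: treats each byte as one offset.
fn toy_decode(bytes: &[u8]) -> Result<BTreeSet<u32>, String> {
    Ok(bytes.iter().map(|&b| u32::from(b)).collect())
}

fn main() {
    // A manifest from an old writer carries only field 9.
    let mut legacy = UpdateProto::default();
    legacy.updated_fragment_offsets.insert(7, vec![0, 1, 2]);

    let offsets = resolve_offsets(&legacy, toy_decode).unwrap();
    assert_eq!(offsets[&7].len(), 3);
}
```

Under this rule, old manifests stay readable while new writers populate only field 10; whether mixed manifests (both fields set) should be rejected instead is a design choice the issue does not settle.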
