Add AnyUtf8: one logical type for any UTF-8 encoding #10

Merged: emilk merged 2 commits into main from emilk/any-utf8 on Jun 9, 2026
@emilk emilk commented Jun 9, 2026

The string sibling of AnyBinary / AnyList: a quiver-only logical type that accepts any of arrow's UTF-8 encodings and reads them uniformly as &str.

| Logical type | Accepts | Element value |
|---|---|---|
| `AnyUtf8` | `Utf8`, `LargeUtf8`, `Utf8View` | `&str` |

```rust
let column = Column::<AnyUtf8>::try_from(array)?; // Utf8 / LargeUtf8 / Utf8View
assert_eq!(column.value(0), "alice");
```

Details

  • Enum AnyTypedUtf8 over the three arrow string arrays; downcast dispatches on array.data_type() and reads every element as &str.
  • Parse-only: implements LogicalType and RefType (so try_from, reading, and column[i] work) but not ConcreteType. It maps to no single arrow datatype, so there is no from_values, Default, or schema. To build a column, use a concrete encoding such as Column<Utf8>.
  • Arrow has no fixed-size variant for UTF-8, so, unlike AnyBinary, there is nothing extra to include.
  • Reads stay O(1)/zero-copy; nullability via Option<AnyUtf8> like any other column.
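
As a rough illustration of the enum-over-encodings idea in the list above, here is a minimal, std-only sketch. It does not use arrow or quiver: `SmallStrings`, `LargeStrings`, `ViewStrings`, `AnyStrings`, and the `small`/`large`/`view` builders are hypothetical stand-ins invented for this example, and the view layout is heavily simplified compared with arrow's real `Utf8View`. It only shows how one enum can dispatch over three storage layouts while returning every element as a borrowed `&str`.

```rust
/// Hypothetical stand-in for `Utf8`: i32 offsets into one contiguous buffer.
pub struct SmallStrings { offsets: Vec<i32>, data: String }

/// Hypothetical stand-in for `LargeUtf8`: same layout with i64 offsets.
pub struct LargeStrings { offsets: Vec<i64>, data: String }

/// Simplified stand-in for `Utf8View`: one (start, len) pair per element.
pub struct ViewStrings { views: Vec<(u32, u32)>, data: String }

/// One enum over all three layouts (an assumption about the shape of
/// `AnyTypedUtf8`, not its actual definition).
pub enum AnyStrings {
    Small(SmallStrings),
    Large(LargeStrings),
    View(ViewStrings),
}

impl AnyStrings {
    /// Number of elements, whichever layout is stored.
    pub fn len(&self) -> usize {
        match self {
            Self::Small(a) => a.offsets.len() - 1,
            Self::Large(a) => a.offsets.len() - 1,
            Self::View(a) => a.views.len(),
        }
    }

    /// Borrow element `i` as `&str` without copying the bytes.
    pub fn value(&self, i: usize) -> &str {
        match self {
            Self::Small(a) => &a.data[a.offsets[i] as usize..a.offsets[i + 1] as usize],
            Self::Large(a) => &a.data[a.offsets[i] as usize..a.offsets[i + 1] as usize],
            Self::View(a) => {
                let (start, len) = a.views[i];
                &a.data[start as usize..(start + len) as usize]
            }
        }
    }
}

/// Build the i32-offset layout from string slices.
pub fn small(values: &[&str]) -> AnyStrings {
    let (mut offsets, mut data) = (vec![0i32], String::new());
    for v in values {
        data.push_str(v);
        offsets.push(data.len() as i32);
    }
    AnyStrings::Small(SmallStrings { offsets, data })
}

/// Build the i64-offset layout from string slices.
pub fn large(values: &[&str]) -> AnyStrings {
    let (mut offsets, mut data) = (vec![0i64], String::new());
    for v in values {
        data.push_str(v);
        offsets.push(data.len() as i64);
    }
    AnyStrings::Large(LargeStrings { offsets, data })
}

/// Build the simplified view layout from string slices.
pub fn view(values: &[&str]) -> AnyStrings {
    let (mut views, mut data) = (Vec::new(), String::new());
    for v in values {
        views.push((data.len() as u32, v.len() as u32));
        data.push_str(v);
    }
    AnyStrings::View(ViewStrings { views, data })
}
```

Callers of `value` never see which variant is stored, which is the property the PR describes for `Column<AnyUtf8>`. The sketch leaves out nulls and the rejection of non-string arrays.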

Verification

  • cargo clippy --all-features --all-targets clean, cargo fmt --all applied
  • cargo test --all-features green, incl. a new any_utf8_columns test: all three encodings read uniformly, RefType indexing, non-string rejected, null handling (Option<AnyUtf8> + non-nullable rejection)
  • cargo doc --document-private-items -D warnings clean
  • README: added to the supported-types table and the AnyList section

🤖 Generated with Claude Code

emilk and others added 2 commits June 9, 2026 13:47
The string sibling of `AnyBinary`/`AnyList`: `Column<AnyUtf8>` accepts any of
arrow's UTF-8 encodings — `Utf8`, `LargeUtf8`, or `Utf8View` — and reads them
all uniformly as `&str`. Parse-only: implements `LogicalType` (and `RefType`)
but not `ConcreteType`, so no `from_values`/`Default`/schema; build a concrete
encoding such as `Column<Utf8>`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@emilk emilk merged commit 946c95f into main Jun 9, 2026
6 checks passed