Skip to content

repr: Harden proto/Row decoding against malformed input#36984

Open
def- wants to merge 12 commits into
MaterializeInc:mainfrom
def-:fuzz-repr
Open

repr: Harden proto/Row decoding against malformed input#36984
def- wants to merge 12 commits into
MaterializeInc:mainfrom
def-:fuzz-repr

Conversation

@def-

@def- def- commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Hardens mz-repr proto/Row decoding against malformed/untrusted bytes — these paths are reachable from persisted state and the wire, so a panic is an availability bug. Replaces panics/asserts with proper decode errors and validates ranges (Date, CheckedTimestamp, ProtoNumeric, ProtoRange, ProtoRelationDesc, ProtoRow dict ordering, uuid parsing, leap-second truncation, Avro decimal).

Found by the cargo-fuzz suite (separate infra PR). Each fix has a regression test.

🤖 Generated with Claude Code

def- and others added 12 commits June 11, 2026 10:57
`Decimal::from_packed_bcd` calls the C function `decPackedToNumber`,
which segfaults on empty bcd input. Reachable from untrusted proto
bytes (anyone able to send a `ProtoRow` could crash the process via a
`ProtoNumeric { bcd: [] }` datum). Reject the empty case before
descending into the FFI.
`push_range_with` returned a `Result`, but the proto decode path
unconditionally `.expect(...)`ed it — meaning any `ProtoRange` that
`push_range_with` rejects (e.g. lower > upper from an attacker-crafted
proto) panicked the process. Propagate the error to the caller via
`?`, matching the rest of the proto decode path.
`<CheckedTimestamp<_> as RustType<ProtoNaiveDateTime>>::from_proto`
constructed the struct directly (`Self { t: ... }`), bypassing the
range validation that `from_timestamplike` enforces. Out-of-range
values pushed into a `Row` cleanly, but `read_datum` then called
`from_timestamplike(...).expect(...)` while iterating and panicked —
reachable from untrusted proto bytes.

Go through `from_timestamplike` in `from_proto` so the value is
rejected at decode time.
Same shape as the `CheckedTimestamp` fix: `Date::from_proto`
constructed `Date { days: proto.days }` directly, bypassing the range
validation that `from_pg_epoch` enforces. Out-of-range days pushed
into a `Row` cleanly, then `read_datum` panicked on
`Date::from_pg_epoch(days).expect(...)` while iterating.

Go through `from_pg_epoch` in `from_proto` so the value is rejected
at decode time.
`SqlScalarType::{List,Record,Map}` from `ProtoScalarType` called
`x.custom_id.map(|id| id.into_rust().unwrap())` — a malformed
`ProtoCatalogItemId` inside any of those three variants panicked the
process. Reachable from untrusted proto bytes (an attacker-crafted
`ProtoRow` containing a list/record/map value with a bad custom_id).
Propagate via `transpose()?` instead.
The non-migration branch zipped `proto.names` and `proto.metadata`
via `zip_eq`, which panics on length mismatch — reachable from
untrusted proto bytes. Check the lengths explicitly and return
`InvalidFieldError`.
`RowPacker::push_range_with` documents a panic if a finite range
bound expression pushes `Datum::Null`. The proto-decoding path
called it with a closure that just forwarded the proto datum,
which would happily push a `Null` from a malicious or corrupt
`ProtoRangeInner.lower/upper` and crash the process. Detect the
`Null` ProtoDatum up front and return a decode error instead.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The `uuid` crate panics while constructing a parse error for some short,
brace-wrapped inputs (e.g. `{}`, `{\0}`) — it mis-slices the input
("byte range starts at 1 but ends at 0"). `strconv::parse_uuid` is reached
from `Value::decode_text` for a client-supplied UUID bind parameter, so
this is a client-triggerable crash. Reject anything that can't be a UUID
(shorter than 32 chars, or non-ASCII) before handing it to the crate, and
route the two pgrepr UUID text-decode call sites through `parse_uuid`
(they previously called `Uuid::parse_str` directly, bypassing the guard).

Found by the value_decode_text cargo-fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`RowPacker::push_range_with` asserted (`assert!`, so a panic even in
release) that a range's bounds are non-null, of consistent datum kind, and
the expected count. These bounds are reconstructed from an untrusted
`ProtoRow`, so a crafted/corrupted range (e.g. an Array lower and a Map
upper) panicked the decoder. Return a new `InvalidRangeError::InvalidRangeData`
(with a matching proto variant) instead; valid callers never hit it.

Found by the row_proto_roundtrip cargo-fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`DatumDictIter` debug-asserts that map keys are unique and strictly ascending,
but the `ProtoRow` -> `Row` decode packed a dict's elements verbatim without
checking. A crafted proto with duplicate or out-of-order keys therefore decoded
to a malformed map that tripped the assert (a panic in debug builds) the next
time it was iterated. Validate the key order while decoding and reject a
violation as a decode error. Regression for the row_proto_roundtrip cargo-fuzz
finding.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`CheckedTimestamp::round_to_precision` — applied by the string->timestamp and
string->timestamptz casts before storage — truncates to microsecond (or, for
`date_trunc`-style precision, millisecond) resolution via
`truncate_microseconds`/`truncate_milliseconds`. Both rebuilt the time with
`NaiveTime::from_hms_{micro,milli}_opt`, whose leap-second sub-second range
(>= 1s) is only accepted when `sec == 59`.

A parsed `:60` literal is stored as chrono's leap representation, and time-zone
math can leave that 1s sub-second on a second other than `:59` (e.g.
`12:30:56` with a 1,000,000,000ns sub-second). The constructor then returned
`None` and the `.unwrap()` panicked, so e.g. `SELECT TIMESTAMPTZ '...:60...'`
aborted the process.

Truncate with `with_nanosecond` instead: it accepts the whole [0, 2s) range at
any second, so it preserves the value (leap included) and never fails here.
Found by the strconv timestamp round-trip fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
twos_complement_be_to_numeric_inner read input[0] without first checking
that the slice is non-empty, panicking with "index out of bounds: the len
is 0 but the index is 0" on an empty byte string.

That input is reachable from the wire: an Avro `decimal` logical type whose
unscaled value is encoded as zero-length `bytes` flows straight from a Kafka
message body through AvroFlatDecoder into this function, so a single crafted
(or buggy-producer) message crashes Avro source decoding — an availability
bug for ingestion. Our own encoder never emits an empty byte string (zero is
the single byte 0x00), and Java's Avro decoder likewise rejects empty input
(BigInteger throws "Zero length BigInteger"), so the right behaviour is a
clean decode error, not a silent zero. The caller already maps the Err to a
DecodeError, so the malformed row now surfaces as a decode error instead of
aborting the worker.

Found by the new avro_decode_fuzzed_schema fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@def- def- requested a review from antiguru June 12, 2026 10:41
@def- def- marked this pull request as ready for review June 12, 2026 10:41
@def- def- requested review from a team as code owners June 12, 2026 10:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant