Status
CompressedSet from upstream segmentio/ksuid (set.go, 343 LOC, 17 functions) is deliberately not ported in libksuid v0.1. This is a scope decision, not a technical limitation -- the port is straightforward but the trade-off did not justify the surface in the first release.
What CompressedSet is
Upstream Go feature for packing many KSUIDs into a small byte slice. Sorted KSUIDs share leading bytes (timestamp + high payload), and consecutive payload values often differ by 1; the format exploits that with a 2-bit tag byte plus a varint delta:
| Tag | Meaning | Approximate cost |
| --- | --- | --- |
| rawKSUID (0b00) | first / restart entry | 20 bytes |
| timeDelta (0b01) | new timestamp + raw payload | 1 + N + 16 bytes |
| payloadDelta (0b10) | same timestamp, payload += delta128 | 1 + N bytes (typically 2-5) |
| payloadRange (0b11) | same timestamp, M consecutive +1s | 1 + N bytes for M ids |
Source: /home/joykim/git/semantic-reasoning/ksuid/set.go:67-244. The varint helpers (varintLength32/64/128, appendVarint32/64/128, varint32/64/128) account for ~140 of the 343 lines.
Realistic compression on time-clustered streams is roughly 3-5 bytes / KSUID after the first, vs. 20 bytes raw -- roughly a 4-7x win that matters when you are persisting millions of IDs to disk or sending them over the wire.
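To make the tag byte concrete, here is a minimal sketch of splitting one byte into its 2-bit tag and a length field. The bit layout (tag in the two high bits, varint byte count N in the low six bits) and the names `set_tag_t`, `set_tag_of` and `set_len_of` are assumptions for illustration only; they should be checked against set.go before anyone relies on them.

```c
#include <stdint.h>

/* Hypothetical names; the tag values mirror the table above. */
typedef enum {
    SET_TAG_RAW_KSUID     = 0x0, /* 0b00: first / restart entry */
    SET_TAG_TIME_DELTA    = 0x1, /* 0b01: new timestamp + raw payload */
    SET_TAG_PAYLOAD_DELTA = 0x2, /* 0b10: same timestamp, payload += delta */
    SET_TAG_PAYLOAD_RANGE = 0x3  /* 0b11: same timestamp, run of +1s */
} set_tag_t;

/* Assumption: the tag occupies the two high bits of the tag byte. */
static set_tag_t set_tag_of(uint8_t b)
{
    return (set_tag_t)(b >> 6);
}

/* Assumption: the low six bits carry N, the byte count of the varint. */
static unsigned set_len_of(uint8_t b)
{
    return b & 0x3Fu;
}
```

Under these assumptions a payloadDelta entry with a 3-byte varint would start with the byte `(SET_TAG_PAYLOAD_DELTA << 6) | 3`. A mask that is off by one bit in either helper still yields a value that looks like a valid tag or length, which is the silent-failure mode this issue worries about.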
Why v0.1 left it out
[Phase 1 Critic review] explicitly recommended cutting it:
"set.go packs varint deltas, time deltas, and range/single discriminators. Any off-by-one in varint reading or wrong byte-tag enum (rawKSUID, timeDelta, payloadDelta, ...) silently produces garbage KSUIDs on iteration. Half-baked port = wire incompatibility nobody notices for months."
The decision rests on three things, in order:
- Silent-corruption risk. A wrong tag mask, a one-byte varint length error, or an off-by-one in the range-length scan all produce decoded KSUIDs that look fine until they collide or sort wrong. Detecting that in CI requires a corpus of upstream-Go-encoded blobs and a differential test; we do not have that infrastructure yet.
- Niche utility. The 80% of users who call ksuid_new / ksuid_parse / ksuid_format do not care. CompressedSet only earns its keep at scale (>=10^4 IDs in one place, e.g. a database snapshot or queue dump).
- Footprint. ~250 LOC across encode + decode + iterator + varint helpers, plus the tag/varint differential tests. That is a sizeable fraction of libksuid's current ~18 KB stripped binary.
Reopening criteria
Land CompressedSet as soon as any one of the following happens:
- a concrete downstream consumer requests it on this issue tracker;
- libksuid grows a benchmark / fixtures corpus that captures Go-generated Compress outputs and we can pin C decode to byte-for-byte parity (the corpus is the missing piece, not the algorithm);
- libksuid is proposing wire compatibility with another ksuid implementation that uses the same packed format.
Until then the answer is "use upstream Go ksuid for compressed-set workloads, or pack/unpack at the application layer."
Implementation sketch (when we do port)
- New TU libksuid/set.c + private header libksuid/set.h (or public if exposed).
- Public surface: opaque ksuid_set_t builder + iterator, plus ksuid_set_compress(const ksuid_t *ids, size_t n, uint8_t **out, size_t *out_len) and matching iterator (ksuid_set_iter_init, ksuid_set_iter_next).
- Reuse libksuid/uint128.h (currently removed -- restore alongside) for the 128-bit payload-delta arithmetic.
- Tests:
  - varint round-trip for every byte length 1..16;
  - tag enum exhaustive: every (current_state, next_state) pair drives the right encode tag;
  - differential corpus -- Go program emits 10^4 KSUIDs with varied (timestamp, payload-delta) clusters, libksuid decodes byte-for-byte;
  - 65 KSUID payloadRange boundary (just-fits vs just-overflows the varint length).
- Meson option -Dcompressed_set=true (default false) until the differential corpus is in CI; flip default after one stable release.
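The varint round-trip item in the test list could start from something like the sketch below. It assumes a big-endian, minimal-length encoding of a 128-bit value held in a hypothetical `u128_t` struct. None of the helper names come from set.go or uint128.h, and the byte order should be confirmed against upstream before comparing against any Go-generated corpus.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit value; a stand-in for whatever uint128.h provides. */
typedef struct { uint64_t hi, lo; } u128_t;

/* Number of significant bytes in v (0 for zero, at most 16). */
static size_t varint_len128(u128_t v)
{
    size_t n = 0;
    while (v.hi != 0 || v.lo != 0) {
        v.lo = (v.lo >> 8) | (v.hi << 56); /* shift the pair right by 8 bits */
        v.hi >>= 8;
        n++;
    }
    return n;
}

/* Write the low n bytes of v to dst, most significant byte first. */
static void varint_put128(uint8_t *dst, u128_t v, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        dst[i] = (uint8_t)(v.lo & 0xFFu);
        v.lo = (v.lo >> 8) | (v.hi << 56);
        v.hi >>= 8;
    }
}

/* Read n (<= 16) big-endian bytes from src into a 128-bit value. */
static u128_t varint_get128(const uint8_t *src, size_t n)
{
    u128_t v = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        v.hi = (v.hi << 8) | (v.lo >> 56); /* shift the pair left by 8 bits */
        v.lo = (v.lo << 8) | src[i];
    }
    return v;
}

/* Returns 1 if a value with exactly n significant bytes (1..16)
 * decodes, reports length n, and re-encodes to the same bytes. */
static int varint_roundtrip_ok(size_t n)
{
    uint8_t in[16], out[16];
    if (n < 1 || n > 16)
        return 0;
    for (size_t i = 0; i < n; i++)
        in[i] = (uint8_t)(0xA0u + i); /* non-zero leading byte */
    u128_t v = varint_get128(in, n);
    if (varint_len128(v) != n)
        return 0;
    varint_put128(out, v, n);
    return memcmp(in, out, n) == 0;
}
```

A test driver would call `varint_roundtrip_ok(n)` for n = 1..16 and fail on the first zero return. This only shows the helpers agree with each other; parity with upstream still needs the differential corpus.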
Affected files / references
- Upstream: set.go:1-343 (entire file)
- Architect plan (Phase 1, persona analysis): proposed libksuid/set.c as a separate TU
- Critic plan (Phase 1): "Cut entirely from v1. Header ksuid_set.h stub returning KSUID_ERR_UNSUPPORTED."
- Related: KSUID_ERR_UNSUPPORTED in libksuid/ksuid.h is not yet defined -- if a stub is desired before a full port, it should land in a small follow-up commit.
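If the stub route is taken, it might look roughly like the sketch below. The `ksuid_t` stand-in, the `int` return type and the numeric value of `KSUID_ERR_UNSUPPORTED` are all assumptions made so the fragment compiles on its own. The real type and error enum live in libksuid/ksuid.h, and the error code does not exist there yet.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in so the sketch compiles alone; the real type is in libksuid/ksuid.h. */
typedef struct { uint8_t b[20]; } ksuid_t;

/* Hypothetical value: KSUID_ERR_UNSUPPORTED is not defined in ksuid.h yet. */
#define KSUID_ERR_UNSUPPORTED (-100)

/* Stub with the parameter list proposed in the implementation sketch.
 * The int return type is an assumption. */
static int ksuid_set_compress(const ksuid_t *ids, size_t n,
                              uint8_t **out, size_t *out_len)
{
    (void)ids;
    (void)n;
    /* Leave outputs in a defined state so callers never see stale values. */
    if (out != NULL)
        *out = NULL;
    if (out_len != NULL)
        *out_len = 0;
    return KSUID_ERR_UNSUPPORTED;
}
```

Clearing the output parameters is a design choice rather than a requirement: it keeps a caller that forgets to check the return code from freeing or reading an uninitialized pointer.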
Labels (if/when configured)
enhancement, wire-format, feature-parity, not-blocking-v1