Follow-up to #1417 (Phases 1 & 2 shipped in #3986 and #3987 — local & remote auto-decompression of zip/gz/zlib/zst/snappy for the Config reader, luau/validate/describegpt lookup tables, and get/dc: ingest).
These items were explicitly scoped out of #1417 and are tracked here for a future pass. None are regressions.

Remote / get sources
- sftp:// sources (behind a get_sftp sub-feature).
- dc: inputs (persist computed stats/frequency alongside cached resources).
- diskcache::ingest_local still fs::reads the whole file, then decompresses it in memory (decompress_source). The remote path already streams (gz/zlib/zst via IngestSink::Decode); large local compressed ingests could OOM similarly. ✅ Done in #3990 ("fix: stream local compressed ingests instead of buffering whole file (#3988)"): ingest_local now streams .gz/.zlib/.zst into BlobSink via the same IngestSink abstraction the remote paths use (bounded memory); zip/sz still full-buffer per IngestSink's per-format policy.

Decompression semantics (shared)
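To make the single selection rule from the first item of this section concrete, here is a minimal Rust sketch of tabular-first ordering over zip entry names. It is not the code from #3995: the extension lists and the `ordered_entries` / `select_single` helpers are hypothetical names chosen for illustration. It models only the behaviour stated in this issue (tabular entries first, every supported entry for multi-input commands, a clear error when nothing is supported).

```rust
use std::path::Path;

// Assumption: illustrative extension lists; the real sets are not given in this issue.
const TABULAR_EXTS: &[&str] = &["csv", "tsv", "tab", "ssv"];
const OTHER_SUPPORTED_EXTS: &[&str] = &["jsonl", "xlsx"];

/// Lower-cased extension of an entry name, if it has one.
fn ext_of(name: &str) -> Option<String> {
    Path::new(name)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
}

fn is_tabular(name: &str) -> bool {
    ext_of(name).map_or(false, |e| TABULAR_EXTS.contains(&e.as_str()))
}

fn is_supported(name: &str) -> bool {
    is_tabular(name)
        || ext_of(name).map_or(false, |e| OTHER_SUPPORTED_EXTS.contains(&e.as_str()))
}

/// Hypothetical helper: supported entries, tabular ones first.
/// Multi-input commands would consume the whole list.
fn ordered_entries(names: &[&str]) -> Result<Vec<String>, String> {
    let mut kept: Vec<String> = names
        .iter()
        .copied()
        .filter(|n| is_supported(n))
        .map(String::from)
        .collect();
    if kept.is_empty() {
        return Err("zip contains no supported entry".to_string());
    }
    // Stable sort: `false` (tabular) sorts before `true`, archive order is kept otherwise.
    kept.sort_by_key(|n| !is_tabular(n));
    Ok(kept)
}

/// Hypothetical helper: the one entry a single-input or Config-only caller reads.
fn select_single(names: &[&str]) -> Result<String, String> {
    ordered_entries(names).map(|mut entries| entries.remove(0))
}
```

Because both callers go through the same ordering, `select_single(&["README.md", "notes.jsonl", "data.csv"])` and the head of `ordered_entries` for the same archive agree on `data.csv`, which is the property the unification is after.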
- util::process_input (command-level: extracts ALL entries) and Config special-format (reader-level: first tabular entry) disagree on multi-entry zip semantics. Decide a single multi-entry policy and converge. ✅ Done in #3995 ("refactor: unify zip-input handling into one shared module (#3988)"), option D: both paths now share one zip module with a single selection rule. Entries are returned tabular-first, so a single-input command and a Config-only command pick the same first entry from a mixed multi-entry zip (previously they could read different entries). Multi-input commands (cat/sqlp/to/validate/scoresql) still receive every entry, nested special formats are preserved, and a zip with no supported entry now errors clearly. QSV_SKIP_FORMAT_CHECK is honored for zip members.
- slice_from_avro/slice_from_jsonl_* "pass" only because the asserted substring is embedded in the binary. Decide whether to surface these errors (and fix/replace the fragile fixtures) or keep the per-format swallow. ✅ Done in #3989 ("fix: surface special-format conversion errors instead of swallowing them (#3988)"): conversion failures are now surfaced as hard errors for all special formats (escape hatch: QSV_SKIP_FORMAT_CHECK). The unreadable Avro fixture was regenerated, and the two slice Decimal-pschema tests (which only passed via the swallow) were reworked onto the compressed-CSV path that genuinely applies a Decimal pschema.
- Nested special formats (.parquet inside a .zip) are unsupported by design: document or support. ✅ Decision: won't support; documented. Parquet/Avro/Arrow are already compressed, so nesting them in a .zip is not a real-world workflow; provide such files directly (qsv reads them natively). Documented in the README "Extended Input Support" section and in select_zip_entry's doc comment.

Lookup-table caching
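The cache-naming fix described in this section's item can be pictured with a small Rust sketch. It is an illustration under stated assumptions, not the downloader code from #3991: `TABULAR_EXTS`, `cache_file_name`, and `probe_cache` are hypothetical names, and the extension list is a guess. The idea is that the cache file takes the inner entry's extension (falling back to csv), and the cache-hit path probes each tabular extension because the name can no longer be derived from the URL alone.

```rust
use std::path::Path;

// Assumption: illustrative set of tabular extensions; the real list is not given in this issue.
const TABULAR_EXTS: &[&str] = &["csv", "tsv", "tab", "ssv"];

/// Hypothetical helper: name the cache file after the inner entry's extension.
/// Recomputed on every refresh, so a later csv-inner refresh yields `.csv` again.
fn cache_file_name(cache_stem: &str, inner_entry: &str) -> String {
    let ext = Path::new(inner_entry)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .filter(|e| TABULAR_EXTS.contains(&e.as_str()))
        .unwrap_or_else(|| "csv".to_string());
    format!("{}.{}", cache_stem, ext)
}

/// Hypothetical helper: on a cache hit, try each tabular extension in turn
/// and return the first candidate name that `exists` reports as present.
fn probe_cache(cache_stem: &str, exists: impl Fn(&str) -> bool) -> Option<String> {
    TABULAR_EXTS
        .iter()
        .map(|ext| format!("{}.{}", cache_stem, ext))
        .find(|name| exists(name))
}
```

Passing the existence check in as a closure keeps the sketch free of filesystem access; a real caller would presumably check the cache directory instead.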
- A .zip whose inner tabular file is non-CSV-delimited can't have its delimiter inferred from the URL (the cache file defaults to .csv). gz/zlib/zst/sz already carry the inner extension via the URL stem. ✅ Done in #3991 ("fix: infer remote .zip lookup table's inner delimiter from its entry (#3988)"): the downloader now names the cache file from the inner entry's extension discovered during extraction (resetting back to .csv on a later csv-inner refresh), and the cache-hit path probes the tabular extensions to find it. Generalized to ckan:// too (its resolved data URL's extension isn't knowable up front); dathere:// and explicit-extension URLs stay deterministic.

🤖 Generated with Claude Code