Releases: dathere/qsv
8.0.0
[8.0.0] - 2025-10-06
Findable, Accessible, Interoperable & Reusable (FAIR) Data is AI-Ready Data.
A week and a half after launching our "People's API" AI Chatbot and "AI-Ready" service, we've fine-tuned qsv further, as it powers the FAIRification engine that allows us to "open your data" (as a verb): to infer and calculate AI-Ready, FAIR metadata at blazing speed, even for large datasets.
This release features:
- describegpt fixes and improvements
- table can now produce "aligned" TSV and Fixed Width Format files
- validate now has Extended Input Support in its RFC 4180 validation mode
- extdedup fixed to dedupe arbitrarily large CSV or text files
- luau upgraded from 0.690 to 0.693
- PowerPC64 pre-built binaries, making it more convenient to use qsv on this "power"ful 😉 platform that's widely used in research (thanks to IBM-provided access to its native GitHub Actions ppc64le runners! For the next release: qsv on IBM Z Mainframes!)
These changes set the stage for even more advanced, powerful, configurable FAIRification capabilities to make ALL your Data AI-Ready, Useful, Usable & Used by Machines & Humans alike.
Added
- table: add leftendtab alignment option #3004
- table: add leftfwf (Fixed Width Format) alignment option 590c861
- validate: add Extended Input Support to RFC 4180 validation mode #3012
- added PowerPC64 LE Linux prebuilt binaries
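Here is a rough sketch of the kind of output the new leftfwf (Fixed Width Format) alignment is about. It is not qsv's table code, which builds on the qsv-tabwriter fork. The helper `to_fixed_width` is hypothetical. It pads every cell with trailing spaces to its column's widest value, so each record ends up the same length. It measures width in chars, which is an assumption that ignores double-width characters.

```rust
/// Hypothetical sketch of left-aligned fixed-width output (not qsv's implementation).
/// Every cell is padded on the right to its column's maximum width, and
/// columns are separated by `gap` spaces, so all records have equal length.
fn to_fixed_width(rows: &[Vec<&str>], gap: usize) -> String {
    // First pass: find the widest cell (in chars) of each column.
    let ncols = rows.iter().map(|r| r.len()).max().unwrap_or(0);
    let mut widths = vec![0usize; ncols];
    for row in rows {
        for (i, cell) in row.iter().enumerate() {
            widths[i] = widths[i].max(cell.chars().count());
        }
    }
    // Second pass: left-align each cell by padding it with trailing spaces.
    let mut out = String::new();
    for row in rows {
        for (i, cell) in row.iter().enumerate() {
            if i > 0 {
                out.push_str(&" ".repeat(gap));
            }
            out.push_str(cell);
            out.push_str(&" ".repeat(widths[i] - cell.chars().count()));
        }
        out.push('\n');
    }
    out
}
```

For example, the rows id/name, 1/Alice and 22/Bo with a one-space gap become three 8-character lines.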
Changed
- describegpt: fine-tuned default LLM Prompt template (v3.1.0) 00e52a3 6b09b7e 5be7f2e
- luau: bump embedded Luau from 0.690 to 0.693 #3017
- schema: make Decimal Type Scale configurable for polars schema with QSV_POLARS_DECIMAL_SCALE env var f20edd5
- updated optimized csv crate, adding non-allocating StringRecord::trim() and more inline()s 4a1c82a
- deps: bump calamine to 0.31.0 bd7a04c
- deps: Bump polars to 0.51.0 from 0.50.0 at py-1.33.1 tag #2995
- deps: bump polars to 0.51.0 at py-1.34.0-beta.4 tag at revision b973cac (latest upstream) #3022
- deps: bump polars to 0.51.0 at py-1.35.0 tag revision b973cac 4164875
- deps: replace tabwriter with renamed fork qsv-tabwriter #3010
- deps: use patched fork of whatlang-rs. Though our PR was merged, there is still no new release 6afff4f
- build(deps): bump base62 from 2.2.2 to 2.2.3 by @dependabot[bot] in #3003
- build(deps): bump bytemuck from 1.23.2 to 1.24.0 by @dependabot[bot] in #3026
- build(deps): bump chrono from 0.4.41 to 0.4.42 by @dependabot[bot] in #2974
- build(deps): bump fancy-regex from 0.16.1 to 0.16.2 by @dependabot[bot] in #3000
- build(deps): bump flate2 from 1.1.2 to 1.1.3 by @dependabot[bot] in #3027
- build(deps): bump flexi_logger from 0.31.2 to 0.31.3 by @dependabot[bot] in #3005
- build(deps): bump flexi_logger from 0.31.3 to 0.31.4 by @dependabot[bot] in #3008
- build(deps): bump indexmap from 2.11.0 to 2.11.1 by @dependabot[bot] in #2973
- build(deps): bump indexmap from 2.11.1 to 2.11.3 by @dependabot[bot] in #2993
- build(deps): bump indexmap from 2.11.3 to 2.11.4 by @dependabot[bot] in #2999
- build(deps): bump libc from 0.2.175 to 0.2.176 by @dependabot[bot] in #3009
- build(deps): bump mlua from 0.11.3 to 0.11.4 by @dependabot[bot] in #3021
- build(deps): bump regex from 1.11.2 to 1.11.3 by @dependabot[bot] in #3011
- build(deps): bump redis from 0.32.5 to 0.32.6 by @dependabot[bot] in #3016
- build(deps): bump qsv-stats from 0.38.0 to 0.39.0 by @dependabot[bot] in #3028
- build(deps): bump qsv-stats from 0.39.0 to 0.39.1 by @dependabot[bot] in #3029
- build(deps): bump redis from 0.32.6 to 0.32.7 by @dependabot[bot] in #3025
- build(deps): bump serde from 1.0.219 to 1.0.223 by @dependabot[bot] in #2983
- build(deps): bump serde from 1.0.223 to 1.0.224 by @dependabot[bot] in #2988
- build(deps): bump serde from 1.0.224 to 1.0.225 by @dependabot[bot] in #2994
- build(deps): bump serde from 1.0.225 to 1.0.226 by @dependabot[bot] in #3002
- build(deps): bump serde from 1.0.226 to 1.0.227 by @dependabot[bot] in #3014
- build(deps): bump serde from 1.0.227 to 1.0.228 by @dependabot[bot] in #3019
- build(deps): bump serde_json from 1.0.143 to 1.0.145 by @dependabot[bot] in #2981
- build(deps): bump semver from 1.0.26 to 1.0.27 by @dependabot[bot] in #2982
- build(deps): bump sysinfo from 0.37.0 to 0.37.1 by @dependabot[bot] in #3015
- build(deps): bump sysinfo from 0.37.1 to 0.37.2 by @dependabot[bot] in #3024
- build(deps): bump tempfile from 3.21.0 to 3.22.0 by @dependabot[bot] in #2975
- build(deps): bump tempfile from 3.22.0 to 3.23.0 by @dependabot[bot] in #3007
- build(deps): bump toml from 0.9.6 to 0.9.7 by @dependabot[bot] in #3001
- pin zip to 4.6, as zip 5 has features that are not widely adopted b231a23
- applied select clippy lint suggestions
- updated indirect dependencies
- bumped MSRV to Rust 1.90
Fixed
- describegpt: init cache vars even when --no-cache is used #2970
- describegpt: fix --base-url option being ignored #2977
- schema: fix delimiter detection #2998
- extdedup: really use memmapped on-disk hash table #3020
Removed
- removed powerpc64-le cross-compilation directive now that we have access to IBM-provided native PowerPC GH Action runner 9659bfc
- removed macOS on Intel (x86_64-apple-darwin) prebuilt binaries
Full Changelog: 7.1.0...8.0.0
Image credit (FAIR data principles): SangyaPundir, CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0), via Wikimedia Commons (https://commons.wikimedia.org/wiki/File:FAIR_data_principles.jpg)
7.1.0
[7.1.0] - 2025-09-06
🇮🇹 csv,conf,v9 edition 🍝
Just in time for csv,conf,v9, we're Bologna-bound and will be talking all things qsv, CSV, open data, metadata standards, AI, POSE and CKAN! For this feature release, we polished describegpt a bit more for the occasion... Towards the "People's API"! Verso l'API del Popolo! (Answering People/Policymaker Interface)
🚀 Enhanced describegpt Command
- Configurable Frequency Limits: Make frequency distribution limit configurable for better control over data analysis
- Few-shot Learning: Add --fewshot-examples option to improve LLM response quality with contextual examples
- Advanced SQL Generation: Fine-tuned SQL generation guidance for better date handling and query optimization
- Conditional SQL Results: Implement conditional --sql-results format for more efficient "SQL RAG" processing. If the generated SQL query executes successfully, the results are saved to the specified file with a .csv extension. If a "SQL hallucination" fails, the file is saved with a .sql extension instead, so the user can tweak and edit it.
- TogetherAI Support: Add support for TogetherAI models endpoint, expanding LLM provider options
- Enhanced Error Handling: Improved SQL parsing error handling and more informative error messages
- Disk Cache by Default: The disk cache is now enabled by default for better performance
- TOML Configuration: Migrate from JSON to the more readable TOML format for more easily modifiable prompt files (see https://github.com/dathere/qsv/blob/master/resources/describegpt_defaults.toml)
- Better Local LLM Support: --api-key can now be set to NONE for local LLM configurations that may not necessarily run on localhost (e.g. a shared Local LLM service running on the local network)
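The conditional --sql-results behavior comes down to choosing the output extension based on whether the generated query ran. Below is a minimal sketch of that decision. It assumes a hypothetical `sql_results_path` helper that receives the query outcome as a `Result`. In qsv, the query itself is executed by DuckDB or Polars, and that part is not modelled here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not qsv's code): decide where the SQL RAG output goes.
fn sql_results_path(requested: &Path, outcome: &Result<String, String>) -> PathBuf {
    match outcome {
        // The query ran: its results are saved as CSV.
        Ok(_csv_rows) => requested.with_extension("csv"),
        // The query failed (a "SQL hallucination"): keep the SQL text itself
        // as .sql so the user can tweak and re-run it.
        Err(_error) => requested.with_extension("sql"),
    }
}
```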
partition Command Enhancements
- New --limit Option: Implement --limit option to set the maximum number of open files
- Streaming to Enhanced Batching Logic: Convert from streaming to a simplified, two-pass batched approach designed to partition on columns with high cardinality for very large datasets
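One way to read the two-pass batched approach: the first pass collects the distinct partition values, and the second pass writes them out in batches of at most --limit values, so no more than that many files are open at once. The `plan_batches` helper below is hypothetical and only builds the batch plan. These notes don't describe how qsv actually schedules its second pass.

```rust
use std::collections::HashSet;

/// Hypothetical sketch (not qsv's code): group distinct partition keys into
/// batches of at most `limit`, so at most `limit` output files are open at once.
fn plan_batches(keys: &[&str], limit: usize) -> Vec<Vec<String>> {
    assert!(limit > 0, "limit must be at least 1");
    // Pass 1: collect distinct keys in first-seen order.
    let mut seen = HashSet::new();
    let mut distinct = Vec::new();
    for &k in keys {
        if seen.insert(k) {
            distinct.push(k.to_string());
        }
    }
    // Pass 2 would write one batch at a time; here we just return the plan.
    distinct.chunks(limit).map(|batch| batch.to_vec()).collect()
}
```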
Added
- describegpt: add configurable frequency limit #2950
- describegpt: migrate prompt file from JSON to the easier-to-edit TOML format #2954
- describegpt: refactor default prompt file; add --fewshot-examples option #2955
- describegpt: add TogetherAI support for models endpoint #2965
- partition: add --limit option #2960
- added Windows ARM64 prebuilt binaries
Changed
- describegpt: enable disk cache by default #2951
- describegpt: Polars SQL generation tweaks #2958
- python: replace deprecated with_gil with attach #2949. This sets the stage for "free-threaded" Python 3.14 support when it's released in October 2025. Buh-bye GIL!
- deps: bump embedded Luau from 0.688 to 0.690 #2967
- deps: bump Polars to 0.50.0 at py-1.33.0 tag
- build(deps): bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in #2962
- build(deps): bump actions/stale from 9 to 10 by @dependabot[bot] in #2963
- build(deps): bump log from 0.4.27 to 0.4.28 by @dependabot[bot] in #2961
- build(deps): bump mlua from 0.11.2 to 0.11.3 by @dependabot[bot] in #2948
- build(deps): bump pyo3 from 0.25.1 to 0.26.0 by @dependabot[bot] in #2946
- build(deps): bump uuid from 1.18.0 to 1.18.1 by @dependabot[bot] in #2956
- build(deps): bump zip from 4.5.0 to 4.6.0 by @dependabot[bot] in #2952
- applied select clippy lints
- updated indirect dependencies
Full Changelog: 7.0.1...7.1.0
7.0.1
[7.0.1] - 2025-08-28
A patch release with some minor bug fixes, benchmark tweaks and build system improvements.
Added
- publish: add dedicated powerpc64le-unknown-linux-gnu publishing workflow (WIP)
Changed
- docs: describegpt expanded error message about LLM URL or API key
- deps: remove planus pinned dependency
Fixed
- fix: geocode --batch 0 causes panic when polars feature is enabled
- publish: remove luau feature from x86_64-pc-windows builds that was causing builds to fail
- publish: remove powerpc64le from main publish workflow
- benchmarks: updated to v6.8.0 with fixes to luau and clustered sample benchmarks
Full Changelog: 7.0.0...7.0.1
7.0.0
[7.0.0] - 2025-08-28
🥳 Open Weights with Open Data, Local LLM 🤖 edition 🚀
This is the biggest release yet - 470+ commits since v6.0.1! Packed with new AI-powered features, fixes and significant performance improvements suite-wide!
With the release of OpenAI's gpt-oss open-weight reasoning model earlier this month setting the stage, we continue on our "Automagical Metadata" journey by revamping describegpt.
🤖 Revamped describegpt - AI-Powered Metadata Inferencing and Data Analysis:
- Intelligent Metadata Generation: Automatically generate comprehensive metadata (Data Dictionaries, Descriptions and Tags) for your Datasets using Large Language Models (LLMs) prompted with summary statistics and frequency tables as detailed context, without sending your data to the cloud! Even if you elect to use a cloud-based LLM, your Raw Data is never sent.
- Chat with your Data: If your prompt can be answered using this high-quality, high-resolution Metadata, describegpt will answer it! If your prompt is not remotely related to the data, it will politely refuse: "I'm sorry, I can only answer questions about the Dataset."
- Auto SQL RAG Mode: Should the LLM decide that it doesn't have the necessary information in the metadata it compiled to answer your prompt, it will automatically enter SQL Retrieval-Augmented Generation (RAG) mode. It then uses the rich metadata as context to craft an expert-level, deterministic, reproducible, "hallucination-free" SQL query¹ to respond to your prompt.
- Database Engine Support: If DuckDB is installed or the Polars feature is enabled, and --sql-results <ANSWER.CSV> is specified, an optimized SQL query will be automatically executed and the query results saved to the specified file. As both DuckDB and Polars are purpose-built OLAP engines that support direct queries (no database pre-loading required), you get answers in a few seconds², even for very large datasets.
- Multi-LLM Support: Works with any OpenAI-API compatible LLM, with special support for local LLMs like Ollama, Jan and LM Studio, and the ability to customize model behavior with the --addl-props option.
- Advanced Caching: Disk and Redis caching support for performance and cost optimization.
- Flexible Prompting: Custom prompt files and built-in intelligent templates for various analysis tasks.
Check out these examples using a 1 million row sample of NYC's 311 data!
- --all option produces a Data Dictionary, Description and Tags: Markdown, JSON
- --prompt "What are the top 10 complaint types per community board and borough?": SQL result
- --prompt "How tall is the Empire State Building?": "I'm sorry, I can only answer questions about the Dataset."
On top of other improvements in Datapusher+ with its new Jinja-based "metadata suggestion engine" - we're using this AI-inferred metadata along with other precalcs to prepopulate DCATv3 (both US and European profiles) and Croissant metadata fields that are otherwise too hard and expensive to compile manually.
The inferred and precalculated metadata values are offered as "suggestions", using a UI/UX purpose-built to facilitate interactive metadata curation chats.
This allows Data Stewards to compile high-quality, high-resolution metadata catalogs with an accelerated "Data Steward in the Loop" data ingestion and metadata curation workflow.
If you want to see and learn more, we're Bologna-bound to attend csv,conf,v9 to present and share how we're using this to auto-infer metadata in CKAN. Hope to see you there!
Towards the People's API!
(Answering People/Policymaker Interface)
📊 Enhanced frequency Command:
- Rank Column: Ranking of frequency results for better data insights
- JSON Output Mode: New --json option not only provides structured output beyond the default CSV format, it also takes advantage of JSON's nested support to include 15 additional summary statistics per field
- Performance Boost: Speed improvements with SIMD-accelerated number parsing, remaining performant even with the added functionality
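A frequency table with a rank column can be sketched as three steps: count the values, sort by count, and number the rows. The helper below, `ranked_frequencies`, is hypothetical and is not qsv's frequency code. Two choices in it are assumptions: tied counts share the same rank (dense ranking), and ties are broken alphabetically. qsv's actual rank semantics may differ.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: (value, count, rank) rows, most frequent first.
/// Tied counts share a rank (dense ranking), which is an assumption.
fn ranked_frequencies(values: &[&str]) -> Vec<(String, usize, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    // Sort by count descending, then by value for a deterministic order.
    let mut table: Vec<(&str, usize)> = counts.into_iter().collect();
    table.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));

    let mut ranked = Vec::with_capacity(table.len());
    let mut rank = 0;
    let mut prev_count: Option<usize> = None;
    for (value, count) in table {
        // A new count value starts the next rank.
        if prev_count != Some(count) {
            rank += 1;
            prev_count = Some(count);
        }
        ranked.push((value.to_string(), count, rank));
    }
    ranked
}
```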
⚡ stats Command Improvements:
- Faster Still: Enabled by improvements in the underlying qsv-stats crate
- Improved Precision: Faster, streamlined precision calculation
- SIMD Number Parsing: Hardware-accelerated parsing for int/float values
- Unix Epoch Support: Proper handling of Unix timestamp 0 as a valid date
- Enhanced Date Inference: Better date and boolean type inference capabilities
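Treating Unix timestamp 0 as a valid date mostly means never using 0 as a "no value" sentinel. The std-only sketch below keeps missing values in an `Option`. It converts seconds to a calendar date with Howard Hinnant's days-to-civil algorithm, which stands in here for a date library. `parse_unix_date` is hypothetical and is not how qsv-stats infers dates.

```rust
/// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian calendar
/// (Howard Hinnant's days-to-civil algorithm).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, counted from March 1
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let y = yoe + era * 400 + (if m <= 2 { 1 } else { 0 });
    (y, m, d)
}

/// Hypothetical sketch: parse a Unix timestamp in seconds into a date.
/// "0" is a real instant (1970-01-01), so it yields Some(..), not "missing".
fn parse_unix_date(s: &str) -> Option<(i64, u32, u32)> {
    let secs: i64 = s.trim().parse().ok()?;
    // div_euclid floors toward negative infinity, so -1 s is still 1969-12-31.
    Some(civil_from_days(secs.div_euclid(86_400)))
}
```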
🔧 validate & schema Enhancements:
- Fancy Regex Support: You can now use "advanced" regex features in your JSON Schema patterns with the --fancy-regex option. Previously, you could only use the standard Rust regex engine, which does not support backreferences or look-arounds (for performance reasons).
- JSON Schema Improvements: Better error handling and format validation options
- Schema Validation Refinements: More granular validation control with --no-format-validation
🔄 rename Reverted and Improved:
When pairwise renaming was introduced in v6.0.0, it broke some workflows. It's now fixed by introducing two modes:
- Positional Mode: Renaming by position is now once again the default
- Pairwise Mode: New --pairwise flag for column renaming by column pairs
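The difference between the two rename modes can be sketched with two hypothetical helpers, `rename_positional` and `rename_pairwise`. Neither is qsv's code. The strictness rules are assumptions made for this illustration: positional mode requires exactly one new name per column, and pairwise mode fails on unknown or unpaired names.

```rust
/// Positional mode (sketch): new names replace the headers in order.
fn rename_positional(headers: &[&str], new_names: &[&str]) -> Result<Vec<String>, String> {
    if headers.len() != new_names.len() {
        return Err(format!("expected {} names, got {}", headers.len(), new_names.len()));
    }
    Ok(new_names.iter().map(|s| s.to_string()).collect())
}

/// Pairwise mode (sketch): `pairs` is old,new,old,new,...; other columns are untouched.
fn rename_pairwise(headers: &[&str], pairs: &[&str]) -> Result<Vec<String>, String> {
    if pairs.len() % 2 != 0 {
        return Err("pairwise mode needs old,new pairs".to_string());
    }
    let mut out: Vec<String> = headers.iter().map(|s| s.to_string()).collect();
    for pair in pairs.chunks(2) {
        let (old, new) = (pair[0], pair[1]);
        let idx = out.iter().position(|h| h.as_str() == old);
        match idx {
            Some(i) => out[i] = new.to_string(),
            None => return Err(format!("column not found: {old}")),
        }
    }
    Ok(out)
}
```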
🗂️ partition Improvements:
- Case-Insensitive Safety: Improved case-aware partitioning algorithm. Previously, case-insensitive file systems like macOS APFS and Windows NTFS were causing incorrect partitioning of case-sensitive values.
- Faster still: With better use of I/O buffering, using deferred, batched, async writes instead of writing after every record
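Here is the case-insensitivity problem in brief. On APFS or NTFS, the partition values "NY" and "ny" land in the same file unless their names are made distinct. The hypothetical `case_safe_stems` helper shows one way to avoid that: it appends _1, _2, ... until a name is unique when case is ignored. This is not qsv's algorithm, and qsv's actual naming scheme isn't described in these notes.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sketch: map partition values to file stems that stay distinct
/// on case-insensitive file systems.
fn case_safe_stems(values: &[&str]) -> Vec<String> {
    let mut taken: HashSet<String> = HashSet::new(); // lowercased stems in use
    let mut assigned: HashMap<String, String> = HashMap::new(); // exact value -> stem
    let mut stems = Vec::with_capacity(values.len());
    for &v in values {
        if let Some(stem) = assigned.get(v) {
            // Same exact value seen before: reuse its file.
            stems.push(stem.clone());
            continue;
        }
        // Append _1, _2, ... until the stem is unique ignoring case.
        let mut stem = v.to_string();
        let mut n = 1;
        while taken.contains(&stem.to_lowercase()) {
            stem = format!("{v}_{n}");
            n += 1;
        }
        taken.insert(stem.to_lowercase());
        assigned.insert(v.to_string(), stem.clone());
        stems.push(stem);
    }
    stems
}
```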
Added
- frequency: add rank info to frequency table #2878
- frequency: add --json output option #2868
- validate: add --fancy-regex option #2845
- add CPU-accelerated, mem-mapped, chunked sha256 file checksum helper #2909
Changed
- apply: use SIMD-accelerated base64-simd crate for Encode64 and Decode64 operations #2863
- stats: faster precision calculation #2852
- perf: Use simd_json instead of serde_json to serialize to JSON #2884
- refactor: create and use reqwest client helpers to eliminate redundant code #2888
- perf: Faster parallelized sha256 hash file #2918
- refactor: describegpt #2890
- refactor: describegpt setting --timeout to 0 sets no timeout #2891
- refactor: describegpt more refinements #2892
- feat: describegpt refactor round3 #2893
- feat: describegpt disk & redis caching #2895
- refactor: describegpt #2896
- refactor: describegpt create get_cache_key helper; customizable stats options #2902
- feat: describegpt auto SQL RAG for --prompt #2904
- feat: describegpt major refactor #2913
- refactor: describegpt default promptfile is now embedded in qsv binary; fine-tune tests #2924
- feat: describegpt returning reasoning with --json option #2926
- feat: describegpt add DuckDB support in SQL RAG mode #2929
- feat: describegpt various DuckD...
6.0.1
[6.0.1] - 2025-07-12
This is a patch release with bug fixes and minor improvements.
Changed
- feat: updated completions for qsv v6.0.0 by @rzmk in #2838
- docs: updated sample schema.json based on NYC311 1M row sample benchmark data
- docs: updated sample stats output using NYC 311 1M row sample benchmark data
- build(deps): bump chrono-tz from 0.10.3 to 0.10.4 by @dependabot[bot] in #2839
- build(deps): bump qsv-stats from 0.35.0 to 0.36.0 by @dependabot[bot] in #2840
- bumped indirect dependencies
- Added benchmark_data.* to .gitignore
Fixed
- geocode: make --batch=0 mode more robust by setting a minimum batch size of 1,000 rows 2fa90bc
- jsonl: correct batchsize calculation to use input file instead of output file for line counting 742dc77
- benchmarks: fixed benchmarks with unescaped parameters with embedded spaces ad95596
Removed
- Removed retired publishing workflows (linux-glibc-231-musl-123 and wix-installer)
Full Changelog: 6.0.0...6.0.1
6.0.0
Highlights:
This is a major release with significant improvements and new features!
🔍 Enhanced lens command:
- File prompt support: You can now load prompts from files using the new file: support, making it easier to reuse complex prompts
- Wrap mode option: Added --wrap-mode option for better text display control when viewing data
- Improved examples: Enhanced usage examples and documentation
🔄 Improved rename command:
- Pair-based renaming: Easier column renaming with more intuitive syntax for bulk operations.
📊 Enhanced sort command:
- Natural sorting: Added --natural option for human-friendly sorting (e.g., "file1.txt", "file2.txt", "file10.txt" now sort "naturally"; previously, lexicographical sorting would order them as "file1.txt", "file10.txt", "file2.txt")
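Natural sorting treats runs of digits as numbers instead of comparing them character by character. Below is a minimal sketch of such a comparator using a hypothetical `natural_cmp` function. qsv's own --natural implementation may treat details such as leading zeros or letter case differently.

```rust
use std::cmp::Ordering;
use std::iter::Peekable;
use std::str::Chars;

/// Collect a run of ASCII digits from the front of the iterator.
fn take_digits(it: &mut Peekable<Chars<'_>>) -> String {
    let mut run = String::new();
    while let Some(&c) = it.peek() {
        if !c.is_ascii_digit() {
            break;
        }
        run.push(c);
        it.next();
    }
    run
}

/// Hypothetical natural-order comparison (not qsv's sort code).
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let (mut ai, mut bi) = (a.chars().peekable(), b.chars().peekable());
    loop {
        match (ai.peek().copied(), bi.peek().copied()) {
            (None, None) => return Ordering::Equal,
            (None, Some(_)) => return Ordering::Less,
            (Some(_), None) => return Ordering::Greater,
            (Some(x), Some(y)) if x.is_ascii_digit() && y.is_ascii_digit() => {
                let (na, nb) = (take_digits(&mut ai), take_digits(&mut bi));
                // Strip leading zeros; a longer run is then a bigger number, and
                // equal-length runs compare digit by digit (no integer overflow).
                let (ta, tb) = (na.trim_start_matches('0'), nb.trim_start_matches('0'));
                let ord = ta.len().cmp(&tb.len()).then_with(|| ta.cmp(tb));
                if ord != Ordering::Equal {
                    return ord;
                }
            }
            (Some(x), Some(y)) => {
                if x != y {
                    return x.cmp(&y);
                }
                ai.next();
                bi.next();
            }
        }
    }
}
```

With this comparator, `files.sort_by(|a, b| natural_cmp(a, b))` orders "file10.txt", "file2.txt", "file1.txt" as file1, file2, file10.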
⚡ Performance improvements:
- Memory optimizations: Multiple performance enhancements across the frequency, stats, validate, and transpose commands
- Buffer optimizations: Improved I/O performance with better buffer sizing for various operations
- Polars engine upgrade: Updated to the latest Polars 0.49.x series for better performance and stability
🔧 Enhanced validation:
- Robust JSON Schema validation: More granular error messages and better schema validation
- Improved error reporting: Clearer messages to help debug validation issues
- UTF-8 handling: Better handling of invalid records with improved debug output
🌐 Geocoding improvements:
- Updated geosuggest: Bumped to version 0.8 with direct index update support for better geocoding performance
🔗 SQL enhancements (joinp and sqlp):
- Decimal comma support: Added --decimal-comma option for writing operations, improving international data support
- Better validation: Enhanced delimiter and decimal comma validation
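Writing with a decimal comma is mostly a formatting swap. It also needs a guard against a comma delimiter, which would make such numbers ambiguous. The helpers below are hypothetical. Rejecting ',' as the delimiter is an assumption about what the "delimiter validation" checks, not a description of joinp or sqlp internals.

```rust
/// Hypothetical sketch: format a number with a decimal comma.
fn format_decimal_comma(x: f64) -> String {
    // Rust's Display always uses '.', so swap it for ','.
    x.to_string().replace('.', ",")
}

/// Hypothetical sketch: join a row of numbers, refusing a ',' delimiter
/// because "1,5,2,5" could no longer be split back into 1,5 and 2,5.
fn write_row(values: &[f64], delimiter: char) -> Result<String, String> {
    if delimiter == ',' {
        return Err("a ',' delimiter is ambiguous with decimal commas; use e.g. ';'".to_string());
    }
    let sep = delimiter.to_string();
    let cells: Vec<String> = values.iter().map(|&v| format_decimal_comma(v)).collect();
    Ok(cells.join(sep.as_str()))
}
```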
🏗️ Infrastructure updates:
- Rust 1.88 MSRV: Updated minimum supported Rust version
- Dependency updates: Comprehensive updates to all major dependencies including Polars, Tokio, and many others
- Compilation optimizations: Various improvements for faster builds and better runtime performance
Added
New Features:
- lens: add file: support to load prompts from files #2805
- lens: add --wrap-mode option #2805
- rename: pair-based renaming for easier bulk column renaming #2806
- sort: add --natural option for natural/human-friendly sorting #2808
- schema: set JSON Schema description to the command line used for generation
- joinp & sqlp: add --decimal-comma option for writing operations
- joinp: add --decimal-comma and --delimiter validation
- sqlp: add --decimal-comma and --delimiter validation
- validate: more robust JSON Schema schema validation with granular error messages
- validate: show invalid record in debug format for UTF-8 failures
- Enhanced completions for qsv v5.1.0 and v6.0.0
Documentation & Examples:
- lens: improved examples in usage text
- schema: expand examples and add -P shortcut for --prompt option
- sqlp: update description to note support for input beyond CSVs
- Polars SQL documentation noting it's a PostgreSQL dialect
- Added link to Polars 0.49.0 release notes
- MSRV documentation updated to Rust 1.88
- Additional conditions for when to use "portable" binaries
Changed
Performance Improvements:
- frequency: microoptimize null value handling and preallocate vectors
- stats: preallocate with_capacity for Unsorted struct and coefficient of variation handling improvements
- transpose: performance refactoring with optimized buffer handling
- validate: microoptimizations for JSON instance handling and buffer capacity improvements
- apply: bigger reader buffer as apply is batch oriented
- Enabled setter for read and write buffer sizing configuration
- Various microoptimizations across commands
Polars Engine Updates:
- Bumped Polars from 0.48 to 0.49.x series
- Adapted to new Polars PlPath API
- Updated to use latest Polars upstream throughout development cycle
- Enabled simd-json compiler hints feature on nightly builds
Dependency Updates:
Major updates:
- Polars: 0.48 → 0.49.x
- Tokio: 1.45.1 → 1.46.1
- qsv-stats: 0.33.0 → 0.35.0
- kiddo: 5.0.3 → 5.2.2
- indexmap: 2.9.0 → 2.10.0
- calamine: updated to latest upstream
- redis: 0.32.2 → 0.32.3
- sysinfo: 0.35.2 → 0.36.0
- geosuggest: bumped to 0.8
Build dependencies:
- flexi_logger: 0.31.0 → 0.31.2
- arboard: 3.5.0 → 3.6.0
- minijinja: 2.10.2 → 2.11.0
- minijinja-contrib: 2.10.2 → 2.11.0
- zip: 4.1.0 → 4.3.0
- reqwest: 0.12.20 → 0.12.22
- indicatif: 0.17.11 → 0.17.12
- phf: 0.11.3 → 0.12.1
- human-panic: 2.0.2 → 2.0.3
- jaq-std: 2.1.1 → 2.1.2
- jaq-core: 2.2.0 → 2.2.1
- jaq-json: 1.1.2 → 1.1.3
Code Quality & Maintenance:
- Applied clippy lint suggestions including collapsible_if, needless_return, redundant_clone, and manual_is_multiple_of
- Updated MSRV to Rust 1.88
- Set nightly to 2025-06-27
- Removed hardware-lock-elision feature on parking_lot
- No longer use similar-asserts crate, reverted to standard assert_eq
- Better TOML formatting
- Removed unneeded dependency aliases
- Various code refactoring for better maintainability
Infrastructure:
- Updated csvlens integration with natural sorting support
- Switched dependency management approaches for better upstream compatibility
- Pin plist to 1.7.3 to avoid unnecessary quick-xml bumps
- Use latest calamine upstream consistently
Fixed
- validate: clearer JSON Schema schema error messages to differentiate validation types
- round_num(): should return an empty string if dec_f64.is_nan()
- joinp: fixed non-deterministic result order in non-equi-join tests
- Enhanced Snappy file decompression robustness
- Fixed geometric mean calculation in stats
- Better UTF-8 record validation with debug output
- Various test adjustments to account for dependency updates and behavior changes
- Resolved several clippy warnings and code quality issues
Test Updates:
- rename: add pair-renaming tests
- sort: add natural sort tests
- joinp: add decimal_comma tests
- sqlp: add decimal-comma validation tests
- validate: add JSON Schema schema validation tests
- stats: adjust test cases for qsv-stats 0.35.0 changes
- excel: re-enable and revert formula tests based on upstream changes
Development Notes
Benchmarks:
- Comprehensive benchmarking for versions 5.1.0 and 6.6.1
- Performance comparisons available for major operations
Continuous Integration:
- Multiple dependency updates via Dependabot automation
- Comprehensive test coverage maintained throughout development
- Regular upstream synchronization with Polars and other major dependencies
Pull Requests
NOTE: The changelog entries below only document changes with a corresponding PR. Several changes were committed to master directly and are documented in the release highlights above.
Added
- lens: add --wrap-mode option in #2805
- rename: add pair-based renaming in #2806
- sort: add --natural sort option in #2808
Changed
- geocode: now uses the faster geosuggest 0.8 crate. The index-update subcommand now generates a command that uses the geosuggest crate directly to update/create the index, instead of doing it internally.
- schema: when generating JSON schema, description property set to cmdline used to generate the JSON schema in #2796
- sqlp & joinp: --decimal-comma option is not only for parsing input CSVs, it's also used when writing output CSVs in #2800
- transpose: performance refactoring in #2827
- validate: improved JSON Schema schema validation in #2803
- update completions for qsv v5.1.0 by @rzmk in #2804
- dep: bump polars to latest upstream - adapt to PlPath api reqt in #2822
- perf: bump to the faster geosuggest 0.8 in #2837
- build(deps): bump arboard from 3.5.0 to 3.6.0 by @dependabot[bot] in #2814
- build(deps): bump flexi_logger from 0.31.0 to 0.31.1 by @dependabot[bot] in #2801
- build(deps): bump flexi_logger from 0.31.1 to 0.31.2 by @dependabot[bot] in #2812
- build(deps): bump libc from 0.2.173 to 0.2.174 by @dependabot[bot] in #2794
- build(deps): bump human-panic from 2.0.2 to 2.0.3 by @dependabot[bot] in #2833
- build(deps): bump indicatif from 0.17.11 to 0.17.12 by @dependabot[bot] in #2818
- build(deps): bump jaq-std from 2.1.1 to 2.1.2 by @dependabot[bot] in #2830
- build(deps): bump jaq-core from 2.2.0 to 2.2.1 by @dependabot[bot] in #2831
- build(deps): bump jaq-json from 1.1.2 to 1.1.3 by @dependabot[bot] in #2832
- build(deps): bump minijinja from 2.10.2 to 2.11.0 by @dependabot[bot] in #2815
- build(deps): bump minijinja-contrib from 2.10.2 to 2.11.0 by @dependabot[bot] in #2816
- build(deps): bump phf from 0.11.3 to 0.12.1 by @dependabot[bot] in #2797
...
5.1.0
[5.1.0] - 2025-06-17
Highlights
- lens is now colorful by default, with a --monochrome option to turn it off: qsv lens /tmp/NYC_311_SR_2010-2020-sample-1M.csv
- lens can now have custom prompts with the --prompt option (with support for ANSI escape codes to format the prompt). It's meant to be paired with the --echo-column <colname> option, e.g.: qsv lens --prompt $'\033[1;5;31mBlinking red, bold text\033[0m' --echo-column 'Unique Key' /tmp/NYC_311_SR_2010-2020-sample-1M.csv
- the qsv-stats crate, the underlying engine behind the central stats, frequency and "smart" commands, got a lot of love in this release
- validate got a tad faster while decreasing its memory footprint. The new --no-format-validation option also allows you to ignore all JSON Schema "format" keywords (e.g. date, email, url, currency, etc.) when validating CSVs.
Added
- lens: add --prompt option, add examples to regex-enabled options #2772
- lens: add --monochrome option; otherwise, columns are displayed in different colors #2761
- validate: add --no-format-validation option when in JSON Schema mode #2762
- docs: add shell completions badges by @rzmk in #2760
- feat: added criterion trim algorithm microbenchmarks #2789
Changed
- frequency: performance microoptimizations - use stats cache column cardinality to pre-alloc & size frequency hash tables
- geocode: refactor regex handling for performance & maintainability
- json: preserve key order #2777
- stats: performance microoptimizations - use unwrap_unchecked() instead of just unwrap() in hot sampling functions
- validate: major refactoring for added performance/memory efficiency
- chore: temporarily use qsv-calamine until a new calamine is released #2790
- Bump cpc from 1.9 to 2 #2770
- deps: bump criterion from 0.5 to 0.6 #2791
- deps: use latest csvlens upstream with colorful columns https://github.com/dathere/qsv/commit/f2c9322e33a0ac335dafec10a490c871d3de0a6c
- deps: temporarily use qsv-calamine until a new calamine is released #2790
- deps: bump our patched forks of cached, csvs_convert, json-objects-to-csv, jsonschema, localzone, rfd, self_update until PRs are merged or new releases are made
- deps: bump zip from 3 to 4 in 75909d2
- deps: bump polars to 0.48.1 at 49ce57a revision
- build(deps): bump atoi_simd from 0.16.0 to 0.16.1 by @dependabot in #2766
- build(deps): bump bytemuck from 1.23.0 to 1.23.1 by @dependabot in #2778
- build(deps): bump flate2 from 1.1.1 to 1.1.2 by @dependabot in #2781
- build(deps): bump flexi_logger from 0.30.1 to 0.30.2 by @dependabot in #2765
- build(deps): bump flexi_logger from 0.30.2 to 0.31.0 by @dependabot in #2793
- build(deps): bump hashbrown from 0.15.3 to 0.15.4 by @dependabot in #2779
- build(deps): bump libc from 0.2.172 to 0.2.173 by @dependabot in #2787
- build(deps): bump mimalloc from 0.1.46 to 0.1.47 by @dependabot in #2792
- build(deps): bump mlua from 0.10.3 to 0.10.5 by @dependabot in #2758
- build(deps): bump num_cpus from 1.16.0 to 1.17.0 by @dependabot in #2771
- build(deps): bump parking_lot from 0.12.3 to 0.12.4 by @dependabot in #2768
- build(deps): bump pyo3 from 0.25.0 to 0.25.1 by @dependabot in #2785
- deps: upgrade qsv-stats from 0.32 to 0.33, which features major memory and performance optimizations behind the stats & frequency commands #2786
- deps: bump redis from 0.29.5 to 0.32
- build(deps): bump reqwest from 0.12.15 to 0.12.16 by @dependabot in #2764
- build(deps): bump reqwest from 0.12.16 to 0.12.18 by @dependabot in #2767
- build(deps): bump reqwest from 0.12.18 to 0.12.19 by @dependabot in #2773
- build(deps): bump reqwest from 0.12.19 to 0.12.20 by @dependabot in #2782
- build(deps): bump rust_decimal from 1.37.1 to 1.37.2 by @dependabot in #2788
- build(deps): bump smallvec from 1.15.0 to 1.15.1 by @dependabot in #2780
- build(deps): bump sysinfo from 0.35.1 to 0.35.2 by @dependabot in #2774
- build(deps): bump titlecase from 3.5.0 to 3.6.0 by @dependabot in #2775
- build(deps): bump tokio from 1.45.0 to 1.45.1 by @dependabot in #2759
- build(deps): bump uuid from 1.16.0 to 1.17.0 by @dependabot in #2757
- applied select clippy suggestions
- updated indirect dependencies
- set Rust nightly to 2025-05-21, the same nightly Polars uses 872ade1
Fixed
- fix: frequency recover from non-fatal absence of stats cache, instead of panicking b2821a0
- fix: flaky json tests caused by hardcoding name of intermediate file 62ca310
- fix: flaky reverse property tests by handling BOM characters cefd490
- fix: util::process_input helper does not honor QSV_SKIP_FORMAT_CHECK when processing dir input #2784
Full Changelog: 5.0.3...5.1.0
5.0.3
[5.0.3] - 2025-05-22 "The Geo Release" 🌍
qsv 5.0.3 represents a major milestone with significant enhancements to its geospatial data processing capabilities.
They're targeted to support the Datapusher+ Data Resource Upload First (DRUF) workflow for "automagical metadata inferencing" - focusing on DCAT-US v3 recommended spatial and temporal properties that would otherwise be too tedious to manually compile:
New Geocoding Capabilities
- Added IP geolocation with new --iplookup and --iplookupnow subcommands in the geocode command
- Integrated MaxMind GeoLite2 database support for accurate IP-to-location mapping
- Enhanced geocoding performance (up to 5x faster) with rkyv serialization (contributed by @estin)
Enhanced geoconvert Command
- Added CSV input support alongside existing geospatial formats
- Introduced GeoJSONL output format for streaming workflows
- Added stdin support for all formats except SHP input
- New coordinate handling options: --latitude and --longitude parameters
- Added --max-length option for output control
- Comprehensive test coverage additions
- all contributed by @rzmk!
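GeoJSONL is simply one GeoJSON Feature per line. Below is a hypothetical `to_geojsonl` sketch that turns CSV rows with latitude/longitude columns into Point features. It is not geoconvert's implementation. To keep JSON string escaping out of the picture, it drops all other columns. Note that GeoJSON (RFC 7946) lists coordinates as [longitude, latitude], the reverse of the usual spoken order.

```rust
/// Hypothetical sketch: CSV rows with lat/lon columns -> GeoJSONL Point features.
fn to_geojsonl(header: &[&str], rows: &[Vec<&str>], lat_col: &str, lon_col: &str) -> Result<String, String> {
    let find = |name: &str| {
        header.iter().position(|h| *h == name).ok_or_else(|| format!("column not found: {name}"))
    };
    let (lat_idx, lon_idx) = (find(lat_col)?, find(lon_col)?);
    let mut out = String::new();
    for (n, row) in rows.iter().enumerate() {
        let cell = |i: usize| row.get(i).map(|s| s.trim()).ok_or_else(|| format!("row {n} is too short"));
        let lat: f64 = cell(lat_idx)?.parse().map_err(|_| format!("row {n}: bad latitude"))?;
        let lon: f64 = cell(lon_idx)?.parse().map_err(|_| format!("row {n}: bad longitude"))?;
        // GeoJSON coordinate order is [longitude, latitude].
        out.push_str(&format!(
            "{{\"type\":\"Feature\",\"geometry\":{{\"type\":\"Point\",\"coordinates\":[{lon},{lat}]}},\"properties\":{{}}}}\n"
        ));
    }
    Ok(out)
}
```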
🚀 Performance & Infrastructure Improvements
Polars Integration
- Upgraded Polars from 0.46.0 to 0.48.1 with intermediate releases
- Enhanced Polars schema support across multiple commands (schema, joinp, pivotp, sqlp)
- Added --polars mode to the schema command to explicitly create a polars schema file on demand, rather than as a side-effect of the sqlp command using its --cache-schema option.
Core Performance
- Microoptimizations in the sort command
- Improved file handling with tempfile usage in edit --in-place
- Enhanced auto-decompression support now available suite-wide for gz, zlib, and zst files
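Auto-decompression first has to know which codec a file uses. These notes don't say whether qsv decides by file extension or by content. The hypothetical `sniff` function below shows the content route. It checks the documented magic bytes for gzip (RFC 1952) and zstd, plus the zlib header rule from RFC 1950.

```rust
/// Hypothetical sketch: guess a compression format from a file's first bytes.
#[derive(Debug, PartialEq)]
enum Compression {
    Gzip,
    Zlib,
    Zstd,
    Plain,
}

fn sniff(bytes: &[u8]) -> Compression {
    match bytes {
        // gzip magic: 0x1f 0x8b
        [0x1f, 0x8b, ..] => Compression::Gzip,
        // zstd frame magic 0xFD2FB528, stored little-endian
        [0x28, 0xb5, 0x2f, 0xfd, ..] => Compression::Zstd,
        // zlib CMF/FLG: deflate method (low nibble 8), window <= 32K
        // (high nibble <= 7), and (CMF * 256 + FLG) divisible by 31
        [cmf, flg, ..]
            if (cmf & 0x0f) == 8
                && (cmf >> 4) <= 7
                && (u16::from(*cmf) * 256 + u16::from(*flg)) % 31 == 0 =>
        {
            Compression::Zlib
        }
        _ => Compression::Plain,
    }
}
```

The zlib check is weak evidence on its own. Any file that starts with "x" followed by a space also passes it, which is one reason a tool might rely on extensions or a trial decode instead.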
🛠️ New Features & Usability
Enhanced Commands
- edit: New --in-place option for direct file modification with automatic backup (.bak) creation
- foreach: Added "/" to splitter pattern for improved path handling
- stats: New QSV_STATS_STRING_MAX_LENGTH environment variable for string analysis control
- to: Added --all-strings option for simplified data type handling
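The edit --in-place flow can be sketched with std only. Write the new contents to a temporary file beside the original, back up the original, then rename the temporary file over it. This is not qsv's code; qsv uses the tempfile crate. The `edit_in_place` helper, the temp file name, and naming the backup <file>.bak (e.g. data.csv.bak) are all assumptions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical sketch: replace `path` with `new_contents`, keeping a .bak copy.
fn edit_in_place(path: &Path, new_contents: &str) -> io::Result<PathBuf> {
    // 1. Write the new contents next to the original (same directory and file system).
    let tmp = path.with_extension("qsv-edit-tmp");
    fs::write(&tmp, new_contents)?;

    // 2. Back up the original as <file>.bak, e.g. data.csv -> data.csv.bak.
    let mut bak_name = path.as_os_str().to_owned();
    bak_name.push(".bak");
    let bak = PathBuf::from(bak_name);
    if let Err(e) = fs::copy(path, &bak) {
        let _ = fs::remove_file(&tmp); // don't leave the temp file behind
        return Err(e);
    }

    // 3. Replace the original with the edited copy.
    fs::rename(&tmp, path)?;
    Ok(bak)
}
```

Renaming within one directory is typically atomic on POSIX file systems, so other readers see either the old file or the new one, never a half-written mix.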
Distribution & Installation
- Added conda package support with installation instructions
- New download badges and streamlined installation documentation
- Retired the older glibc-2.31 and musl-1.2.3 "prebuilt-older" binaries, as Ubuntu 20.04 has been retired and is no longer supported by GitHub Actions.
- Discontinued MSI installer in favor of the easier qsv Windows Easy Installer (thanks @rzmk!)
Quality & Stability
- Applied multiple clippy lint suggestions for code quality
- Enhanced test coverage, particularly for geospatial functions
- Improved documentation with better examples and clearer explanations
- Fixed stdin handling issues in the split command
🎯 Default Feature Changes
The qsvdp variant now includes geocode and geoconvert commands by default, making geospatial functionality more accessible to Datapusher+ users with Jinja2-powered metadata formulas.
NOTE:
- For qsv v5.0.3, cargo install will NOT work, as the calamine crate (which powers the excel command) is pinned to zip 2.5.0, which was yanked.
- Unfortunately, the broken zip dependency also prevents us from publishing qsv 5.0.3 to crates.io.
- In both cases, either install the prebuilt binaries or compile from source with cargo build.
Added
- edit: add --in-place (and test) which uses tempfile by @rzmk in #2744
- foreach: add "/" to splitter pattern #2754
- geoconvert: add CSV input and GeoJSONL output and use buf by @rzmk in #2690
- geoconvert: add stdin support (except for SHP input) by @rzmk in #2699
- geoconvert: add --latitude and --longitude options by @rzmk in #2707
- geoconvert: add --max-length option #2711
- geocode: add iplookup and iplookupnow subcommands #2741
- tests: geoconvert - add basic tests and move tests to test_geoconvert.rs by @rzmk in #2717
- qsvdp now includes geocode & geoconvert commands by default #2697
- stats: QSV_STATS_STRING_MAX_LENGTH env var #2709
- to: add --all-strings option #2746
- docs: add conda install command by @rzmk in #2718
- docs: add qsv download badges and update install instructions by @rzmk in #2721
Changed
- geocode: bump geosuggest crate to use much faster rkyv serialization by @estin in #2734
- sort: microoptimize #2748
- feat: update completions for qsv v5.0 by @rzmk in #2752
- Improved Polars Schema support #2703
- Bump polars from 0.46.0 to 0.47.0 87bf7b7
- Bump polars py-1.30.0-beta-1 #2747
- Bump polars to 0.48.0 5a037ee
- build(deps): bump polars from 0.48.0 to 0.48.1 by @dependabot in #2750
- build(deps): bump polars-ops from 0.48.0 to 0.48.1 by @dependabot in #2751
- build(deps): bump actions/setup-python from 5.5.0 to 5.6.0 by @dependabot in #2713
- build(deps): bump actix-web from 4.10.2 to 4.11.0 by @dependabot in #2742
- build(deps): bump bytemuck from 1.22.0 to 1.23.0 by @dependabot in #2719
- build(deps): bump chrono from 0.4.40 to 0.4.41 by @dependabot in #2722
- build(deps): bump ext-sort from 0.1.4 to 0.1.5 by @dependabot in #2736
- build(deps): bump file-format from 0.26.0 to 0.27.0 by @dependabot in #2735
- build(deps): bump pyo3 from 0.24.1 to 0.24.2 by @dependabot in #2708
- build(deps): bump jaq-json from 1.1.1 to 1.1.2 by @dependabot in #2714
- build(deps): bump jaq-std from 2.1.0 to 2.1.1 by @dependabot in #2715
- build(deps): bump jaq-core from 2.1.1 to 2.2.0 by @dependabot in #2716
- build(deps): bump jsonschema from 0.29.1 to 0.30.0 by @dependabot in #2704
- build(deps): bump libc from 0.2.171 to 0.2.172 by @dependabot in #2696
- build(deps): bump sysinfo from 0.34.2 to 0.35.0 by @dependabot in #2724
- build(deps): bump minijinja from 2.9.0 to 2.10.0 by @dependabot in #2727
- build(deps): bump minijinja from 2.10.1 to 2.10.2 by @dependabot in #2732
- build(deps): bump minijinja-contrib from 2.9.0 to 2.10.0 by @dependabot in #2728
- build(deps): bump minijinja-contrib from 2.10.1 to 2.10.2 by @dependabot in #2733
- build(deps): bump pyo3 from 0.24.2 to 0.25.0 by @dependabot in #2745
- build(deps): bump rand from 0.9.0 to 0.9.1 by @dependabot in #2702
- build(deps): bump simd-json from 0.15.0 to 0.15.1 by @dependabot in #2701
- build(deps): bump sysinfo from 0.35.0 to 0.35.1 by @dependabot in #2740
- build(deps): bump tempfile from 3.19.1 to 3.20.0 by @dependabot in #2739
- build(deps): bump tokio from 1.44.2 to 1.45.0 by @dependabot in #2731
- bump indirect dependencies
- apply select clippy lint suggestions
- bump MSRV to 1.87.0
Fixed
- docs: fix typo in apply operations replace example by @HarrisonMc555 in #2743
- fix: split: save stdin to tempfile #2706
New Contributors
- @estin made their first contribution in #2734
- @HarrisonMc555 made their first contribution in #2743
Full Changelog: 4.0.0...5.0.3
4.0.0
[4.0.0] - 2025-04-13
Highlights:
This is a major release with numerous improvements!
- qsv can now read more file formats by leveraging the Polars engine:
- Arrow/IPC, Avro, Parquet, JSON (JSON array) and JSONL
- Automatic decompression support for compressed CSV file dialects (csv, tsv/tab & ssv) using gzip (.gz), zlib (.zlib) and zstd (.zst) compression formats (e.g. data.csv.gz, data.tsv.zst, data.ssv.zlib):
  qsv lens data.csv.gz
  qsv sample 1000 data.parquet | qsv stats | qsv lens
  qsv frequency data.tab.zlib | qsv lens
  qsv search Waldo data.ssv.zst | qsv table
  qsv select 2-5 data.jsonl | qsv lens
- New geoconvert command for converting spatial formats (GeoJSON and SHP) to CSV:
  # convert TX_cities.geojson to CSV, filter out the geometry column and browse with lens
  qsv geoconvert TX_cities.geojson geojson csv | qsv select '!geometry' | qsv lens
- Enhanced split command with new --filter option:
  - Similar to GNU split --filter
  - Spawns a subprocess for each chunk
  # split data.csv into outdir, each chunk having 100,000 rows, gzip compressing each chunk
  qsv split --size 100000 outdir data.csv --filter 'gzip $FILE'
- Expanded to command:
  - added LibreOffice/OpenOffice Calc (ODS) support
  - re-enabled parquet generation now that it uses Arrow instead of DuckDB (which made for very long compiles)
- New uniqueCombinedWith JSON Schema custom keyword in the validate command:
  - Allows validating uniqueness across multiple columns
  - Useful for composite key validation
- QSV_DOTENV_PATH now supports the sentinel value "<NONE>" to disable dotenv processing altogether.
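Auto-decompression of this kind can be driven entirely by the file name. The outer extension selects the codec, and the inner extension selects the CSV dialect. Below is a rough Python sketch of that dispatch. It assumes qsv's convention that .tsv/.tab files are tab-delimited and .ssv files are semicolon-delimited. The helper names decompress_for and dialect_delimiter are hypothetical. zstd is left out because Python's standard library lacks it on most versions.

```python
import gzip
import zlib
from pathlib import PurePath

COMPRESSED_SUFFIXES = {".gz", ".zlib", ".zst"}


def decompress_for(filename: str, data: bytes) -> bytes:
    """Decompress `data` based on the outer extension of `filename`."""
    suffix = PurePath(filename).suffix.lower()
    if suffix == ".gz":
        return gzip.decompress(data)
    if suffix == ".zlib":
        return zlib.decompress(data)  # zlib-wrapped (RFC 1950) stream
    if suffix == ".zst":
        # not in Python's standard library on most versions
        raise NotImplementedError("zstd needs a package such as 'zstandard'")
    return data  # uncompressed file: pass through unchanged


def dialect_delimiter(filename: str) -> str:
    """Pick the delimiter from the inner extension, e.g. data.tsv.gz -> tab."""
    path = PurePath(filename)
    if path.suffix.lower() in COMPRESSED_SUFFIXES:
        path = path.with_suffix("")  # strip the compression extension
    return {".tsv": "\t", ".tab": "\t", ".ssv": ";"}.get(path.suffix.lower(), ",")


raw = gzip.compress(b"a\tb\n1\t2\n")
text = decompress_for("data.tsv.gz", raw).decode("utf-8")
rows = [line.split(dialect_delimiter("data.tsv.gz")) for line in text.splitlines()]
```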
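The split --filter behaviour described above can be approximated in two steps: write each chunk, then run the filter through the shell with FILE pointing at that chunk. This Python sketch makes two assumptions. First, the chunk is written to disk before the command runs, which is what the 'gzip $FILE' example implies. Second, chunk files are named after their starting row. split_with_filter is a hypothetical helper. GNU split --filter, by contrast, pipes each chunk to the command's standard input.

```python
import csv
import os
import subprocess


def split_with_filter(src: str, outdir: str, size: int, filter_cmd: str) -> list[int]:
    """Split a CSV into chunks of `size` rows (header repeated in each) and
    run `filter_cmd` through the shell once per chunk with $FILE set.
    Returns the exit code of each filter invocation."""
    if size < 1:
        raise ValueError("size must be at least 1")
    os.makedirs(outdir, exist_ok=True)
    codes = []
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        chunk, start = [], 0
        for record in reader:
            chunk.append(record)
            if len(chunk) == size:
                codes.append(_write_and_filter(outdir, start, header, chunk, filter_cmd))
                chunk, start = [], start + size
        if chunk:  # final, possibly shorter, chunk
            codes.append(_write_and_filter(outdir, start, header, chunk, filter_cmd))
    return codes


def _write_and_filter(outdir, start, header, rows, filter_cmd):
    # hypothetical naming: each chunk file is named after its starting row
    path = os.path.join(outdir, f"{start}.csv")
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows)
    # one subprocess per chunk; the shell expands $FILE from the environment
    env = {**os.environ, "FILE": path}
    return subprocess.run(filter_cmd, shell=True, env=env).returncode
```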
Added
- geoconvert: new command to convert spatial formats to CSV by @rzmk in #2681 & #2688
- split: add --filter option #2660
- sqlp: add decimal type support #2646
- to: add back parquet support #2665
- feat: Extended auto-decompression support. In addition to snappy auto-decompression, auto-decompress CSV dialects (tsv/tab & ssv files) using gzip, zlib and zstd compression formats #2671
- to: add ODS support #2674
- validate: add uniqueCombinedWith custom JSON Schema Validation keyword #2636
- feat: prompt: add supported file formats to the dialog box filter when the polars feature is enabled #2667
- feat: add QSV_POLARS_FLOAT_PRECISION env var #2678
- tests: add tests for https://100.dathere.com/lessons/3 by @rzmk in #2638
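Conceptually, uniqueCombinedWith treats several columns as a composite key. Values may repeat within any single column, but each combination must occur only once. The Python sketch below checks that property. It is not qsv's JSON Schema keyword implementation. The release notes don't show the keyword's schema syntax, so none is assumed here. duplicate_composite_keys is a hypothetical helper.

```python
from typing import Iterable, Sequence


def duplicate_composite_keys(
    rows: Iterable[dict], columns: Sequence[str]
) -> list[tuple[int, tuple]]:
    """Return (row_number, key) for every row whose combination of values in
    `columns` already appeared in an earlier row (row numbers are 1-based)."""
    seen = set()
    duplicates = []
    for number, row in enumerate(rows, start=1):
        key = tuple(row[c] for c in columns)
        if key in seen:
            duplicates.append((number, key))
        else:
            seen.add(key)
    return duplicates


rows = [
    {"country": "US", "city": "Austin"},
    {"country": "US", "city": "Dallas"},
    {"country": "US", "city": "Austin"},  # repeats the (US, Austin) key
]
print(duplicate_composite_keys(rows, ["country", "city"]))
```

Here "country" alone is not unique in the first two rows, yet those rows pass because their (country, city) pairs differ. Only row 3 is reported.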
Changed
- qsvdp binary variant can now use the geocode & geoconvert commands 50f0046
- geocode feature now gates the geocode & geoconvert commands 9d046e8
- stats: made stdin handling more robust by adding delimiter inferencing ddecd98
- feat: setting QSV_DOTENV_PATH to sentinel value "<NONE>" disables dotenv processing #2684
- refactor: polars special formats support #2683
- contrib(completions): update completions to v3.3.0 by @rzmk in #2626
- contrib(completions): update completions for qsv v4.0.0 by @rzmk in #2677
- deps: bump polars to 0.46.0 at py-1.27.1 tag #2675 and e5d29d7
- build(deps): bump actions/setup-python from 5.4.0 to 5.5.0 by @dependabot in #2627
- build(deps): bump arboard from 3.4.1 to 3.5.0 by @dependabot in #2653
- build(deps): bump chrono-tz from 0.10.2 to 0.10.3 by @dependabot in #2623
- build(deps): bump crossbeam-channel from 0.5.14 to 0.5.15 by @dependabot in #2672
- build(deps): bump csvs_convert from 0.11.0 to 0.11.1 by @dependabot in #2686
- build(deps): bump data-encoding from 2.8.0 to 2.9.0 by @dependabot in #2685
- build(deps): bump flate2 from 1.1.0 to 1.1.1 by @dependabot in #2649
- build(deps): bump flexi_logger from 0.29.8 to 0.30.0 by @dependabot in #2650
- build(deps): bump flexi_logger from 0.30.0 to 0.30.1 by @dependabot in #2651
- build(deps): bump governor from 0.8.1 to 0.9.0 by @dependabot in #2625
- build(deps): bump governor from 0.9.0 to 0.10.0 by @dependabot in #2631
- build(deps): bump jsonschema from 0.29.0 to 0.29.1 by @dependabot in #2635
- build(deps): bump log from 0.4.26 to 0.4.27 by @dependabot in #2622
- build(deps): bump mimalloc from 0.1.44 to 0.1.45 by @dependabot in #2652
- build(deps): bump minijinja from 2.8.0 to 2.9.0 by @dependabot in #2643
- build(deps): bump minijinja-contrib from 2.8.0 to 2.9.0 by @dependabot in #2642
- build(deps): bump pyo3 from 0.24.0 to 0.24.1 by @dependabot in #2645
- build(deps): bump qsv-dateparser from 0.12.1 to 0.13.0 by @dependabot in #2639
- build(deps): bump qsv-sniffer from 0.10.3 to 0.11.0 by @dependabot in #2640
- build(deps): bump redis from 0.29.2 to 0.29.4 by @dependabot in #2663
- build(deps): bump redis from 0.29.4 to 0.29.5 by @dependabot in #2666
- build(deps): bump smallvec from 1.14.0 to 1.15.0 by @dependabot in #2656
- build(deps): bump sysinfo from 0.34.0 to 0.34.1 by @dependabot in #2637
- build(deps): bump sysinfo from 0.34.1 to 0.34.2 by @dependabot in #2648
- build(deps): bump titlecase from 3.4.0 to 3.5.0 by @dependabot in #2669
- build(deps): bump tokio from 1.44.1 to 1.44.2 by @dependabot in #2662
- applied select clippy lint suggestions
- bumped indirect dependencies to latest version
Fixed
- fix: select panic when idx is out of bounds #2670
- fix: correct link to qsv-dateparser accepted date formats #2632
- fix: reset SIGPIPE handling #2664
- docs: fix typo it's -> its by @rzmk in #2680
Full Changelog: 3.3.0...4.0.0
3.3.0
[3.3.0] - 2025-03-23
Highlights:
- stats got another round of improvements:
  - boolean inferencing is now configurable!
    Before, it was limited to a simple, English-centric heuristic: when a column's cardinality is 2 and the first characters of its 2 values are 0/1, t/f or y/n (case-insensitive), the column's data type is inferred as boolean.
    With the new --boolean-patterns <arg> option, we can now specify arbitrary true_pattern:false_pattern pattern pairs. Each pattern can be a multi-character string and is matched case-insensitively. If a pattern ends with "*", it is treated as a prefix.
    For example, t*:f* matches "true", "Truthy" and "T" as boolean true, so long as the corresponding false pattern (e.g. "Fake", "False", "f") is also matched. Bear in mind that the cardinality still needs to be 2. Multiple matches on the same column on different patterns will therefore disqualify the field as boolean if cardinality > 2. For example, if a column's domain is "True", "truthy" and "False", it doesn't qualify, as its cardinality is 3. On the other hand, if it's "True", "true", "False", "false", "FALSE", it still qualifies, as the values resolve to just "true/false" case-insensitively.
    For backwards compatibility, the default true/false pairs are 1:0, t*:f*, y*:n*.
  - percentiles can now be computed!
    By enabling the --percentiles flag, stats will now return the 5th, 10th, 40th, 60th, 90th and 95th percentiles by default, using the nearest-rank method, for all numeric and date/datetime columns. Different percentiles can be requested with the --percentile-list <arg> option.
    Note that the method for computing quartiles (Method 3) is basically a specialized implementation of the nearest-rank method for q1 (25th), q2 (50th or median) and q3 (75th percentile), hence the choice of non-overlapping defaults for --percentile-list.
- frequency: now uses qsv-stats 0.32.0, which uses the more memory-efficient, often faster foldhash crate
  - in the same vein, by replacing ahash with foldhash suite-wide, qsv got a lot more memory-efficient and often faster when doing hash lookups
- sample: "streaming" Bernoulli sampling now works for any remotely hosted CSVs with servers that support chunked downloads, without requiring range request support.
- we're now using the latest Polars engine - v0.46.0 at the py-1.26.0 tag.
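The boolean inference rule above can be sketched in a few lines of Python. The sketch makes three assumptions. Cardinality is counted case-insensitively, as the "True"/"true"/"FALSE" example suggests. A pattern without "*" must match the whole value. The helper names are hypothetical rather than qsv's internals.

```python
def parse_boolean_patterns(spec: str) -> list[tuple[str, str]]:
    """Parse 'true_pattern:false_pattern' pairs, e.g. '1:0,t*:f*,y*:n*'."""
    pairs = []
    for pair in spec.split(","):
        true_pat, false_pat = pair.split(":", 1)
        pairs.append((true_pat.strip().lower(), false_pat.strip().lower()))
    return pairs


def _matches(value: str, pattern: str) -> bool:
    # a trailing '*' turns the pattern into a prefix match
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    return value == pattern


def infer_boolean(values, spec: str = "1:0,t*:f*,y*:n*") -> bool:
    """True if the column has exactly two distinct values (case-insensitive)
    and one matches a true pattern while the other matches its paired false pattern."""
    distinct = {v.lower() for v in values}
    if len(distinct) != 2:
        return False
    a, b = distinct
    for true_pat, false_pat in parse_boolean_patterns(spec):
        if (_matches(a, true_pat) and _matches(b, false_pat)) or (
            _matches(b, true_pat) and _matches(a, false_pat)
        ):
            return True
    return False
```

The default spec reproduces the old 0/1, t/f, y/n heuristic. A spec such as "ja:nein" shows how non-English columns could be handled.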
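For reference, the nearest-rank method returns the value at 1-based rank ceil(P * N / 100) in the sorted data. Below is a Python sketch using the default percentile list quoted above. It works on any sortable values, dates included. It is an illustration of the method, not qsv-stats' code.

```python
import math


def nearest_rank_percentiles(values, percentiles=(5, 10, 40, 60, 90, 95)):
    """Nearest-rank percentiles: the P-th percentile is the smallest value such
    that at least P% of the data is <= it, i.e. sorted[ceil(P * N / 100) - 1]."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        return {}
    result = {}
    for p in percentiles:
        # multiply before dividing so exact ranks aren't nudged up by float error
        rank = max(1, math.ceil(p * n / 100))  # 1-based; clamp P=0 to the minimum
        result[p] = data[rank - 1]
    return result


print(nearest_rank_percentiles([15, 20, 35, 40, 50], (30, 40, 50, 100)))
```

Passing (25, 50, 75) as the percentile list gives quartiles by the same rule. This is why the defaults avoid those three values.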
Added
Changed
- refactor: replace ahash with faster foldhash #2619
- replace std assert_eq! macro with similar_asserts::assert_eq! macro for easier debugging #2605
- deps: bump polars to 0.46.0 at py-1.25.2 tag #2604
- deps: bump Polars to v0.46.0 at py-1.26.0 tag #2621
- build(deps): bump actix-web from 4.9.0 to 4.10.2 by @dependabot in #2591
- build(deps): bump indexmap from 2.7.1 to 2.8.0 by @dependabot in #2592
- build(deps): bump mimalloc from 0.1.43 to 0.1.44 by @dependabot in #2608
- build(deps): bump qsv-stats from 0.30.0 to 0.31.0 by @dependabot in #2603
- build(deps): bump qsv-stats from 0.31.0 to 0.32.0 by @dependabot in #2620
- build(deps): bump reqwest from 0.12.12 to 0.12.13 by @dependabot in #2593
- build(deps): bump reqwest from 0.12.13 to 0.12.14 by @dependabot in #2596
- build(deps): bump reqwest from 0.12.14 to 0.12.15 by @dependabot in #2609
- build(deps): bump rfd from 0.15.2 to 0.15.3 by @dependabot in #2597
- build(deps): bump rust_decimal from 1.37.0 to 1.37.1 by @dependabot in #2616
- build(deps): bump simd-json from 0.14.3 to 0.15.0 by @dependabot in #2615
- build(deps): bump tempfile from 3.18.0 to 3.19.0 by @dependabot in #2602
- build(deps): bump tempfile from 3.19.0 to 3.19.1 by @dependabot in #2612
- build(deps): bump uuid from 1.15.1 to 1.16.0 by @dependabot in #2601
- build(deps): bump zip from 2.2.3 to 2.4.1 by @dependabot in #2607
- apply select clippy lint suggestions
- bumped indirect dependencies to latest version
- set Rust nightly to 2025-03-07, the same version Polars uses 17f6bdb
Fixed
- updated lock file, primarily to fix CVE-2025-29787 e44e5df
- luau: fix flaky register_lookup_table CI test that only intermittently fails on Windows by using a buffered writer in the lookup write_cache_file helper f494b46
- sample: refactor "streaming" Bernoulli sampling, so it actually works without requiring range request support #2600
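Bernoulli sampling keeps each record independently with probability p. It needs only a single forward pass and never has to know the row count or seek back into the file. That is why it can run over a plain chunked download. Below is a minimal Python sketch. bernoulli_sample is a hypothetical helper, and for a CSV you would pass the header line through separately.

```python
import random
from typing import Iterable, Iterator, Optional


def bernoulli_sample(
    records: Iterable[str], probability: float, seed: Optional[int] = None
) -> Iterator[str]:
    """Yield each record independently with the given probability, in one pass.

    Each decision depends only on the current record, so this works on a
    forward-only stream without knowing its length in advance.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    rng = random.Random(seed)  # seeded for reproducible samples
    for record in records:
        if rng.random() < probability:
            yield record
```

The sample size is itself random, following a binomial distribution around p * N, rather than fixed. This is the trade-off that makes the single pass possible.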
Full Changelog: 3.2.0...3.3.0

