Releases: dathere/qsv
0.136.0
qsv pro is now available in the Microsoft Store!
It's Data Wrangling Democratized on the Desktop, featuring:
- Familiar Spreadsheet Interface
  Tap the power of qsv to query, analyze, enrich, scrub and transform huge Excel files and multi-gigabyte CSV files in seconds, without having to deal with the command-line.
- CKAN desktop client
  Designed to make data publishing easier for portal operators and data stewards using the CKAN platform.
- Flow
  Allows you to build custom node-based flows and data pipelines using a visual interface.
- Toolbox
  Features an ever-expanding library of reusable scripts for common data-wrangling use cases.
- And more!
  Natural Language Interface (RAG), Polars SQL query support, an API, Python/Luau support, automatic Data Dictionaries, DCAT 3 metadata profile inferencing, along with a retinue of other cloud-based services (e.g. customizable street-level geocoding, data feeds, reference data lookups, geo-ip lookups, cloud storage support, .qsv file format, etc.) that will be unveiled in future versions.
Like qsv, we're iterating rapidly with qsv pro, so your feedback is essential. Give it a try!
Other highlights:
- excel: new --table option for XLSX files; new --header-row option; expanded --range option, adding support for Named Ranges and absolute ranges (e.g. Sheet2!$A$1:$J$10); and expanded metadata export, now including Named Ranges and Tables (for XLSX files)
- Improved performance for several commands (apply, datefmt, tojsonl and validate) through automatic batch size optimization
- validate: dynamicEnum custom JSON Schema keyword in validate command (renamed from dynenum) and enhanced email validation
- schema: automatic JSON Schema const inferencing for columns with just one value
- Significant dependency updates, including latest upstream versions of Polars, jsonschema, and serde_json with unreleased performance upgrades, new features and fixes
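This sketch illustrates the schema const inferencing mentioned above. When a column contains exactly one distinct value, that value can be pinned with the JSON Schema const keyword. The sketch rests on stated assumptions and is not qsv's implementation. infer_const is a hypothetical helper. It always emits the value as a JSON string, so the type inference qsv also performs is not modeled. It escapes only backslashes and double quotes.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: returns a JSON Schema `const` fragment when the
/// column contains exactly one distinct value, otherwise `None`.
fn infer_const(values: &[&str]) -> Option<String> {
    // Collect the distinct values (BTreeSet deduplicates them).
    let distinct: BTreeSet<&str> = values.iter().copied().collect();
    if distinct.len() != 1 {
        return None;
    }
    let only = distinct.into_iter().next()?;
    // Minimal JSON string escaping: backslashes first, then double quotes.
    let escaped = only.replace('\\', "\\\\").replace('"', "\\\"");
    Some(format!("{{\"const\": \"{}\"}}", escaped))
}
```

For example, a column whose every row is NY yields {"const": "NY"}. A column that mixes NY and NJ yields no const constraint.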
NOTE: You can see qsv & qsv pro in action in our "The Problem with Data Portals" webinar on Wed, Oct 23, 2024, 1-2pm EDT.
Added
- qsv pro is now in the Microsoft Store!!!
- apply, datefmt, tojsonl, validate: added logic to automatically determine optimal batch size for better parallelization #2178
- enum: added --new-column support for all enum modes, not just --increment #2173
- excel: new --table option for XLSX files #2194
- excel: new --header-row option 458f79a
- excel: expanded range and metadata options #2195
- schema: added JSON Schema automatic const inferencing #2180
- Add signing step to qsv MSI installer GitHub Action by @rzmk in #2182
- contrib(completions): add --table option to qsv excel by @rzmk in #2197
- completions: add --header-row option to qsv excel e8794d5
- added new apply operations sentiment benchmark b745e64
- docs: added indexing section to PERFORMANCE.md 804145a
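The expanded --range option accepts absolute references such as Sheet2!$A$1:$J$10. The Rust code below is a rough sketch of what parsing such a reference involves. It converts column letters to a 0-based index (A = 0, Z = 25, AA = 26) and strips the $ absolute markers. The function names are hypothetical. The sketch assumes an unquoted sheet name. It does not resolve Named Ranges or Tables, because those require the workbook's own metadata.

```rust
/// Converts column letters ("A", "J", "AA") to a 0-based column index.
fn col_to_index(letters: &str) -> Option<u32> {
    if letters.is_empty() {
        return None;
    }
    let mut n: u32 = 0;
    for c in letters.chars() {
        if !c.is_ascii_uppercase() {
            return None;
        }
        // Bijective base-26: A=1 .. Z=26, so "AA" = 1*26 + 1 = 27.
        n = n.checked_mul(26)?.checked_add(c as u32 - 'A' as u32 + 1)?;
    }
    Some(n - 1)
}

/// Parses a cell reference such as "$A$1" or "B7" into 0-based (row, col).
fn parse_cell(cell: &str) -> Option<(u32, u32)> {
    let cell = cell.replace('$', ""); // drop absolute-reference markers
    let split = cell.find(|c: char| c.is_ascii_digit())?;
    let (letters, digits) = cell.split_at(split);
    let col = col_to_index(letters)?;
    let row: u32 = digits.parse().ok()?;
    if row == 0 {
        return None; // spreadsheet rows are 1-based
    }
    Some((row - 1, col))
}

/// Parses "Sheet2!$A$1:$J$10" into (sheet name, start cell, end cell).
fn parse_range(range: &str) -> Option<(String, (u32, u32), (u32, u32))> {
    let (sheet, cells) = range.split_once('!')?;
    let (start, end) = cells.split_once(':')?;
    Some((sheet.to_string(), parse_cell(start)?, parse_cell(end)?))
}
```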
Changed
- stats: various minor micro-optimizations 62d95fc 2c2862a
- validate: renamed custom keyword dynenum to dynamicEnum to be more consistent with JSON Schema naming conventions 0.135.0...master#diff-9783631cdad9e1f47f60266303dc2d56a6e7a486784b61c40961601e8192f7cf
- validate: optimizations for increased performance; replace serde_json with simd_json 0.135.0...master#diff-9783631cdad9e1f47f60266303dc2d56a6e7a486784b61c40961601e8192f7cf
- apply new clippy::ref_option lint to Config::new API #2192
- Update debian package readme by @tino097 in #2187
- deps: bump calamine from 0.25 to 0.26 b42279a
- deps: jsonschema use latest 0.22.3 upstream with unreleased features/fixes
- deps: polars use latest 0.43.1 upstream with unreleased features/fixes
- deps: created our own fork of unmaintained vader_sentiment crate b426761
- deps: use serde_json upstream with unreleased perf improvement/fixes https://github.com/jqnatividad/qsv/blob/1c1174b3b8b65d9dfd9c841597366fb09d0a047c/Cargo.toml#L221
- build(deps): bump flate2 from 1.0.33 to 1.0.34 by @dependabot in #2171
- build(deps): bump flexi_logger from 0.29.0 to 0.29.1 by @dependabot in #2189
- build(deps): bump flexi_logger from 0.29.1 to 0.29.2 by @dependabot in #2196
- build(deps): bump hashbrown from 0.14.5 to 0.15.0 by @dependabot in #2186
- build(deps): bump jsonschema from 0.20.0 to 0.21.0 by @dependabot in #2177
- build(deps): bump jsonschema from 0.22.1 to 0.22.2 by @dependabot in #2191
- build(deps): bump regex from 1.10.6 to 1.11.0 by @dependabot in #2176
- build(deps): bump reqwest from 0.12.7 to 0.12.8 by @dependabot in #2183
- build(deps): bump simd-json from 0.14.0 to 0.14.1 #2199
- build(deps): bump simple-expand-tilde from 0.4.2 to 0.4.3 by @dependabot in #2190
- build(deps): bump sysinfo from 0.31.4 to 0.32.0 by @dependabot in #2193
- build(deps): bump tempfile from 3.12.0 to 3.13.0 by @dependabot in #2175
- apply select clippy lints
- bumped indirect dependencies
- aligned Rust nightly to Polars nightly - 2024-09-29 7cd2de1
Fixed
- schema: fix enum so it only adds a list when the number of unique values > --enum-threshold #2180
- Upload artifact fix for Debian package publishing by @tino097 in #2168
- fixed typos configuration 627de89
- fixed various GitHub Actions publishing workflow issues
Full Changelog: 0.135.0...0.136.0
0.135.0
Highlights
JSON Schema validation just got a whole lot more powerful with the introduction of qsv's custom dynenum keyword!
With dynenum, you can now dynamically look up valid enum values from a CSV (on the filesystem or at a URL), allowing for more flexible and responsive data validation.
Unlike the standard enum keyword, dynenum does not require hardcoding valid values at schema definition time, and can be used to validate data against a changing set of valid values.
For an example, see #1872 (reply in thread).
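Conceptually, dynenum replaces a hardcoded enum list with a set of allowed values that is loaded from a lookup CSV at validation time. The Rust sketch below shows only that idea. load_allowed and invalid_values are hypothetical helpers. The lookup CSV is passed in as a string instead of being read from a file or fetched from a URL. The naive comma split does not handle quoted fields. For the actual keyword syntax, see the example linked above.

```rust
use std::collections::HashSet;

/// Hypothetical helper: collects the allowed values from one column of a lookup CSV.
/// Naive parsing: skips the header row and splits on commas, so quoted fields are NOT handled.
fn load_allowed(lookup_csv: &str, column: usize) -> HashSet<String> {
    lookup_csv
        .lines()
        .skip(1)
        .filter_map(|line| line.split(',').nth(column))
        .map(|v| v.trim().to_string())
        .collect()
}

/// Returns the input values that are not in the allowed set (i.e. would fail validation).
fn invalid_values<'a>(data: &[&'a str], allowed: &HashSet<String>) -> Vec<&'a str> {
    data.iter().copied().filter(|v| !allowed.contains(*v)).collect()
}
```

Because the allowed set is rebuilt from the lookup file on each run, updating that file changes what passes validation without touching the schema.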
In an upcoming qsv pro release, we're planning on making dynenum even more powerful by allowing you to easily specify high-value reference data (e.g. US Census data, World Bank data, data.gov, etc.) that is maintained at data.dathere.com and other CKAN instances.
This release also adds the custom currency JSON Schema format, which enables currency validation according to the ISO 4217 standard.
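Here is a minimal sketch of the code-based part of ISO 4217 validation. It assumes a value is valid when it is a three-letter, uppercase alphabetic code that appears in the standard's code list. KNOWN_CODES below is a tiny illustrative subset, not the full list. The alternate formats that the currency format also accepts are not modeled.

```rust
/// Illustrative subset of ISO 4217 alphabetic codes; the real list is much longer.
const KNOWN_CODES: &[&str] = &["EUR", "GBP", "JPY", "USD"];

/// Checks that `s` looks like an ISO 4217 code (three uppercase ASCII letters)
/// and appears in the (abbreviated) code list.
fn is_iso4217_code(s: &str) -> bool {
    s.len() == 3
        && s.bytes().all(|b| b.is_ascii_uppercase())
        && KNOWN_CODES.iter().any(|code| *code == s)
}
```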
The Polars engine was also upgraded to 0.43.1 at the py-1.81.1 tag - making for various under-the-hood improvements for the sqlp, joinp and count commands, as we set the stage for more Polars-powered features in future releases.
Added
- foreach: enabled foreach command on Windows prebuilt binaries def9c8f
- lens: added support for QSV_SNIFF_DELIMITER env var and snappy auto-decompression 8340e89
- sample: add --max-size option e845a3c
- validate: added dynenum custom JSON Schema keyword for dynamic validation lookups #2166
- tests: add tests for https://100.dathere.com/lessons/2 by @rzmk in #2141
- added stats_sorted and frequency_sorted benchmarks
- added validate_dynenum benchmarks
Changed
- json: add error for empty key and update usage text by @rzmk in #2167
- prompt: gate prompt command behind prompt feature #2163
- validate: expanded currency JSON Schema custom format to support ISO 4217 currency codes and alternate formats 5202508
- validate: migrate to new jsonschema crate api 5d65054
- Update ubuntu version for deb package by @tino097 in #2126
- contrib(completions): update completions for qsv v0.134.0 and fix subcommand options by @rzmk in #2135
- contrib(completions): add --max-size completion for sample by @rzmk in #2142
- deps: bump to polars 0.43.1 at py-1.81.1 #2130
- deps: switch back to calamine upstream instead of our fork 677458f
- build(deps): bump actix-governor from 0.5.0 to 0.6.0 by @dependabot in #2146
- build(deps): bump anyhow from 1.0.87 to 1.0.88 by @dependabot in #2132
- build(deps): bump arboard from 3.4.0 to 3.4.1 by @dependabot in #2137
- build(deps): bump bytes from 1.7.1 to 1.7.2 by @dependabot in #2148
- build(deps): bump geosuggest-core from 0.6.3 to 0.6.4 by @dependabot in #2153
- build(deps): bump geosuggest-utils from 0.6.3 to 0.6.4 by @dependabot in #2154
- build(deps): bump jql-runner from 7.1.13 to 7.2.0 by @dependabot in #2165
- build(deps): bump jsonschema from 0.18.1 to 0.18.2 by @dependabot in #2127
- build(deps): bump jsonschema from 0.18.2 to 0.18.3 by @dependabot in #2134
- build(deps): bump jsonschema from 0.18.3 to 0.19.1 by @dependabot in #2144
- build(deps): bump jsonschema from 0.19.1 to 0.20.0 by @dependabot in #2152
- build(deps): bump pyo3 from 0.22.2 to 0.22.3 by @dependabot in #2143
- build(deps): bump rfd from 0.14.1 to 0.15.0 by @dependabot in #2151
- build(deps): bump simple-expand-tilde from 0.4.0 to 0.4.2 by @dependabot in #2129
- build(deps): bump qsv_currency from 0.6.0 to 0.7.0 by @dependabot in #2159
- build(deps): bump qsv_docopt from 1.7.0 to 1.8.0 by @dependabot in #2136
- build(deps): bump redis from 0.26.1 to 0.27.0 by @dependabot in #2133
- build(deps): bump simdutf8 from 0.1.4 to 0.1.5 by @dependabot in #2164
- bump indirect dependencies
- apply select clippy lint suggestions
- several usage text/documentation improvements
- bump MSRV to 1.81.0
Fixed
- validate: correct fail_validation_error! macro; reformat error messages to use hyphens as the JSONschema error message already starts with "error:" 9a25524
- moved --help output from stderr to stdout as per GNU CLI guidelines #2138
- lens: fixed parsing of lens options 1cdd1bc
- searchset: fixed usage text for <regexset-file> 9a60fb0
- used patched forks of arrow, csvlens and xlsxwriter crates that replace a dependency on an old version of lexical-core with known soundness issues - https://rustsec.org/advisories/RUSTSEC-2023-0086. Once those crates have updated their lexical-core dependency, we will revert to the original crates.
Removed
- removed prompt command from qsvlite #2163
- publish: remove lens feature from i686 targets as it does not compile 959ca76
- deps: remove anyhow dependency #2150
Full Changelog: 0.134.0...0.135.0
0.134.0
qsv pro v1 is here!
If you've been using qsv for a while, even if you're a command-line ninja, you'll find a lot of new capabilities in qsv pro that can make your data wrangling experience even better!
Apart from making qsv easier to use, qsv pro has a multitude of features including: view interactive data tables; browse stats/frequency/metadata; run recipes and tools (scripts); run Polars SQL queries; use Natural Language queries (using Retrieval Augmented Generation (RAG) techniques); regular expression search; export to multiple file formats; download/upload from/to compatible CKAN instances; design custom node-based flows and data pipelines; interact with a local API from external programs including the qsv pro command; run various qsv commands in a graphical user interface; and the list goes on!
And that's just the beginning; there's more to come! You just have to try it!
Download qsv pro v1 now at qsvpro.dathere.com.
Other highlights include:
- pro: new command to allow qsv to interact with the qsv pro API to tap into qsv pro exclusive features.
- lens: new command to interactively view CSVs using the csvlens crate.
- The ludicrously fast diff command is now easier to use with its --drop-equal-fields option. @janriemer continues to work on his csv-diff crate, and there are more diff UX improvements coming soon!
- stats adds sum_length and avg_length "streaming" statistics in addition to the existing min_length and max_length metrics. These are especially useful for datasets with a lot of "free text" columns.
- stats also got "smarter" and "faster" by dog-fooding its own statistics to make it run faster!
  It's a little complicated, but stats first compiles the "streaming" statistics on the fly as it multiplex-loads the data across several threads, and the more expensive advanced statistics are "lazily" computed at the end.
  Since we now compile "sort order" in a streaming manner, we use this info when deriving cardinality at the end to see if we can skip sorting. Sorting is otherwise a necessary step, because cardinality is computed by "scanning" all the sorted values of a column: every time two neighboring values in a sorted column differ, the cardinality count is incremented.
  Apart from this "sort order" optimization, we also improved the "cardinality scan" algorithm, halving its memory footprint and making it faster still for larger datasets by parallelizing the computation. This, in turn, makes the frequency command faster and more memory efficient.
  It's performance tweaks like these that let stats, despite adding six metrics (is_ascii, sort_order, sum_length, avg_length, sem - standard error of the mean & cv - coefficient of variation) in recent releases, still compile 35 statistics and do GUARANTEED data type inferences of a million-row, 41-column, 520 MB sample of NYC's 311 data in 1.327 seconds (753,580 records per second)! [1]
- we now also use our own fork of the csv crate, featuring SIMD-accelerated UTF-8 validation and other minor perf tweaks, making the entire qsv suite faster still!
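The cardinality shortcut described in the stats highlight above can be sketched as follows. The sketch assumes the streaming pass has already recorded the column's sort order. If the column is already sorted in either direction, the neighbor scan runs directly over it. Otherwise, a sorted copy is made first. The SortOrder enum and the cardinality function are illustrative names. This is only the simple sequential version, not the parallel algorithm qsv uses for large datasets.

```rust
/// Sort order recorded during the streaming pass (illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SortOrder {
    Ascending,
    Descending,
    Unsorted,
}

/// Counts distinct values by scanning sorted data: each time two neighboring
/// values differ, the cardinality count is incremented.
fn cardinality(values: &[&str], order: SortOrder) -> usize {
    if values.is_empty() {
        return 0;
    }
    let owned: Vec<&str>;
    let sorted: &[&str] = match order {
        // Already sorted (either direction): skip the sort entirely.
        SortOrder::Ascending | SortOrder::Descending => values,
        // Unsorted: sort a copy first.
        SortOrder::Unsorted => {
            let mut copy = values.to_vec();
            copy.sort_unstable();
            owned = copy;
            &owned
        }
    };
    1 + sorted.windows(2).filter(|pair| pair[0] != pair[1]).count()
}
```

The scan works for descending columns too, because it only checks whether neighbors differ and never checks which one is larger.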
Added
- pro: add qsv pro command to interact with qsv pro API by @rzmk in #2039
- lens: new command to interactively view CSVs using the csvlens crate #2117
- apply: add crc32 operation #2121
- count: add --delimiter option #2120
- diff: add flag --drop-equal-fields by @janriemer in #2114
- stats: add sum_length and avg_length columns #2113
- stats: smarter cardinality computation - added new parallel algorithm for large datasets (10,000+ rows) and updated sequential algorithm for smaller datasets 4e63fec
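sum_length and avg_length can be computed in the same single streaming pass as min_length and max_length, because each one needs only a running total and a count. A minimal Rust sketch follows. LengthStats is a hypothetical type. Measuring length in bytes, rather than in characters, is an assumption.

```rust
/// Running length statistics, updated one value at a time in a single pass.
#[derive(Debug, Default)]
struct LengthStats {
    count: u64,
    min_length: Option<usize>,
    max_length: usize,
    sum_length: u64,
}

impl LengthStats {
    fn add(&mut self, value: &str) {
        let len = value.len(); // byte length (assumption; character counts differ for non-ASCII)
        self.count += 1;
        self.sum_length += len as u64;
        self.max_length = self.max_length.max(len);
        self.min_length = Some(self.min_length.map_or(len, |m| m.min(len)));
    }

    /// avg_length is derived from the running sum and count, so no values are stored.
    fn avg_length(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum_length as f64 / self.count as f64)
        }
    }
}
```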
Changed
- count: added comment to justify magic number 5241e39
- stats: use simdjson for faster JSONL parsing; micro-optimize compute hot loop 0e8b734
- stats: standardized OVERFLOW and UNDERFLOW messages 38c6128
- sort: renamed symbol to eliminate devskim lint false positive warning 12db739
- enable lens feature in GH workflows #2122
- deps: bump polars 0.42.0 to latest upstream at time of release 3c17ed1
- deps: use our own optimized fork of csv crate, with simdutf8 validation and other minor perf tweaks e4bcd71
- build(deps): bump serde from 1.0.209 to 1.0.210 by @dependabot in #2111
- build(deps): bump serde_json from 1.0.127 to 1.0.128 by @dependabot in #2106
- build(deps): bump qsv-stats from 0.19.0 to 0.22.0 #2107 #2112 cb1eb60
- apply select clippy lint suggestions
- updated several indirect dependencies
- made various doc and usage text improvements
Fixed
schema: Print an error if the qsv stats invocation fails by @abrauchli in #2110
New Contributors
- @abrauchli made their first contribution in #2110
Full Changelog: 0.133.1...0.134.0
0.133.1
Highlights
(Logo image: the qsv cowboy riding a polar bear. [1])
This release doubles down on Polars' capabilities, as we now, as a matter of policy, track the latest Polars upstream. If you think qsv has a torrid release schedule, you should see Polars. They're constantly fixing bugs and adding new features and optimizations! To keep up, we've added Polars revision info to the --version output, and the --envlist option now includes relevant Polars env vars. We've also added support for the POLARS_BACKTRACE_IN_ERR env var to control whether Polars backtraces are included in error messages.
We also removed the to parquet subcommand, as it's redundant with the Polars-powered sqlp's ability to create parquet files. This removes the HUGE duckdb dependency, which should markedly shorten compile times and make binaries smaller.
Other highlights include:
- New edit command that allows you to edit CSV files.
- The count command's --width option now includes record width stats beyond max length (avg, median, min, variance, stddev & MAD).
- The fixlengths command now has --quote and --escape options.
- The stats command adds a sort_order streaming statistic.
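A sort_order statistic can be computed in one streaming pass. Each value is compared with the previous one. A step that goes down rules out ascending, and a step that goes up rules out descending. The Rust sketch below uses plain lexicographic string comparison and hypothetical type names. It does not model how qsv compares typed values, such as numbers versus strings.

```rust
/// Sort order reported at the end of the pass (illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SortOrder {
    Ascending,
    Descending,
    Unsorted,
}

/// Tracks sort order in a single streaming pass over a column.
struct SortTracker {
    prev: Option<String>,
    ascending: bool,  // no step has gone down so far
    descending: bool, // no step has gone up so far
}

impl SortTracker {
    fn new() -> Self {
        SortTracker { prev: None, ascending: true, descending: true }
    }

    fn add(&mut self, value: &str) {
        if let Some(prev) = &self.prev {
            if value < prev.as_str() {
                self.ascending = false;
            }
            if value > prev.as_str() {
                self.descending = false;
            }
        }
        self.prev = Some(value.to_string());
    }

    fn result(&self) -> SortOrder {
        match (self.ascending, self.descending) {
            // All-equal (or empty) input satisfies both; report it as ascending.
            (true, _) => SortOrder::Ascending,
            (false, true) => SortOrder::Descending,
            (false, false) => SortOrder::Unsorted,
        }
    }
}
```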
NOTE: 0.133.0 was skipped because of a dev dependency conflict with the csvs_convert crate, which prevented us from publishing 0.133.0 to crates.io. This has been resolved in 0.133.1.
Added
- count: expanded --width options, adding record width stats beyond max length (avg, median, min, variance, stddev & MAD). Also added --json output when using --width #2099
- edit: add qsv edit command by @rzmk in #2074
- fixlengths: added --quote and --escape options #2104
- stats: add sort_order streaming statistic #2101
- polars: add polars revision info to --version output e60e44f
- polars: added Polars relevant env vars to --envlist option 0ad68fe
- polars: add & document POLARS_BACKTRACE_IN_ERR env var f9cc559
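For reference, the record width statistics listed for count --width can be derived from a list of widths as shown below. The sketch makes three assumptions. Variance is the population variance (dividing by n). MAD is the median absolute deviation from the median. The median of an even-length list is the mean of its two middle values. width_stats is a hypothetical helper, not qsv's code.

```rust
/// Median of an already-sorted, non-empty slice.
fn median(sorted: &[f64]) -> f64 {
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

#[derive(Debug)]
struct WidthStats {
    min: f64,
    max: f64,
    avg: f64,
    median: f64,
    variance: f64,
    stddev: f64,
    mad: f64,
}

/// Summary statistics over record widths; returns None for empty input.
fn width_stats(widths: &[usize]) -> Option<WidthStats> {
    if widths.is_empty() {
        return None;
    }
    let mut v: Vec<f64> = widths.iter().map(|&w| w as f64).collect();
    v.sort_by(|a, b| a.partial_cmp(b).unwrap()); // no NaNs: values come from integers
    let n = v.len() as f64;
    let avg = v.iter().sum::<f64>() / n;
    let variance = v.iter().map(|x| (x - avg).powi(2)).sum::<f64>() / n; // population variance
    let med = median(&v);
    // MAD: median of the absolute deviations from the median.
    let mut dev: Vec<f64> = v.iter().map(|x| (x - med).abs()).collect();
    dev.sort_by(|a, b| a.partial_cmp(b).unwrap());
    Some(WidthStats {
        min: v[0],
        max: v[v.len() - 1],
        avg,
        median: med,
        variance,
        stddev: variance.sqrt(),
        mad: median(&dev),
    })
}
```

For widths 2, 4, 4, 4, 5, 5, 7, 9 this gives avg 5, median 4.5, variance 4, stddev 2 and MAD 0.5.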
Changed
- Optimize polars optflags #2089
- deps: bump polars 0.42.0 to latest upstream at time of release 3b7af51
- bump polars to latest upstream, removing smartstring #2091
- build(deps): bump actions/setup-python from 5.1.1 to 5.2.0 by @dependabot in #2094
- build(deps): bump flate2 from 1.0.32 to 1.0.33 by @dependabot in #2085
- build(deps): bump flexi_logger from 0.28.5 to 0.29.0 by @dependabot in #2086
- build(deps): bump indexmap from 2.4.0 to 2.5.0 by @dependabot in #2096
- build(deps): bump jsonschema from 0.18.0 to 0.18.1 by @dependabot in #2084
- build(deps): bump serde from 1.0.208 to 1.0.209 by @dependabot in #2082
- build(deps): bump serde_json from 1.0.125 to 1.0.127 by @dependabot in #2079
- build(deps): bump sysinfo from 0.31.2 to 0.31.3 by @dependabot in #2077
- build(deps): bump qsv-stats from 0.18.0 to 0.19.0 by @dependabot in #2100
- build(deps): bump tokio from 1.39.3 to 1.40.0 by @dependabot in #2095
- apply select clippy lint suggestions
- updated several indirect dependencies
- made various doc and usage text improvements
- pin Rust nightly to 2024-08-26 from 2024-07-26, aligning with Polars pinned nightly
Fixed
- Ensure portable binaries are "added" to the publish zip archive, instead of replacing all the binaries with just the portable version. Fixes #2083. 34ad206
Removed
- removed to parquet subcommand, as it's redundant with sqlp's ability to create parquet files. This also removes the HUGE duckdb dependency, which should markedly shorten compile times and make binaries much smaller #2088
- removed smartstring dependency now that Polars has its own compact inlined string type 47f047e
- removed to parquet benchmark
Full Changelog: 0.132.0...0.133.1
[1] ChatGPT prompt: Using the logos for the Polars project and the qsv project as a baseline, can you create a version with the cowboy riding a polar bear instead?
0.132.0
Highlights
With this release, we finally finish the stats caching refactor started in 0.131.0, replacing the binary encoded stats cache with a simpler JSONL cache. The stats cache stores the necessary statistical metadata to make several key commands smarter & faster. Per the benchmarks:
- frequency is 6x faster (frequency_index_stats_mode_auto).
  Not only is it faster, it also no longer needs to compile a hashmap for columns with ALL unique values (e.g. ID columns). In practice, this makes it able to handle "real-world" datasets of any size (that is, unless all the columns have ALL unique cardinalities; in that case, the entire CSV will have to fit into memory).
- tojsonl is 2.67x faster (tojsonl_index)
- schema is two orders of magnitude (100x) faster!!! (schema_index)
The stats cache also provides the foundation for even more "smart" features and commands in the future. It also has the side-benefit of adding a way to produce stats in JSONL format that can be used for other purposes beyond qsv.
The search, searchset, and replace commands now also have a --literal option that allows you to search for and replace strings with regex special/reserved characters. This makes it easier to search for and replace strings that contain otherwise reserved regex characters without having to escape them (especially useful with URL columns that often contain characters like ?,:,-,., etc.)
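In practice, a literal search amounts to escaping the regex metacharacters before the pattern is compiled. The Rust regex crate provides regex::escape for this purpose. The std-only sketch below shows the idea using a core set of metacharacters. escape_literal is a hypothetical name. With this escaping, a URL fragment like ?q=1 matches literally.

```rust
/// Backslash-escapes regex metacharacters so the input matches literally.
fn escape_literal(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if "\\.+*?()|[]{}^$".contains(c) {
            out.push('\\'); // prefix the metacharacter with a backslash
        }
        out.push(c);
    }
    out
}
```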
Added
- search, searchset & replace: add --literal option #2060 & 7196053
- slice: added usage text examples 04afaa3
- publish: added workflow to build "portable" binaries with CPU features disabled
- contrib(completions): add --literal for search and searchset by @rzmk in #2061
- contrib(completions): add --literal completion to replace by @rzmk in #2062
- add more polars metadata in --version info #2073
- docs: added more info to SECURITY.md 609d4df
- docs: expanded Goals/Non-Goals 54998e3
- docs: added Installation "Option 0" quick start bf5bf82
- added search --literal benchmark
Changed
- stats, schema, frequency & tojsonl: stats caching refactor, replacing binary encoded stats cache with a simpler JSONL cache #2055
- rename stats --stats-json option to stats --stats-jsonl #2063
- changed "broken pipe" error to a warning 7353275
- docs: update multithreading and caching sections of PERFORMANCE.md 5e6bc45
- deps: switch to our qsv-optimized fork of csv crate 3fc1e82
- deps: bump polars from 0.41.3 to 0.42.0 #2051
- build(deps): bump actix-web from 4.8.0 to 4.9.0 by @dependabot in #2041
- build(deps): bump flate2 from 1.0.31 to 1.0.32 by @dependabot in #2071
- build(deps): bump indexmap from 2.3.0 to 2.4.0 by @dependabot in #2049
- build(deps): bump reqwest from 0.12.6 to 0.12.7 by @dependabot in #2070
- build(deps): bump rust_decimal from 1.35.0 to 1.36.0 by @dependabot in #2068
- build(deps): bump serde from 1.0.205 to 1.0.206 by @dependabot in #2043
- build(deps): bump serde from 1.0.206 to 1.0.207 by @dependabot in #2047
- build(deps): bump serde from 1.0.207 to 1.0.208 by @dependabot in #2054
- build(deps): bump serde_json from 1.0.122 to 1.0.124 by @dependabot in #2045
- build(deps): bump serde_json from 1.0.124 to 1.0.125 by @dependabot in #2052
- apply select clippy lint suggestions
- updated several indirect dependencies
- made various usage text improvements
Fixed
- stats: fix --output delimiter inferencing based on file extension #2065
- make process_input helper handle stdin better #2058
- docs: fix completions for --stats-jsonl and qsv pro installation text update by @rzmk in #2072
- docs: added Note about why luau feature is disabled in musl binaries - ffa2bc5 & 27d0f8e
Removed
Full Changelog: 0.131.1...0.132.0
0.131.1
Changed
- deps: bump polars to latest upstream post py-1.41.1 release at the time of this release
- build(deps): bump filetime from 0.2.23 to 0.2.24 by @dependabot in #2038
Fixed
frequency: change --stats-mode default to none from auto.
This is because of a big performance regression when using--stats-mode autoon datasets with columns with ALL unique values.
See #2040 for more info.
Full Changelog: 0.131.0...0.131.1
0.131.0
Highlights
- Refactored frequency to make it smarter and faster.
frequency's core algorithm essentially compiles an in-memory hashmap to determine the frequency of each unique value for each column. It does this using multi-threaded, multi-I/O techniques to make it blazing fast.
However, for columns with ALL unique values (e.g. ID columns), this takes a comparatively long time and consumes a lot of memory as it essentially compiles a hashmap of the ENTIRE column, with a hashmap entry for each column value with a count of 1.
Now, with the new --stats-mode option (enabled by default), frequency can compile the dataset in a more intelligent way by looking up a column's cardinality in the stats cache.
If the cardinality of a column is equal to the CSV's rowcount (indicating a column with ALL unique values), it short-circuits frequency calculations for that column - dramatically reducing the time and memory requirements for the ID column as it eliminates the need to maintain a hashmap for it.
Practically speaking, this makes frequency able to handle "real-world" datasets of any size.
To ensure frequency is as fast as possible, be sure to index and compute stats for your datasets beforehand.
- Setting the stage for Datapusher+ v1 and...
The "itches we've been scratching" the past few months have been informed by our work at several clients towards the release of Datapusher+ 1.0 and qsv pro 1.0 (more info below) - both targeted for release this month.
DP+ is our third-gen, high-speed data ingestion/registration tool for CKAN that uses qsv as its data wrangling/analysis engine. It will enable us to reinvent the way data is ingested into CKAN - with exponentially faster data ingestion, metadata inferencing, data validation, computed metadata fields, and more!
We're particularly excited about how qsv will allow us to compute and infer high-quality metadata for datasets (with a focus on inferring optional recommended DCAT-US v3 metadata fields) in "near real-time", while dataset publishers are still entering metadata. This will be a game-changer for CKAN administrators and data publishers!
- ...qsv pro 1.0
qsv pro is datHere's enterprise-grade data wrangling/curation workbench that's planned for v1.0 release this month.
Building the core functionality of qsv pro's Workflow feature is one of the primary reasons for a v1.0 release.
We feel qsv pro may be a game-changer for data wranglers and data curators who need to work with spreadsheets and large datasets to view statistical data and metadata while also performing complex data wrangling operations in a user-friendly way without having to write code.
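The --stats-mode short-circuit described in the frequency highlight above comes down to one comparison. If the cached cardinality equals the row count, every value occurs exactly once, so there is nothing to count. Below is a minimal Rust sketch. It assumes the cardinality has already been read from the stats cache. The ColumnFreq enum and the column_frequency function are hypothetical names.

```rust
use std::collections::HashMap;

/// Result for one column: a marker for all-unique columns, or full counts.
#[derive(Debug, PartialEq)]
enum ColumnFreq {
    AllUnique,
    Counts(HashMap<String, u64>),
}

/// Builds frequency counts for a column unless the cached cardinality shows that
/// every value is unique, in which case no hashmap is built at all.
fn column_frequency(values: &[&str], cached_cardinality: Option<usize>) -> ColumnFreq {
    if cached_cardinality == Some(values.len()) {
        // cardinality == row count: each value occurs exactly once
        return ColumnFreq::AllUnique;
    }
    let mut counts: HashMap<String, u64> = HashMap::new();
    for v in values {
        *counts.entry(v.to_string()).or_insert(0) += 1;
    }
    ColumnFreq::Counts(counts)
}
```

When no cached cardinality is available (None), the sketch falls back to full counting. Note that 0.131.1 later changed the --stats-mode default to none.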
Added
- docs: added Shell Completion section 556a2ff
- docs: add 🪄 emoji in legend to indicate "automagical" commands 2753c90
- Add building deb package (WIP) by @tino097 in #2029
- Added GitHub workflow to test debian package (WIP) by @tino097 in #2032
- tests: added false positive to _typos.toml configuration d576af2
- added more benchmarks
- added more tests
Changed
- fetch & fetchpost: remove expired diskcache entries on startup 9b6ab5d
- frequency: smarter frequency compilation with new --stats-mode option #2030
- json: refactored for maintainability & performance 62e9216 and 4e44b18
- improved self-update messages 5c874e0 and 0aa0b13
- contrib(completions): frequency updates & remove bashly/fish by @rzmk in #2031
- Debian package update by @tino097 in #2017
- publish: optimized enabled CPU features when building release binaries in all GitHub Actions "publishing" workflows
- publish: ensure latest Python patch release is used when building qsvpy binary variants 2ab03a0 and ec6f486
- tests: also enabled CPU features in CI tests
- docs: wordsmith qsv "elevator pitch" cc47fe6
- docs: point to https://100.dathere.com in Whirlwind tour fc49aef
- deps: bump polars to latest upstream post py-1.41.1 release at the time of this release
- build(deps): bump bytes from 1.6.1 to 1.7.0 by @dependabot in #2018
- build(deps): bump bytes from 1.7.0 to 1.7.1 by @dependabot in #2021
- build(deps): bump flate2 from 1.0.30 to 1.0.31 by @dependabot in #2027
- build(deps): bump indexmap from 2.2.6 to 2.3.0 by @dependabot in #2020
- build(deps): bump jaq-parse from 1.0.2 to 1.0.3 by @dependabot in #2016
- build(deps): bump redis from 0.26.0 to 0.26.1 by @dependabot in #2023
- build(deps): bump regex from 1.10.5 to 1.10.6 by @dependabot in #2025
- build(deps): bump serde_json from 1.0.121 to 1.0.122 by @dependabot in #2022
- build(deps): bump sysinfo from 0.30.13 to 0.31.0 by @dependabot in #2019
- build(deps): bump sysinfo from 0.31.0 to 0.31.2 by @dependabot in #2024
- build(deps): bump tempfile from 3.11.0 to 3.12.0 by @dependabot in #2033
- build(deps): bump serde from 1.0.204 to 1.0.205 by @dependabot in #2036
- apply select clippy suggestions
- updated several indirect dependencies
- made various usage text improvements
- bumped MSRV to 1.80.1
Fixed
- sqlp & joinp: fixed .ssv.sz output auto-compression support 5397f6c & d86ba63
- docs: fix link by @uncenter in #2026
- tests: correct misnamed test 8ae6000
- tests: fix flaky reverse property tests d86ba63
Removed
docs: "Quicksilver" is the name of the logo horse, not how you pronounce "qsv" e4551ae
New Contributors
Full Changelog: 0.130.0...0.131.0
0.130.0
Following the 0.129.0 release (the largest release to date), 0.130.0 continues to polish qsv as a data-wrangling engine, packing in new features, fixes, and improvements, and previewing upcoming features in qsv pro 1.0. Here are a few highlights:
Highlights
- Added .ssv (semicolon separated values) automatic support. Semicolon separated values are now automatically detected and supported by qsv. Though not as common as CSV, SSV is used in some regions and industries, so qsv now supports it.
- Added cargo deb compatibility. In preparation for the release of DataPusher+ 1.0, we're now making it easier to install and upgrade qsvdp, so CKAN administrators can simply use apt-get install qsvdp or apt-get upgrade qsvdp.
  DP+ is our next-gen, high-speed data ingestion tool for CKAN that uses qsv as its analysis engine. It's not only a robust, fast, validating data pump that guarantees high-quality data; it also does extended analysis to infer and automatically derive high-quality metadata, which we call "automagical metadata".
- Upgraded to the latest Polars upstream at the py-polars-1.3.0 tag. Polars tops the TPC-H Benchmark and is several orders of magnitude faster than traditional dataframe libraries (cough - 🐼 pandas). qsv proudly rides the 🐻‍❄️ Polars bear to get subsecond response times even with very large datasets!
- qsv v0.130.0 shell completions files are available for download here. With shell completions, pressing tab in a compatible shell provides suggestions for various qsv commands, subcommands, and options that you can choose from. Supported shells include bash, zsh, powershell, fish, nushell, fig, and elvish. View tips on how to install completions for the bash shell here.
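Automatic .ssv support follows the extension-based delimiter convention that qsv uses elsewhere, for example .tsv and .tab for tabs. Below is a minimal Rust sketch of that mapping. It assumes a trailing .sz suffix (Snappy-compressed) is ignored when the extension is examined. delimiter_for is a hypothetical helper. The sketch does not model other qsv delimiter settings, such as the --delimiter option or delimiter sniffing.

```rust
/// Infers a delimiter from a file name, ignoring a trailing ".sz" (Snappy) suffix.
fn delimiter_for(path: &str) -> u8 {
    let lower = path.to_ascii_lowercase();
    let name = lower.strip_suffix(".sz").unwrap_or(lower.as_str());
    if name.ends_with(".tsv") || name.ends_with(".tab") {
        b'\t'
    } else if name.ends_with(".ssv") {
        b';'
    } else {
        b',' // default: comma-separated
    }
}
```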
Added
- apply: add base62 encode/decode operations #2013
- headers: add --just-count option #2004
- json: add --select option #1990
- searchset: add --not-one flag by @rzmk in #1994
- Added .ssv (semicolon separated values) automatic support #1987
- Added cargo deb compatibility by @tino097 in #1991
- contrib(completions): add --just-count for headers by @rzmk in #2006
- contrib(completions): add --select for json by @rzmk in #1992
- added several benchmarks
- added more tests
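Base62 encodes an integer using the 62 characters 0-9, A-Z and a-z. The sketch below assumes that alphabet order and unsigned integer input. It does not model how qsv's apply operation handles non-numeric input. The function names are hypothetical.

```rust
/// Assumed alphabet order: digits, then uppercase, then lowercase (62 symbols).
const ALPHABET: &[u8; 62] = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/// Encodes an unsigned integer by repeated division by 62.
fn base62_encode(mut n: u128) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while n > 0 {
        digits.push(ALPHABET[(n % 62) as usize]); // least significant digit first
        n /= 62;
    }
    digits.reverse();
    String::from_utf8(digits).expect("alphabet is ASCII")
}

/// Decodes a base62 string; returns None on invalid characters or overflow.
fn base62_decode(s: &str) -> Option<u128> {
    if s.is_empty() {
        return None;
    }
    let mut n: u128 = 0;
    for b in s.bytes() {
        let d = ALPHABET.iter().position(|&a| a == b)? as u128;
        n = n.checked_mul(62)?.checked_add(d)?;
    }
    Some(n)
}
```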
Changed
- diff: allow selection of --key and --sort-columns by name, not just by index #2010
- fetch & fetchpost: replace deprecated Redis execute command 75cbe2b
- stats: more intelligent --infer-len option c6a0e64
- validate: return delimiter detected upon successful CSV validation #1977
- bump polars to latest upstream at py-polars-1.3.0 tag #2009
- deps: bump csvs_convert from 0.8.12 to 0.8.13 d1d0800
- build(deps): bump cached from 0.52.0 to 0.53.0 by @dependabot in #1983
- build(deps): bump cached from 0.53.0 to 0.53.1 by @dependabot in #1986
- build(deps): bump postgres from 0.19.7 to 0.19.8 by @dependabot in #1985
- build(deps): bump pyo3 from 0.22.1 to 0.22.2 by @dependabot in #1979
- build(deps): bump redis from 0.25.4 to 0.26.0 by @dependabot in #1995
- build(deps): bump serde_json from 1.0.120 to 1.0.121 by @dependabot in #2011
- build(deps): bump simple-expand-tilde from 0.1.7 to 0.4.0 by @dependabot in #1984
- build(deps): bump tokio from 1.38.0 to 1.38.1 by @dependabot in #1973
- build(deps): bump tokio from 1.38.1 to 1.39.1 by @dependabot in #1988
- build(deps): bump xxhash-rust from 0.8.11 to 0.8.12 by @dependabot in #1997
- apply select clippy suggestions
- updated several indirect dependencies
- made various usage text improvements
- pin Rust nightly to 2024-07-26
Fixed
- diff: clarify --key usage examples, resolves #1998 by @rzmk in #2001
- json: refactored so it didn't need to use threads to spawn qsv select to order the columns. Had to do this as sometimes intermediate output was sent to stdout before the final output was ready 0f25def
- py: replace row with col in usage text by @allen-chin in #2008
- reverse: fix indexed bug #2007
- validate: properly auto-detect tab delimiter when file extension is TSV or TAB #1975
- fix panic when process_input helper fn receives unexpected input from stdin 152fec4
Removed
New Contributors
- @tino097 made their first contribution in #1991
- @allen-chin made their first contribution in #2008
Full Changelog: 0.129.1...0.130.0
To stay updated with datHere's latest news and updates (including qsv pro, datHere's CKAN DMS, and analyze.dathere.com), subscribe to the newsletter here: dathere.com/newsletter
0.129.1
This is a small patch release to fix some publishing issues, update tab completion, and to fix minor CI errors.
See 0.129.0 release notes to get the details on qsv's biggest release to date!
Changed
- clipboard: add error handling based on clipboard::Error by @rzmk in #1970
- contrib(completions): add all commands (except applydp & generate) by @rzmk in #1971
- Temporarily suppressed some CI tests that were flaky on GH macOS Apple Silicon action runners. They previously worked fine on self-hosted macOS Apple Silicon action runners that are temporarily unavailable.
Full Changelog: 0.129.0...0.129.1
0.129.0
This release is the biggest one ever!
Packed with new features, improvements, and previews of upcoming qsv pro features, here are a few highlights:
Highlights
Meet @rzmk - qsv pro's software engineer now also co-maintains qsv!
@rzmk has contributed to projects across the qsv ecosystem, including qsv's `describegpt`, `prompt`, `json`, and `clipboard` commands; qsv's tab completion support; qsv.dathere.com, including its online configurator and benchmarks page; 100.dathere.com, with its qsv lessons and exercises; and qsv pro, the spreadsheet data-wrangling desktop app (along with its promo site). @rzmk now also co-maintains qsv!
With @rzmk now co-maintaining qsv, our data-wrangling portfolio's roadmap may get more intriguing, as @rzmk's work on qsv pro, 100.dathere.com, and other initiatives can feed contributions back into qsv, as we've seen in this release. Possible focus areas include AI, "automagical" metadata inferencing, DCAT 3, and expanded recipe support, alongside the accelerated evolution of qsv pro into an enterprise-grade Data-Wrangling/Data Curation Workbench.
Polars v0.41.3 - numerous sqlp and joinp improvements
- `sqlp`: expanded SQL support
  - Natural Join support
  - DuckDB-like `COLUMNS` SQL function to select columns that match a pattern
  - `ORDER BY ALL` support
  - Support PostgreSQL `^@` ("starts with"), `~~`, `~~*`, `!~~`, `!~~*` ("like", "ilike") string-matching operators
  - Support for SQL `SELECT * ILIKE` wildcard syntax
  - Support SQL temporal functions `STRFTIME` and `STRPTIME`
- `sqlp`: added `--streaming` option
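For readers less familiar with the PostgreSQL-style string operators listed above, here is a small Python model of their conventional semantics: `~~`/`!~~` are LIKE/NOT LIKE, `~~*`/`!~~*` are the case-insensitive ILIKE/NOT ILIKE forms, and `^@` is a plain prefix test. It is only an illustrative sketch under stated assumptions (the standard `%` and `_` wildcards, no escape characters), the function names are hypothetical, and it is not how Polars or `sqlp` implement these operators.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into an equivalent regular expression.

    Sketch assumption: only the two standard wildcards are handled
    ('%' = any run of zero or more characters, '_' = exactly one character);
    escape characters are not supported.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return "".join(parts)


def like(value: str, pattern: str) -> bool:
    """value ~~ pattern (LIKE): the whole value must match, case-sensitively."""
    return re.fullmatch(like_to_regex(pattern), value, re.DOTALL) is not None


def ilike(value: str, pattern: str) -> bool:
    """value ~~* pattern (ILIKE): same as LIKE, but case-insensitive."""
    flags = re.DOTALL | re.IGNORECASE
    return re.fullmatch(like_to_regex(pattern), value, flags) is not None


def not_like(value: str, pattern: str) -> bool:
    """value !~~ pattern (NOT LIKE)."""
    return not like(value, pattern)


def not_ilike(value: str, pattern: str) -> bool:
    """value !~~* pattern (NOT ILIKE)."""
    return not ilike(value, pattern)


def starts_with(value: str, prefix: str) -> bool:
    """value ^@ prefix ("starts with"): a plain prefix test, no wildcards."""
    return value.startswith(prefix)


if __name__ == "__main__":
    # prints: True True True
    print(like("qsv pro", "qsv%"), ilike("QSV pro", "qsv%"), starts_with("qsv pro", "qsv"))
```

Note that LIKE patterns must match the whole value (hence `re.fullmatch`), while `^@` treats `%` and `_` as ordinary characters.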
New command qsv prompt - Use a file dialog for qsv file input and output
Be more interactive with qsv by using a file dialog to select a file for input and output.
Here are a few key highlights:
- Start with `qsv prompt` when piping commands to provide a file as input from an open-file dialog and pipe it into another command, for example: `qsv prompt | qsv stats`.
- End with `qsv prompt -f` when piping commands to save the output to a file you choose with a save-file dialog.
There are other options too, so feel free to explore more with `qsv prompt --help`.
This will allow you to create qsv pipelines that are more "user-friendly" and distribute them to non-technical users. It's not as flexible as qsv pro's full-blown GUI, but it's a start!
New command qsv json - Convert JSON data to CSV and optionally provide a jq-like filter
The new `json` command allows you to convert non-nested JSON data to CSV. If your data is not in the expected format, try using the `--jaq` option to provide a jq-like filter. See `qsv json --help` for more information and examples.
Here are a few key highlights:
- Specify the path to a JSON file to attempt conversion to CSV with `qsv json <filepath>`.
- Attempt conversion of JSON data from `stdin` to CSV, for example: `qsv slice <filepath.csv> --json | qsv json`.
- Write the output to a file with the `--output <filepath>` (or `-o` for short) option.
- Use the `--jaq <filter>` option to try converting nested or complex JSON data into the intended format before converting it to CSV.
You may learn more by running `qsv json --help`.
Along with the jsonl command, we now have more options to convert JSON to CSV with qsv!
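To make "non-nested JSON" concrete, here is a minimal Python sketch of that kind of conversion. It assumes the input is either a single flat object or an array of flat objects, and that the header row follows the first object's key order. The helper names (`json_to_csv`, `_cell`) are hypothetical and this is not qsv's implementation; it simply rejects nested values, which is the situation the `--jaq` option is meant to handle in `qsv json`.

```python
import csv
import io
import json


def _cell(value):
    """Render one scalar JSON value as a CSV field (sketch convention)."""
    if isinstance(value, (dict, list)):
        # Nested data is out of scope here; reshape it first (e.g. with a jq-like filter).
        raise ValueError("nested JSON value; flatten it before converting")
    if value is None:
        return ""
    if isinstance(value, bool):  # emit JSON spelling rather than Python's True/False
        return "true" if value else "false"
    return str(value)


def json_to_csv(text: str) -> str:
    """Convert a JSON array of flat objects (or one flat object) to CSV text.

    Assumption: the header comes from the first object's keys, in order;
    keys missing from later objects become empty fields.
    """
    data = json.loads(text)
    if isinstance(data, dict):
        data = [data]
    if not isinstance(data, list) or not data or not all(isinstance(o, dict) for o in data):
        raise ValueError("expected a non-empty JSON array of objects or a single object")
    headers = list(data[0].keys())
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # csv handles quoting of commas etc.
    writer.writerow(headers)
    for obj in data:
        writer.writerow([_cell(obj.get(key)) for key in headers])
    return out.getvalue()


if __name__ == "__main__":
    print(json_to_csv('[{"a": 1, "b": "x"}, {"a": 2, "b": null}]'), end="")
```

Deciding the column order up front from the first object keeps the output deterministic even when later objects list their keys in a different order.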
New command qsv clipboard - Provide input from your clipboard and save output to your clipboard
Provide your clipboard content as input using `qsv clipboard`, and save output to your clipboard by piping into `qsv clipboard --save` (or `-s` for short).
100.dathere.com - Try out lessons and exercises with qsv from your browser!
At 100.dathere.com, you can run qsv commands from your browser without having to install qsv locally.
[Screenshots: a lesson running within the page (in-page) using Thebe, and in a Jupyter Lab environment]
Thanks to Jupyter Book, datHere has released 100.dathere.com, a website where you can explore qsv lessons and exercises by running them within the web page, in a Jupyter Lab environment, or locally after following the provided installation instructions. More exercises are planned, but feel free to try out the first few available lessons/exercises by visiting 100.dathere.com, and star the source code's repository here.
New multi-shell completions draft (bash, zsh, powershell, fish, nushell, fig, elvish)
There's a draft of expanded qsv shell completion support covering 7 different shells! The plan is to add the rest of the commands to this implementation, since one codebase can generate all 7 shell completion script files. Feel free to try out the various shell completions in the `examples` folder of `contrib/completions` to verify that they work (as of this release, only `qsv count` and `qsv clipboard` may be available), and if you know a bit of Rust, help add the rest of the completions.
The existing Bash shell completions for v0.129.0 and the fish shell completions draft remain available while the multi-shell completions draft is being developed.
[Demos: Bash completions and fish completions]
With shell completions enabled, you can discover qsv commands more easily by pressing the Tab key at certain positions in a Bash or fish shell. Follow the instructions on 100.dathere.com here to install the Bash completions, and the Usage section here for the fish shell completions. Note that the fish shell completions are incomplete, and both implementations may be replaced by the multi-shell completions implementation once it is complete.
qsvpro.dathere.com - Preview: Download spreadsheets from a compatible CKAN instance into the qsv pro Workflow
This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released.
In addition to importing local spreadsheet files and uploading them to a CKAN instance, this new feature lets users select a locally registered CKAN instance on which they have the `create_dataset` permission, download a spreadsheet file from it, and load the resulting local spreadsheet file into the Workflow. qsv pro's Workflow would therefore support both uploading to and downloading from a compatible CKAN instance.
qsvpro.dathere.com - Preview: Attempt SQL query generation from natural language with a compatible LLM API instance
This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released.
Also note that this video is sped up as you may see by...