🎉 qsv pro is now available in the Microsoft Store! 🎉

It's Data Wrangling Democratized on the Desktop, featuring:

📊 Familiar Spreadsheet Interface
tap the power of qsv to query, analyze, enrich, scrub and transform huge Excel files and multi-gigabyte CSV files in seconds, without having to deal with the command-line.
CKAN desktop client
designed to make data publishing easier for portal operators and data stewards using the CKAN platform.
📥 Flow
allows you to build custom node-based flows and data pipelines using a visual interface.
🔧 Toolbox
features an ever-expanding library of reusable scripts for common data-wrangling use cases.
⭐ and more!
Natural Language Interface (RAG), Polars SQL query support, an API, Python/Luau support, automatic Data Dictionaries, DCAT 3 metadata profile inferencing, along with a retinue of other cloud-based services (e.g. customizable street-level geocoding, data feeds, reference data lookups, geo-ip lookups, cloud storage support, .qsv file format, etc.) that will be unveiled in future versions.

Like qsv, we're iterating rapidly with qsv pro, so your feedback is essential. Give it a try!

Get it from https://qsvpro.dathere.com or the Microsoft Store.

Other highlights:

excel: new --table option for XLSX files; new --header-row option; expanded --range option, adding support for Named Ranges and absolute ranges (e.g. Sheet2!$A$1:$J$10); and expanded metadata export now including Named Ranges and Tables (for XLSX files)
Improved performance for several commands (apply, datefmt, tojsonl and validate) through automatic batch size optimization
validate: dynamicEnum custom JSON Schema keyword (renamed from dynenum) and enhanced email validation
schema: automatic JSON Schema const inferencing for columns with just one value
Significant dependency updates, including latest upstream versions of Polars, jsonschema, and serde_json with unreleased performance upgrades, new features and fixes
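The expanded excel --range option accepts absolute references such as Sheet2!$A$1:$J$10. The sketch below shows one way such an A1-style reference can be broken down into a sheet name and zero-based row/column bounds. It is not qsv's parser: parse_range and col_to_index are hypothetical helpers. The sketch ignores the "$" markers, which in Excel only control how references behave when formulas are copied. It does not resolve Named Ranges, which would first have to be looked up in the workbook's defined names, and it does not handle quoted sheet names such as 'My Sheet'!A1.

```python
import re

# Hypothetical pattern for illustration only; not qsv's range parser.
_RANGE_RE = re.compile(
    r"^(?:(?P<sheet>[^!]+)!)?"            # optional "SheetName!" prefix
    r"\$?(?P<c1>[A-Za-z]+)\$?(?P<r1>\d+)"  # start cell, "$" markers optional
    r":\$?(?P<c2>[A-Za-z]+)\$?(?P<r2>\d+)$"  # end cell
)


def col_to_index(letters: str) -> int:
    """Convert column letters (A, B, ..., Z, AA, ...) to a zero-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_range(ref: str):
    """Parse an A1-style range such as 'Sheet2!$A$1:$J$10'.

    Returns (sheet_or_None, (row, col), (row, col)) using zero-based indices.
    The '$' absolute markers are accepted and ignored.
    """
    m = _RANGE_RE.match(ref.strip())
    if m is None:
        raise ValueError(f"not an A1-style range: {ref!r}")
    start = (int(m["r1"]) - 1, col_to_index(m["c1"]))
    end = (int(m["r2"]) - 1, col_to_index(m["c2"]))
    return m["sheet"], start, end


print(parse_range("Sheet2!$A$1:$J$10"))  # ('Sheet2', (0, 0), (9, 9))
print(parse_range("C3:T25"))             # (None, (2, 2), (24, 19))
```

Stripping "$" is safe here because an extraction range covers the same cells whether or not its references are absolute.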

NOTE: You can see qsv & qsv pro in action in our "The Problem with Data Portals" webinar on Wed, Oct 23, 2024, 1-2pm EDT.

Added

🎉 qsv pro is now in the Microsoft Store!!! 🎉
apply, datefmt, tojsonl, validate: added logic to automatically determine optimal batch size for better parallelization #2178
enum: added --new-column support for all enum modes, not just --increment #2173
excel: new --table option for XLSX files #2194
excel: new --header-row option 458f79a
excel: expanded range and metadata options #2195
schema: added JSON Schema automatic const inferencing #2180
Add signing step to qsv MSI installer GitHub Action by @rzmk in #2182
contrib(completions): add --table option to qsv excel by @rzmk in #2197
completions: add --header-row option to qsv excel e8794d5
added a new benchmark for apply operations sentiment b745e64
docs: added indexing section to PERFORMANCE.md 804145a
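The schema command's new const inferencing (#2180) targets columns that contain just one value. In that case, a JSON Schema const keyword is stricter than a type constraint alone. The sketch below illustrates the idea for a single column. It is a minimal sketch, not qsv's implementation: infer_column_schema is a hypothetical helper, and it treats every value as a string, whereas qsv's schema command infers column types from its stats.

```python
import json


def infer_column_schema(values: list[str]) -> dict:
    """Hypothetical sketch: build a JSON Schema fragment for one string column.

    When every value in the column is identical, pin it with `const`;
    otherwise emit only the type constraint.
    """
    schema: dict = {"type": "string"}
    distinct = set(values)
    if len(distinct) == 1:
        # Exactly one distinct value: any other value should fail validation.
        schema["const"] = next(iter(distinct))
    return schema


print(json.dumps(infer_column_schema(["US", "US", "US"])))  # {"type": "string", "const": "US"}
print(json.dumps(infer_column_schema(["US", "CA"])))        # {"type": "string"}
```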

Changed

stats: various minor micro-optimizations 62d95fc 2c2862a
validate: renamed custom keyword dynenum to dynamicEnum to be more consistent with JSON schema naming conventions 0.135.0...master#diff-9783631cdad9e1f47f60266303dc2d56a6e7a486784b61c40961601e8192f7cf
validate: optimizations for increased performance; replace serde_json with simd_json 0.135.0...master#diff-9783631cdad9e1f47f60266303dc2d56a6e7a486784b61c40961601e8192f7cf
apply new clippy::ref_option lint to Config::new API #2192
Update debian package readme by @tino097 in #2187
deps: bump calamine from 0.25 to 0.26 b42279a
deps: jsonschema use latest 0.22.3 upstream with unreleased features/fixes
deps: polars use latest 0.43.1 upstream with unreleased features/fixes
deps: created our own fork of unmaintained vader_sentiment crate b426761
deps: use serde_json upstream with unreleased perf improvement/fixes https://github.com/jqnatividad/qsv/blob/1c1174b3b8b65d9dfd9c841597366fb09d0a047c/Cargo.toml#L221
build(deps): bump flate2 from 1.0.33 to 1.0.34 by @dependabot in #2171
build(deps): bump flexi_logger from 0.29.0 to 0.29.1 by @dependabot in #2189
build(deps): bump flexi_logger from 0.29.1 to 0.29.2 by @dependabot in #2196
build(deps): bump hashbrown from 0.14.5 to 0.15.0 by @dependabot in #2186
build(deps): bump jsonschema from 0.20.0 to 0.21.0 by @dependabot in #2177
build(deps): bump jsonschema from 0.22.1 to 0.22.2 by @dependabot in #2191
build(deps): bump regex from 1.10.6 to 1.11.0 by @dependabot in #2176
build(deps): bump reqwest from 0.12.7 to 0.12.8 by @dependabot in #2183
build(deps): bump simd-json from 0.14.0 to 0.14.1 #2199
build(deps): bump simple-expand-tilde from 0.4.2 to 0.4.3 by @dependabot in #2190
build(deps): bump sysinfo from 0.31.4 to 0.32.0 by @dependabot in #2193
build(deps): bump tempfile from 3.12.0 to 3.13.0 by @dependabot in #2175
apply select clippy lints
bumped indirect dependencies
aligned Rust nightly to Polars nightly - 2024-09-29 7cd2de1

Fixed

schema: fix enum so it only adds a list when the number of unique values > --enum-threshold #2180
Upload artifact fix for Debian package publishing by @tino097 in #2168
fixed typos configuration 627de89
fixed various GitHub Actions publishing workflow issues

Full Changelog: 0.135.0...0.136.0

Contributors

tino097, dependabot, and rzmk


0.135.0

Released 24 Sep 12:46 by jqnatividad (commit 7b9edaf)

Highlights

JSON Schema validation just got a whole lot more powerful with the introduction of qsv's custom dynenum keyword!
With dynenum, you can now dynamically look up valid enum values from a CSV (on the filesystem or at a URL), allowing for more flexible and responsive data validation.

Unlike the standard enum keyword, dynenum does not require hardcoding valid values at schema definition time, and can be used to validate data against a changing set of valid values.
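Conceptually, a dynamic enum replaces a hardcoded list with a lookup that runs at validation time. Editing the lookup CSV therefore changes what counts as valid without touching the schema. The sketch below illustrates that idea and is not qsv's implementation. The helpers load_valid_values and invalid_values are hypothetical. The sketch also assumes that the lookup CSV has a header row and that the valid values are in its first column. How dynenum actually selects the column is not covered here.

```python
import csv
import io
import os
import tempfile
import urllib.request
from pathlib import Path


def load_valid_values(source: str, column: int = 0) -> set[str]:
    """Load allowed values from a CSV on the filesystem or at a URL.

    Assumes a header row and that allowed values live in `column`
    (default: the first column). Hypothetical sketch, not qsv's code.
    """
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            text = resp.read().decode("utf-8")
    else:
        text = Path(source).read_text(encoding="utf-8")
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    return {row[column] for row in reader if len(row) > column}


def invalid_values(values, allowed: set[str]) -> list[str]:
    """Return the values not found in the allowed set, in input order."""
    return [v for v in values if v not in allowed]


# Usage: build a small lookup file, then validate some data against it.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, encoding="utf-8") as f:
    f.write("code,name\nNY,New York\nCA,California\n")
allowed = load_valid_values(f.name)
print(invalid_values(["NY", "TX", "CA"], allowed))  # ['TX']
os.remove(f.name)
```

The allowed set is rebuilt every time the lookup is loaded, so a value added to the CSV later is accepted without any schema change. That is the behavior the dynenum keyword is meant to provide.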

For an example, see #1872 (reply in thread).

In an upcoming qsv pro release, we're planning on making dynenum even more powerful by allowing you to easily specify high-value reference data (e.g. US Census data, World Bank data, data.gov, etc.) that is maintained at data.dathere.com and other CKAN instances.

This release also adds the custom currency JSON Schema format, which enables currency validation according to the ISO 4217 standard.
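ISO 4217 defines three-letter alphabetic currency codes and, for each code, the number of minor-unit digits (for example, 2 for USD and 0 for JPY). The sketch below shows the kind of check an ISO 4217-based format can perform. These release notes do not describe the syntax qsv's currency format accepts. The "CODE AMOUNT" input form (e.g. "USD 12.50"), the is_valid_currency helper and the abbreviated code table are all assumptions made for illustration.

```python
from decimal import Decimal, InvalidOperation

# Tiny illustrative subset of ISO 4217: alphabetic code -> minor-unit digits.
# A real validator would load the full, current ISO 4217 list.
MINOR_UNITS = {"USD": 2, "EUR": 2, "GBP": 2, "JPY": 0, "BHD": 3, "KWD": 3}


def is_valid_currency(value: str) -> bool:
    """Check 'CODE AMOUNT' strings such as 'USD 12.50' (assumed syntax).

    The code must be a known ISO 4217 alphabetic code, and the amount may
    not use more decimal places than that currency's minor unit.
    """
    parts = value.strip().split()
    if len(parts) != 2:
        return False
    code, amount = parts
    if code not in MINOR_UNITS:
        return False
    try:
        dec = Decimal(amount)
    except InvalidOperation:
        return False
    if not dec.is_finite():  # reject NaN and Infinity
        return False
    decimals = max(0, -dec.as_tuple().exponent)
    return decimals <= MINOR_UNITS[code]


print(is_valid_currency("USD 12.50"))  # True
print(is_valid_currency("JPY 100.5"))  # False: JPY has no minor unit
print(is_valid_currency("XYZ 1"))      # False: not in the table
```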

The Polars engine was also upgraded to 0.43.1 at the py-1.81.1 tag, bringing various under-the-hood improvements to the sqlp, joinp and count commands as we set the stage for more Polars-powered features in future releases.