Add --compression-level option for VCF/BCF output #298
Open
tfenne wants to merge 1 commit into
GLIMPSE2_phase, _ligate, and _concordance previously hardcoded the
htslib output mode to compressed BCF ("wb") / compressed VCF ("wz") with
no way to change the compression level or to emit an uncompressed BCF.
This forced every run to pay full (level-6) compression cost, which is
wasteful for large intermediate files that are immediately re-read by the
next pipeline stage.
Add a --compression-level option (INT, default 6, range 0-9) mirroring
bcftools' -l/--compression-level so the level is familiar to users.
The level is appended to the htslib mode string only for compressed
formats (wb/wz); plain uncompressed VCF (.vcf) is untouched. Level 0
yields a BGZF-stored (uncompressed) BCF, equivalent to `bcftools -l 0`.
The default of 6 matches htslib's implicit default, so output is
byte-identical when the flag is unused.
Help text, startup logging, and the documentation option tables are
updated for all three tools.
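The mode-string rule in the commit message can be sketched in a few lines of C++, the language GLIMPSE2 is written in. `with_compression_level` is a hypothetical name and this is not the PR's actual code. It only shows how one trailing digit on htslib's compressed write modes selects the BGZF level, while other modes pass through unchanged.

```cpp
#include <string>

// Hypothetical helper (not GLIMPSE2's actual code): append a BGZF
// compression level to an htslib write mode. Only the compressed modes
// "wb" (BCF) and "wz" (bgzipped VCF) receive a level; any other mode,
// e.g. "w" for plain-text VCF, is returned unchanged.
// Assumes `level` has already been validated to lie in 0..9.
std::string with_compression_level(const std::string& base_mode, int level) {
    if (base_mode == "wb" || base_mode == "wz") {
        // A single trailing digit sets the level, e.g. "wb0" or "wz6".
        return base_mode + static_cast<char>('0' + level);
    }
    return base_mode;
}
```

For example, `with_compression_level("wb", 0)` gives `"wb0"` (the BGZF-stored BCF case), and `with_compression_level("w", 6)` leaves plain-VCF output as `"w"`.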
Collaborator
Thanks! Yes, very easy to add. I should be able to test this very quickly (no need for UKB)
Motivation
`GLIMPSE2_phase`, `GLIMPSE2_ligate`, and `GLIMPSE2_concordance` hardcode the htslib output mode to compressed BCF (`"wb"`) / compressed VCF (`"wz"`), so there is no way to change the compression level or to emit an uncompressed BCF. Every run pays the full (level-6) compression cost, which is wasteful for large intermediate files that the next pipeline stage immediately re-reads and decompresses.
@srubinacci this is a purely mechanical change in the output compression/writing layer, so hopefully it can be reviewed (and merged) even without the ability to benchmark on UKB?
Change
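Adds a `--compression-level` option (INT, default 6, range 0–9) to all three tools, mirroring `bcftools -l/--compression-level` so the semantics are familiar:
- The level is appended to the htslib mode string only for compressed formats (`wb`/`wz`); plain uncompressed VCF (`.vcf`) is left untouched.
- Level 0 yields a BGZF-stored BCF that is still indexable, just not deflated, equivalent to `bcftools -l 0`.
- The default of 6 matches htslib's implicit default, so output is byte-identical when the flag is unused.
- … `--bgen-compr`)
- … `_rej_sites.bcf`/`_conc_sites.bcf`/`_disc_sites.bcf` diagnostic outputs

Help text, startup logging, and the per-tool documentation tables are updated for all three tools.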
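Which outputs receive a level depends on the base mode chosen for each file. Assuming the tools pick the format from the output file's suffix (an assumption: the real selection logic in GLIMPSE2 may differ, and BGEN output is not covered here), the mapping might look like this hypothetical `base_write_mode` helper:

```cpp
#include <string>

// Returns true if `s` ends with `suffix` (C++11-compatible stand-in
// for std::string::ends_with).
static bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical sketch: choose the base htslib write mode from the output
// file name, assuming suffix-based format selection.
std::string base_write_mode(const std::string& path) {
    if (ends_with(path, ".bcf")) return "wb";     // compressed BCF
    if (ends_with(path, ".vcf.gz")) return "wz";  // bgzipped VCF
    return "w";                                   // plain-text VCF, left uncompressed
}
```

Under this sketch, `.bcf` and `.vcf.gz` outputs map to the compressed modes that can carry a level, whereas a plain `.vcf` file stays in text mode `w`.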
Testing
- … `--compression-level 12`) are rejected with a clear error.
- … `bcftools -Ob0`; higher levels compress; decoded records are identical across levels.
- … `master` (matching MD5).
- … (`bcftools index -s`).
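The range check behind the rejected `--compression-level 12` case can be sketched as a standalone parser. `parse_compression_level` and its error message are hypothetical and do not reproduce how the tools actually parse their options.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the 0-9 range check; name and message are
// illustrative only.
int parse_compression_level(const std::string& text) {
    std::size_t consumed = 0;
    // std::stoi throws std::invalid_argument for non-numeric input and
    // std::out_of_range if the value does not fit in an int.
    int level = std::stoi(text, &consumed);
    // Reject trailing garbage ("5x") and anything outside 0..9.
    if (consumed != text.size() || level < 0 || level > 9) {
        throw std::invalid_argument(
            "--compression-level must be an integer in [0, 9], got '" + text + "'");
    }
    return level;
}
```

With this sketch, `parse_compression_level("6")` returns 6, while `"12"`, `"-1"`, and `"5x"` all raise `std::invalid_argument`.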