Replace fasterq-dump with sracat-rs #49
Use sracat-rs (github.com/wwood/sracat-rs) for all .sra extraction, replacing fasterq-dump in the sorted/default path and the legacy sracat binary in the --unsorted paths.

sracat-rs writes split mates via -1/-2/--single-out in native .fasta/.fastq, so the .fna renames and the FIFO/-z gzip machinery are gone (gzip is now a pigz pass). --unsorted --stdout routes single/orphan reads to /dev/stdout so single-end runs work. Empty -1/-2 files (created eagerly by sracat-rs for single-end runs) are dropped before further processing.

deps: swap sracat -> sracat-rs in pixi.toml, regenerate pixi.lock and admin/environment.yml. Exclude gawk from the generated requirements.txt.
docs/cli: mention sracat-rs instead of fasterq-dump.
tests: regenerate the .sra-extraction expectations in test_get_sra_and_aws.py to match sracat-rs output (sorted and unsorted now extract identically, so their expected values converge).

All extraction-affected tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
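The step that drops empty -1/-2 files could look roughly like the sketch below. It is an illustration only, not the code in this PR: the helper name drop_empty_outputs is hypothetical, and it assumes that "empty" means a zero-byte file on disk.

```python
import os


def drop_empty_outputs(paths):
    """Delete zero-byte files in `paths` and return the paths that remain.

    Hypothetical helper: assumes an eagerly created mate file for a
    single-end run is exactly zero bytes long.
    """
    kept = []
    for path in paths:
        if not os.path.exists(path):
            continue  # nothing was written for this output
        if os.path.getsize(path) == 0:
            os.remove(path)  # placeholder file that carries no reads
        else:
            kept.append(path)
    return kept
```

Returning the surviving paths lets later steps (compression, format checks) iterate only over files that actually contain reads.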
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1ac851eaa4
    extern.run("sracat-rs --qual --threads {} -1 {} -2 {} --single-out {} {}".format(
        threads, r1, r2, single, sra_file_abs))
Preserve sorted extraction when --unsorted is absent
For default get/extract calls where users do not pass --unsorted, this now invokes sracat-rs, the same unordered extractor used by the unsorted branch, so the documented/default distinction is lost. The CLI still advertises --unsorted as the opt-in path for arbitrary SRA storage order, and tests were changed so the non-unsorted FASTQ output matches the unsorted sizes; workflows that rely on fasterq-dump's original spot order now silently receive arbitrary-order reads.
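For reference, the invocation quoted in this comment can be assembled with each argument shell-quoted, which keeps paths containing spaces intact when the result is handed to a shell. This is a sketch under assumptions: the function name build_sracat_rs_command is hypothetical, and the only sracat-rs flags used are the ones that appear in the quoted line.

```python
import shlex


def build_sracat_rs_command(threads, r1, r2, single, sra_file):
    """Return a shell command string for the sracat-rs call quoted above.

    Hypothetical helper: uses only the flags visible in the reviewed line
    (--qual, --threads, -1, -2, --single-out).
    """
    args = [
        "sracat-rs", "--qual",
        "--threads", str(threads),
        "-1", r1,
        "-2", r2,
        "--single-out", single,
        sra_file,
    ]
    # Quote every argument so spaces or shell metacharacters in paths are safe.
    return " ".join(shlex.quote(arg) for arg in args)
```

The returned string has the same shape as the one passed to extern.run in the reviewed code.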
Summary
Migrates all .sra extraction to sracat-rs, replacing both fasterq-dump (sorted/default path) and the legacy sracat binary (--unsorted paths). The legacy sracat dependency is dropped entirely.
Changes
- kingfisher/__init__.py (extract)
  - fasterq-dump → sracat-rs --qual -1/-2/--single-out.
  - --unsorted file outputs: rewritten to sracat-rs split mode, writing native .fasta/.fastq directly (no more .fna renames, no FIFO/-z machinery; gzip is now a pigz pass).
  - --unsorted --stdout: routes single/orphan reads to /dev/stdout so single-end runs work.
  - Empty -1/-2 files (created eagerly by sracat-rs for single-end runs) are dropped before further processing.
  - import gzip.
- Deps: sracat → sracat-rs in pixi.toml; regenerated pixi.lock (sracat-rs 0.0.3) and admin/environment.yml. Excluded gawk from the generated requirements.txt.
- Docs/CLI: mention sracat-rs instead of fasterq-dump.
- Tests: regenerated the .sra-extraction expectations in test_get_sra_and_aws.py. Since sorted and unsorted now run identical sracat-rs extraction, their previously distinct expected values converge. ENA/NGDC tests are unaffected (they download fastq.gz directly).
Testing
Run on the aqua HPC queue in the dev pixi env:
Not run: aws_cp-marked tests (need AWS credentials) and pure .sra-download / bioproject / prefetch-listing tests (don't touch extraction).
🤖 Generated with Claude Code
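The "gzip is now a pigz pass" change described above could be sketched as follows. This is not the PR's implementation: the names pigz_command and compress_outputs are hypothetical, and the sketch assumes pigz is on PATH and is left at its default behaviour of replacing each input file with a .gz file.

```python
import shutil
import subprocess


def pigz_command(path, threads):
    """Build the argv for compressing one file with pigz (-p sets the thread count)."""
    return ["pigz", "-p", str(threads), path]


def compress_outputs(paths, threads=1):
    """Compress each extracted file in place and return the resulting .gz paths.

    Hypothetical helper: relies on pigz replacing `path` with `path + ".gz"`.
    """
    if shutil.which("pigz") is None:
        raise RuntimeError("pigz not found on PATH")
    for path in paths:
        subprocess.run(pigz_command(path, threads), check=True)
    return [path + ".gz" for path in paths]
```

Running compression as a separate pass over finished files means the extractor can write plain .fasta/.fastq and no FIFO has to be kept open while it runs.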