Fix sample discovery when tumor-only run without pairs file; minor issue with pairs.tsv in test dataset #174
Merged
Member
Looks good @samarth8392! Can you just confirm that you tested this branch on biowulf and it works as expected now?
Contributor
Author
Yes, I ran the test run on the test dataset and I can confirm that the bam_check ran without issues
Contributor
Author
This PR also fixed #173 as I have updated the names in the test data pairs.tsv file
kelly-sovacool
approved these changes
Jan 29, 2026
kelly-sovacool
left a comment
Member
excellent, thanks for these fixes Samarth!
Changes
XAVIER could fail at Snakefile parse time with:
Fatal: Either a valid pairs file or sample names must be provided.
Sample names provided: set()
even when valid *.fastq.gz inputs were provided and tumor-only mode should have proceeded.
Root cause
Sample discovery depends on name_symlinks, which is normally created by sym_safe() symlinking discovered inputs into:
input_files/fastq/ (FASTQ mode) or
input_files/bam/ (BAM mode)
However, the Snakefile logic previously did this:
If input_files/fastq existed, it only globbed input_files/fastq/*.fastq.gz and did not run sym_safe() again.
If the directory existed but was empty (e.g., from a partial init/failed run/manual mkdir), then name_symlinks=[] → samples=set() → read_pairsfile() raised the fatal error before any rules executed.
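The failure mode described above can be reproduced with a minimal sketch. This is not the actual Snakefile code: `link_inputs` is a hypothetical stand-in for `sym_safe()` (whose real signature is not shown here), `discover_old` is an illustrative name, and deriving a sample name from the text before the first dot is an assumption made only for the demo.

```python
import glob
import os
import tempfile


def link_inputs(input_files, target_dir):
    """Hypothetical stand-in for sym_safe(): symlink inputs into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    links = []
    for src in input_files:
        dst = os.path.join(target_dir, os.path.basename(src))
        if not os.path.lexists(dst):
            os.symlink(os.path.abspath(src), dst)
        links.append(dst)
    return links


def discover_old(input_files, input_fqdir):
    """Pre-fix behaviour as described: trust the directory if it exists."""
    if os.path.isdir(input_fqdir):
        # Only glob; never re-link, even when the directory is empty.
        return sorted(glob.glob(os.path.join(input_fqdir, "*.fastq.gz")))
    return link_inputs(input_files, input_fqdir)


with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "raw")
    os.makedirs(raw)
    fastq = os.path.join(raw, "tumor1.R1.fastq.gz")
    open(fastq, "w").close()

    # Stale, empty directory left behind by an earlier attempt.
    stale_dir = os.path.join(tmp, "input_files", "fastq")
    os.makedirs(stale_dir)

    name_symlinks = discover_old([fastq], stale_dir)
    # Assumed naming scheme for the demo: sample name = text before first dot.
    samples = {os.path.basename(p).split(".")[0] for p in name_symlinks}
    old_result = (name_symlinks, samples)
    print(old_result)  # ([], set())
```

Because the directory exists, the valid FASTQ input is never linked, so the sample set stays empty.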
Issues
Harden sample discovery to repopulate symlinks when the input directory exists but contains no files:
Always os.makedirs(input_fqdir, exist_ok=True) / os.makedirs(input_bamdir, exist_ok=True)
Prefer existing symlinks when present
If globbing the directory returns empty, call sym_safe(...) to (re)populate it
Add an early, actionable error message if samples still cannot be inferred
This makes tumor-only runs robust to stale/empty input_files/* directories and prevents parse-time failure.
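The hardening steps listed above could look roughly like the following sketch. It is an illustration under assumptions, not the merged code: `link_inputs` is again a hypothetical stand-in for `sym_safe()`, `discover_inputs` and its parameters are illustrative names, and the error text is invented for the example.

```python
import glob
import os
import tempfile


def link_inputs(input_files, target_dir):
    """Hypothetical stand-in for sym_safe(): symlink inputs into target_dir."""
    for src in input_files:
        dst = os.path.join(target_dir, os.path.basename(src))
        if not os.path.lexists(dst):
            os.symlink(os.path.abspath(src), dst)


def discover_inputs(input_files, input_dir, pattern="*.fastq.gz"):
    """Return linked inputs, repopulating an empty directory if needed."""
    # Always make sure the directory exists.
    os.makedirs(input_dir, exist_ok=True)
    # Prefer symlinks that are already present.
    found = sorted(glob.glob(os.path.join(input_dir, pattern)))
    if not found:
        # Empty directory: (re)create the symlinks, then look again.
        link_inputs(input_files, input_dir)
        found = sorted(glob.glob(os.path.join(input_dir, pattern)))
    if not found:
        # Fail early with a message that says what to check (wording assumed).
        raise ValueError(
            f"No inputs matching {pattern} could be linked into {input_dir}; "
            "check the input paths or provide a pairs file."
        )
    return found


with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "raw")
    os.makedirs(raw)
    fastq = os.path.join(raw, "tumor1.R1.fastq.gz")
    open(fastq, "w").close()

    # Stale, empty directory: the case that used to yield samples=set().
    stale_dir = os.path.join(tmp, "input_files", "fastq")
    os.makedirs(stale_dir)
    recovered = [os.path.basename(p) for p in discover_inputs([fastq], stale_dir)]
    print(recovered)  # ['tumor1.R1.fastq.gz']

    # No inputs at all: the error is raised during discovery.
    try:
        discover_inputs([], os.path.join(tmp, "input_files", "bam"), "*.bam")
        failed_early = False
    except ValueError as err:
        failed_early = True
        print(err)
```

Globbing again after linking keeps a single source of truth for what counts as a discovered input, whether the links were already there or were just created.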
Fixes #172
Fixes #173
PR Checklist
(Strikethrough any points that are not applicable.)
[ ] Update docs if there are any API changes.
Update CHANGELOG.md with a short description of any user-facing changes and reference the PR number. Guidelines: https://keepachangelog.com/en/1.1.0/