feat: Template #26
nh13
left a comment
Minor comments and suggestions based on looking at what we've done in fgpyo/fgbio (not that they are perfect).
/// Secondary alignments for R2.
pub r2_secondaries: Vec<Record>,
/// Supplementary alignments for R1.
pub r1_supplementaries: Vec<Record>,
suggestion: can we use supplementals like we do in fgbio/fgpyo?
The SAM specification uses "supplementary"; I think the use of "supplemental" is specific to these two packages. Would you be open to using supplementaries, since that's consistent with spec? (I always typo the fgpyo methods because I think of supplementary alignments.)
https://samtools.github.io/hts-specs/SAMv1.pdf
If you agree, I can open issues (and PRs) on fgpyo and fgbio to rename these methods to use supplementaries (and keep the supplementals helpers, just with deprecation warnings).
///
/// # Examples
///
/// ```ignore
question: why not make these runnable? Consider adding a test helper module or using no_run if the issue is just runtime dependencies.
Add a method to retrieve the query name from a Template. Returns the query name from the first available record, checking in order: r1, r2, r1_secondaries, r2_secondaries, r1_supplementaries, r2_supplementaries. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add an iterator method that yields all records in the template. Records are yielded in order: r1, r2, r1_secondaries, r2_secondaries, r1_supplementaries, r2_supplementaries. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add an iterator method that yields only the primary records (r1 and r2). This is a convenience method for iterating over just the primary alignments. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add the bam module to the public API, making Template and TemplateError accessible to users of the crate. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add test module with tests for:
- Building template from paired-end primaries
- Template::name() returning query name
- Template::all_recs() iterator
- Template::primary_recs() iterator
- R1-only and R2-only templates
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add tests verifying that unpaired reads (without PAIRED flag) are correctly placed in r1 following the fgbio convention. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add tests verifying:
- Secondary R1/R2 alignments are placed correctly
- Supplementary R1/R2 alignments are placed correctly
- Records with both secondary AND supplementary flags go to secondaries (per fgbio convention: secondary flag is checked first)
- Multiple secondaries and supplementaries are handled correctly
- all_recs() includes all alignment types
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
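The placement rules these tests exercise can be sketched as a pure function over SAM flag bits. This is a minimal illustration rather than the crate's code: the flag values come from the SAM specification, while `Bucket`, `classify`, and the rule that any read without `PAIRED` counts as R1 are assumptions made for the example.

```rust
// SAM flag bits, as defined in the SAM specification.
const PAIRED: u16 = 0x1;
const FIRST_IN_PAIR: u16 = 0x40;
const LAST_IN_PAIR: u16 = 0x80;
const SECONDARY: u16 = 0x100;
const SUPPLEMENTARY: u16 = 0x800;

/// Hypothetical bucket type; the crate's real types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Bucket {
    Primary,
    Secondary,
    Supplementary,
}

/// Returns the bucket for a record and whether it belongs on the R1 side.
fn classify(flags: u16) -> (Bucket, bool) {
    // The secondary flag is tested first, so a record carrying both the
    // secondary and supplementary flags lands in the secondaries.
    let bucket = if (flags & SECONDARY) != 0 {
        Bucket::Secondary
    } else if (flags & SUPPLEMENTARY) != 0 {
        Bucket::Supplementary
    } else {
        Bucket::Primary
    };
    // Assumption: reads without PAIRED are treated as R1 regardless of
    // the FIRST_IN_PAIR / LAST_IN_PAIR bits.
    let is_r1 = (flags & PAIRED) == 0 || (flags & FIRST_IN_PAIR) != 0;
    (bucket, is_r1)
}

fn main() {
    assert_eq!(classify(PAIRED | FIRST_IN_PAIR), (Bucket::Primary, true));
    assert_eq!(classify(PAIRED | LAST_IN_PAIR), (Bucket::Primary, false));
    assert_eq!(
        classify(PAIRED | LAST_IN_PAIR | SECONDARY | SUPPLEMENTARY),
        (Bucket::Secondary, false)
    );
    // An unpaired read goes to the R1 side under the stated assumption.
    assert_eq!(classify(0), (Bucket::Primary, true));
}
```

Returning a `(Bucket, bool)` pair keeps the dispatch to the six storage slots in a single `match` at the call site.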
Add tests verifying that Template::build() returns appropriate errors:
- MismatchedQueryNames when records have different query names
- MultiplePrimaryR1 when multiple primary R1 records are found
- MultiplePrimaryR2 when multiple primary R2 records are found
- Multiple unpaired reads correctly trigger MultiplePrimaryR1
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add tests for edge cases and special scenarios:
- Empty template (no records)
- Template::name() from secondary-only records
- Template::name() from supplementary-only records
- Default template via Template::default()
- Verification of all_recs() ordering
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Accept any IntoIterator instead of requiring Vec<Record>. This allows callers to pass iterators directly without collecting to Vec first.
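As a rough illustration of what the `IntoIterator` bound allows, the sketch below uses a hypothetical stand-in `Record` (not rust-htslib's type) and a hypothetical helper `count_primaries`. It only shows the shape of the generic signature, not the crate's constructor.

```rust
/// Stand-in for an alignment record; hypothetical, not rust-htslib's `Record`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Record {
    flags: u16,
}

/// Counts records that are neither secondary (0x100) nor supplementary (0x800).
/// Accepts any owned source of records: a Vec, an array, or an iterator adapter.
fn count_primaries<I>(records: I) -> usize
where
    I: IntoIterator<Item = Record>,
{
    records
        .into_iter()
        .filter(|r| (r.flags & 0x900) == 0)
        .count()
}

fn main() {
    // A Vec still works, as before.
    let from_vec = count_primaries(vec![Record { flags: 0x41 }, Record { flags: 0x181 }]);
    assert_eq!(from_vec, 1);

    // An iterator can be passed directly, without collecting first.
    let from_iter = count_primaries((0..3).map(|_| Record { flags: 0 }));
    assert_eq!(from_iter, 3);
}
```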
Leverage the existing all_recs() iterator which already has the correct priority order, instead of chaining multiple or_else calls.
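A minimal sketch of that idea, assuming the primaries are stored as `Option<Record>` and using a hypothetical stand-in `Record` with only a `qname` field. The order shown is the one listed in the earlier `all_recs()` commit message; a later commit in this PR reorders supplementaries ahead of secondaries.

```rust
/// Stand-in for an alignment record; hypothetical, not rust-htslib's `Record`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Record {
    qname: Vec<u8>,
}

/// Field names follow the diff quoted in the review; `Option` for the
/// primaries is an assumption.
#[derive(Debug, Default)]
struct Template {
    r1: Option<Record>,
    r2: Option<Record>,
    r1_secondaries: Vec<Record>,
    r2_secondaries: Vec<Record>,
    r1_supplementaries: Vec<Record>,
    r2_supplementaries: Vec<Record>,
}

impl Template {
    /// Yields every record in a fixed priority order.
    fn all_recs(&self) -> impl Iterator<Item = &Record> + '_ {
        self.r1
            .iter()
            .chain(self.r2.iter())
            .chain(self.r1_secondaries.iter())
            .chain(self.r2_secondaries.iter())
            .chain(self.r1_supplementaries.iter())
            .chain(self.r2_supplementaries.iter())
    }

    /// The query name of the first available record, if any. The iterator is
    /// lazy, so this stops at the first record found.
    fn name(&self) -> Option<&[u8]> {
        self.all_recs().next().map(|r| r.qname.as_slice())
    }
}

fn main() {
    let empty = Template::default();
    assert_eq!(empty.name(), None);

    let supp_only = Template {
        r2_supplementaries: vec![Record { qname: b"q1".to_vec() }],
        ..Template::default()
    };
    assert_eq!(supp_only.name(), Some(&b"q1"[..]));
    assert_eq!(supp_only.all_recs().count(), 1);
}
```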
Add standard traits to TemplateError for easier testing and error handling. Clone is useful for retrying operations, and PartialEq/Eq enable direct equality assertions in tests.
Replace match blocks with direct assert_eq! calls now that TemplateError derives PartialEq.
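The difference is easiest to see side by side. In this sketch the variant names come from the commit messages above, but the payload fields and the `check_names` helper are hypothetical and exist only to produce an error to assert on.

```rust
/// Variant names follow the commit messages; the payload shape is assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
enum TemplateError {
    MismatchedQueryNames { expected: String, found: String },
    MultiplePrimaryR1,
    MultiplePrimaryR2,
}

/// Hypothetical helper: errors if any name differs from the first one.
fn check_names(names: &[&str]) -> Result<(), TemplateError> {
    let Some((first, rest)) = names.split_first() else {
        return Ok(());
    };
    for name in rest {
        if name != first {
            return Err(TemplateError::MismatchedQueryNames {
                expected: first.to_string(),
                found: name.to_string(),
            });
        }
    }
    Ok(())
}

fn main() {
    let result = check_names(&["q1", "q2"]);

    // Without PartialEq, a test has to destructure the error by hand.
    match &result {
        Err(TemplateError::MismatchedQueryNames { expected, found }) => {
            assert_eq!(expected, "q1");
            assert_eq!(found, "q2");
        }
        other => panic!("unexpected result: {other:?}"),
    }

    // With PartialEq and Eq derived, one assertion covers variant and payload.
    assert_eq!(
        result,
        Err(TemplateError::MismatchedQueryNames {
            expected: "q1".to_string(),
            found: "q2".to_string(),
        })
    );

    // The unit variants compare directly as well.
    assert_ne!(TemplateError::MultiplePrimaryR1, TemplateError::MultiplePrimaryR2);
}
```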
Follow Rust conventions for the primary constructor method. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Incorporates Nils's review thread (accepted items) plus findings from a /simplify + code-review pass.
From Nils:
- Derive Clone on Template
- Add EmptyTemplate error; new(vec![]) now errors
- Store qnames as Vec<u8> in TemplateError (preserves non-UTF-8 bytes)
- Add all_r1s(), all_r2s(), len(), is_empty()
- Reorder all_recs() to match fgbio Scala: r1, r2, r1_supp, r2_supp, r1_sec, r2_sec
- Add new_unchecked() for hot paths where invariants are guaranteed
- Doctests are runnable (no more `ignore`)
From the review pass:
- Drop per-record Vec<u8> allocation in new() hot path; hoist the String allocation out of the success path
- Share a private push() helper between new() and new_unchecked() to remove duplicated classification dispatch
- Flatten the 3-level if-cascade into a single match on (Bucket, is_r1)
- Short-circuit name() with an or_else chain
- Document that direct field mutation can void invariants
- Add tests for: R1+R2 supplementaries with both primaries, LAST_IN_PAIR without PAIRED, non-UTF-8 qnames in error round-trip, len/is_empty/all_r1s/all_r2s/Clone
Open: supplementaries vs supplementals naming pending Nils's reply; TryFrom<IntoIterator> deferred (new() already takes IntoIterator).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rust-htslib 0.51 uses `<u{N}>::is_multiple_of`, stabilized in Rust 1.87.
Without this bump the MSRV CI job fails on `cargo check`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cite the fgbio (Scala) and fgpyo (Python) Template implementations the Rust port was modeled on, with verified line ranges at pinned commits so future readers can compare against the exact source we ported from. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Part of #20
Adds the `Template` type for grouping BAM alignments by query name, the first BAM-related module in fgoxide. Modeled directly on the existing `Template` implementations in `fgbio` (Scala) and `fgpyo` (Python); the module rustdoc cites both upstream sources at pinned commits.

API
- `Template::new(records)`: validating constructor
- `Template::new_unchecked(records)`: fast path for callers with pre-grouped input (e.g. records from a query-name-grouped iterator); mirrors `fgpyo`'s `build(validate=False)`
- `name`, `all_recs`, `primary_recs`, `all_r1s`, `all_r2s`, `len`, `is_empty`
- `TemplateError` carries `Vec<u8>` qnames so non-UTF-8 names round-trip losslessly
- `all_recs` order follows `fgbio`'s Scala (r1, r2, r1_supp, r2_supp, r1_sec, r2_sec); `fgpyo`'s interleaved order is acknowledged inline

Dependencies / MSRV
Adds `rust-htslib 0.51`, which uses `<u{N}>::is_multiple_of` (stable in Rust 1.87). MSRV bumped to 1.87 accordingly. `thiserror 2.x` came in via the recent main upgrade.

Review iteration
Incorporates @nh13's suggestions: `new_unchecked`, `EmptyTemplate` variant, `Vec<u8>` qnames, `Clone`, `all_r1s`/`all_r2s`/`len`, `fgbio` iteration order, and runnable doctests. Naming open: `supplementaries` (matches the SAM spec) is currently in tree; awaiting confirmation that this is preferred over `supplementals`. @theJasonFan's `TryFrom<IntoIterator>` suggestion is held for now since `new()` already accepts `IntoIterator` directly.

Test plan
97 unit tests + 6 doctests covering paired and single-end inputs, all four secondary/supplementary buckets, `fgbio`'s "secondary AND supplementary → secondaries" precedence, every error variant including non-UTF-8 qnames, `Default`/`Clone`, the `LAST_IN_PAIR`-without-`PAIRED` edge case, R1+R2 supplementaries with both primaries, and `new_unchecked`'s documented "last primary wins" overwrite semantics.
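To make the `Vec<u8>` qname point from the API notes concrete, here is a small sketch of why raw bytes are stored instead of a `String`. `QnameError` is a hypothetical stand-in, not the crate's `TemplateError`. It assumes the error keeps the bytes untouched and only falls back to a lossy conversion when rendering a message.

```rust
use std::fmt;

/// Hypothetical error carrying the raw query-name bytes.
#[derive(Debug, Clone, PartialEq, Eq)]
struct QnameError {
    qname: Vec<u8>,
}

impl fmt::Display for QnameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Lossy conversion is used for display only; the stored bytes are
        // never modified.
        write!(
            f,
            "unexpected query name: {}",
            String::from_utf8_lossy(&self.qname)
        )
    }
}

fn main() {
    // 0xFF is never valid in UTF-8, so this name cannot be a `String`.
    let raw: Vec<u8> = vec![b'r', b'e', b'a', b'd', 0xFF];
    assert!(String::from_utf8(raw.clone()).is_err());

    // Storing a lossy String would change the bytes...
    let lossy = String::from_utf8_lossy(&raw).into_owned();
    assert_ne!(lossy.as_bytes(), raw.as_slice());

    // ...while the Vec<u8> payload round-trips exactly.
    let err = QnameError { qname: raw.clone() };
    assert_eq!(err.qname, raw);
    assert!(err.to_string().contains('\u{FFFD}'));
}
```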