Description of feature
Currently, nf-core/references builds genome indices for splice-aware aligners (STAR, HISAT2, etc.) using mostly fixed or default parameters.
However, some of these parameters should ideally depend on species-specific genome architecture, especially for large or compact genomes where exon/intron structure varies significantly.
Motivation
When building references across multiple species (e.g., mammals vs. insects vs. plants), the same hardcoded STAR parameters can lead to suboptimal or even invalid splice junction indexes.
Allowing per-species or per-asset parameterization (e.g. via YAML keys in assets.yaml or a separate JSON schema) would make the pipeline far more general and biologically robust.
Proposed implementation
Extend the asset schema to include a params: section, e.g.:
genomes:
  - id: Homo_sapiens.GRCh38
    fasta: path/to/genome.fa
    gtf: path/to/annotation.gtf
    params:
      star:
        sjdbOverhang: 99
        genomeSAindexNbases: 14
    notes:
      avg_exon_length: 170
  - id: Drosophila_melanogaster.BDGP6
    fasta: path/to/genome.fa
    gtf: path/to/annotation.gtf
    params:
      star:
        sjdbOverhang: 74
        genomeSAindexNbases: 11
    notes:
      avg_exon_length: 280
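As one illustration of a genome-dependent value: the STAR manual recommends scaling --genomeSAindexNbases down to min(14, log2(GenomeLength)/2 - 1) for small genomes. A minimal sketch of that rule follows. The function name is hypothetical, rounding down to an integer is an assumption, and the values in the example schema are illustrative and need not match its output.

```python
import math


def suggest_genome_sa_index_nbases(genome_length: int) -> int:
    """Hypothetical helper applying the STAR manual's rule of thumb:
    min(14, log2(GenomeLength)/2 - 1), rounded down (assumption)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return min(14, math.floor(math.log2(genome_length) / 2 - 1))


if __name__ == "__main__":
    print(suggest_genome_sa_index_nbases(3_100_000_000))  # human-sized genome: 14
    print(suggest_genome_sa_index_nbases(100_000))        # very small genome: 7
```

A default like this could fill in genomeSAindexNbases when an asset does not set it explicitly, with the per-asset value taking precedence.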
Expose these through the pipeline as ext.args or --star_* overrides in the relevant modules.
The existing --kallisto_make_unique flag shows how such parameters can be exposed consistently.
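To make the mapping concrete, here is a rough sketch of how one asset's params.star entries might be turned into the argument string that would be passed as ext.args. It assumes the params: layout from the example schema; the function name is hypothetical, and a real implementation would also need to validate keys against the options STAR actually accepts.

```python
def star_ext_args(asset: dict) -> str:
    """Hypothetical helper: render an asset's params.star mapping
    as a space-separated '--key value' string for ext.args."""
    star_params = asset.get("params", {}).get("star", {})
    # Keys are assumed to be STAR option names without the leading dashes.
    return " ".join(f"--{key} {value}" for key, value in star_params.items())


if __name__ == "__main__":
    asset = {
        "id": "Homo_sapiens.GRCh38",
        "params": {"star": {"sjdbOverhang": 99, "genomeSAindexNbases": 14}},
    }
    print(star_ext_args(asset))  # --sjdbOverhang 99 --genomeSAindexNbases 14
```

Assets without a params: section yield an empty string, so the existing defaults would remain in effect for them.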