Filtering short contig lengths before annotation #128
Hi @yykaya Thank you for the PR. This is very useful. Let's work together to get this merged. A couple of items to tick off before we merge, though.
|
| } | ||
| // Validation for the min_contig_length parameter | ||
| process { |
This is quite clever. However, we use nf-schema for parameter validation, which means that parameter types and constraints are defined in a schema file and the plugin validates all parameters automatically. The schema file is here: https://github.com/Plant-Food-Research-Open/genepal/blob/dev/nextflow_schema.json
This schema can be automatically generated and refined through a web-based GUI. Please see the nf-core docs: https://nf-co.re/docs/nf-core-tools/pipelines/schema
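For illustration only, the sketch below mimics in plain Python the kind of check that a schema entry makes possible. It is not how nf-schema is implemented. The dictionary keys follow JSON Schema conventions and the default of 5000 comes from this PR; the `minimum` of 0 and the description text are assumptions, and `validate_integer_param` is a hypothetical helper.

```python
# Hypothetical schema entry for the new parameter, written as a Python dict.
# The "minimum" and "description" values are assumptions for illustration.
MIN_CONTIG_LENGTH_SCHEMA = {
    "type": "integer",
    "default": 5000,
    "minimum": 0,
    "description": "Minimum contig length to keep before annotation",
}


def validate_integer_param(name, value, schema):
    """Return a list of error messages; an empty list means the value is valid."""
    errors = []
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        errors.append(f"{name}: expected an integer, got {type(value).__name__}")
        return errors
    minimum = schema.get("minimum")
    if minimum is not None and value < minimum:
        errors.append(f"{name}: {value} is below the minimum of {minimum}")
    return errors


if __name__ == "__main__":
    for candidate in (5000, -1, "5kb"):
        print(candidate, validate_integer_param("min_contig_length", candidate, MIN_CONTIG_LENGTH_SCHEMA))
```

Declaring the constraint once in the schema means the same rule can drive validation, help text and generated documentation, which is why a hand-written check in the config is not needed.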
| orthofinder_annotations = null | ||
| outdir = null | ||
| email = null | ||
| min_contig_length = 5000 |
Should we move this to the // Annotation options section of the config?
| // WORKFLOW: Run main workflow | ||
| // Filter genome assembly by minimum contig length | ||
| // | ||
| SEQKIT_GET_LENGTH(PIPELINE_INITIALISATION.out.target_assembly) |
SEQKIT_GET_LENGTH should be part of the GENEPAL workflow defined in the workflows/genepal.nf file. This structure is also inherited from the nf-core template and allows the creation of meta-pipelines, where two or more pipelines are joined into a single larger pipeline.
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| */ | ||
| process SEQKIT_GET_LENGTH { |
Can we use existing nf-core modules instead of a custom local module?
Nonetheless, custom local modules should be placed in the modules/local/ directory.
| - `fasta:` fasta file for the genome | ||
| - `is_masked`: yes or no to denote whether the fasta file is already masked or not | ||
| #### `--min_contig_length` |
Parameter documentation is auto-generated with the following command:
nf-core -v pipelines schema docs > docs/parameters.md
The parameters are documented in the docs/parameters.md file.
|
We use pre-commit to automatically fix code linting issues. You can enable pre-commit as follows:
pip install pre-commit
cd genepal
pre-commit install
git add -A
pre-commit run --all-files
|
nf-core linting is failing (https://github.com/Plant-Food-Research-Open/genepal/actions/runs/12314789807/job/34373180357?pr=128) because the new parameter has not been added to the pipeline schema. You can add it with the following command:
nf-core -v pipelines schema build
|
This PR has been tagged as awaiting-changes or awaiting-feedback by an nf-core contributor. Remove stale label or add a comment if it is still useful.
I've added a filtering step that removes short contigs from input assemblies, to avoid poor downstream analysis and misleading interpretations of gene variation in each contig. It works well on my local computer but has not been tested on an HPC yet.
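As a rough illustration of what such a length filter does, here is a minimal standalone Python sketch. It is not the code from this PR, which uses a Nextflow process. The helper names `read_fasta` and `filter_by_length` are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_contig_length=5000):
    """Keep records whose sequence has at least min_contig_length bases.

    The inclusive comparison is an assumption for this sketch.
    """
    for header, sequence in records:
        if len(sequence) >= min_contig_length:
            yield header, sequence


if __name__ == "__main__":
    # Tiny in-memory assembly: one 12-base contig and one 3-base contig.
    fasta = io.StringIO(">chr1\nACGTACGT\nACGT\n>scaffold_9\nACG\n")
    kept = list(filter_by_length(read_fasta(fasta), min_contig_length=5))
    print([header for header, _ in kept])  # ['chr1']
```

With the default of 5000 used in this PR, any contig shorter than 5000 bases would be dropped before annotation.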
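PR checklist
Make sure your code lints (nf-core pipelines lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
Usage documentation in docs/usage.md is updated.
Output documentation in docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).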