Skip to content

Filtering short contig lenghts before annotation#128

Open
yykaya wants to merge 5 commits into
Plant-Food-Research-Open:devfrom
yykaya:main
Open

Filtering short contig lenghts before annotation#128
yykaya wants to merge 5 commits into
Plant-Food-Research-Open:devfrom
yykaya:main

Conversation

@yykaya

@yykaya yykaya commented Dec 13, 2024

Copy link
Copy Markdown

I've added a filtering step for input assemblies to avoid poor downstream analysis and misleading interpretations of gene variations in each contigs. It works well in local computer but not tested in hpc yet.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@GallVp

GallVp commented Dec 13, 2024

Copy link
Copy Markdown
Member

Hi @yykaya

Thank you for the PR. This is very useful. Let's work together to get this merged. A couple of items to tick off before we merge, though.

  • We are using the nf-core template. It requires that the main branch is in sync with the latest release. Thus, all new contributions should target the dev branch. I have done this by clicking the edit button at the top and changing the base to dev.
  • Please see the other comments.

@GallVp GallVp changed the base branch from main to dev December 13, 2024 23:20
Comment thread nextflow.config
}

// Validation for the min_contig_length parameter
process {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite clever. However, we are using nf-schema for parameter validation which means that the parameter type and constraints are defined in a schema file and the plugin automatically validates all the parameters. The schema file is here: https://github.com/Plant-Food-Research-Open/genepal/blob/dev/nextflow_schema.json

This schema can be automatically generated and refined through a web-based GUI. Please see the nf-core docs: https://nf-co.re/docs/nf-core-tools/pipelines/schema

Comment thread nextflow.config Outdated
orthofinder_annotations = null
outdir = null
email = null
min_contig_length = 5000

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this to the // Annotation options section of the config?

Comment thread main.nf
// WORKFLOW: Run main workflow
// Filter genome assembly by minimum contig length
//
SEQKIT_GET_LENGTH(PIPELINE_INITIALISATION.out.target_assembly)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SEQKIT_GET_LENGTH should be part of the GENEPAL workflow defined in workflows/genepal.nf file. This structure is also inherited from the nf-core template and allows creating of meta-pipelines where two are more pipelines can be joined into a larger single pipeline.

Comment thread main.nf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

process SEQKIT_GET_LENGTH {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use existing nf-core modules instead of a custom local module?

Nonetheless, custom local modules should be placed in the modules/local/ directory.

Comment thread README.md Outdated
- `fasta:` fasta file for the genome
- `is_masked`: yes or no to denote whether the fasta file is already masked or not

#### `--min_contig_length`

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameter documentation is auto generated with the following command,

nf-core -v pipelines schema docs > docs/parameters.md

The parameters are documented in the docs/parameters.md file.

@GallVp

GallVp commented Dec 13, 2024

Copy link
Copy Markdown
Member

We use pre-commit to automatically fix code linting issues. You can enable pre-commit by doing,

pip install pre-commit

cd genepal
pre-commit install

git add -A
pre-commit run --all-files

@GallVp

GallVp commented Dec 13, 2024

Copy link
Copy Markdown
Member

nf-core linting is failing (https://github.com/Plant-Food-Research-Open/genepal/actions/runs/12314789807/job/34373180357?pr=128) because the new parameter has not been added to the pipeline schema. You can add it by doing,

nf-core -v pipelines schema build

@github-actions

Copy link
Copy Markdown

This PR has been tagged as awaiting-changes or awaiting-feedback by an nf-core contributor. Remove stale label or add a comment if it is still useful.

@github-actions github-actions Bot added the Stale label May 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants