This guide provides comprehensive examples of how to run NVD2 for different use cases. Each example focuses on specific scenarios that researchers commonly encounter, with explanations of when and why you'd use each configuration.
For basic setup instructions, see the main README.md file.
**STAT+BLAST workflow aliases:** The human virus detection workflow (STAT + two-phase BLAST) can be invoked under any of these tool names: `stat_blast`, `nvd`, `stat`, `blast`, or `stast`. All of them trigger the same workflow; the legacy name `nvd` is maintained for backward compatibility. Examples in this guide use various aliases to demonstrate this flexibility, but they are all equivalent.
- Quick Start Examples
- Deployment Profiles
- Platform-Specific Configurations
- Quality and Sensitivity Adjustments
- Data Sources and Input Types
- Output and Results Management
- Privacy and Compliance
- Enterprise Integration (LabKey LIMS)
- Specialized Use Cases
- Troubleshooting and Debugging
- Configuration Files and Parameters
- Useful Nextflow Options
## Quick Start Examples

Use this when you're specifically interested in detecting human viruses and want the fastest, most focused analysis:
> **Note:** The tool aliases `stat_blast`, `nvd`, `stat`, `blast`, and `stast` all invoke the same workflow. Use whichever name is most intuitive for your team.
```bash
nextflow run dhoconno/nvd \
    --tools stat_blast \
    --samplesheet assets/example_samplesheet.csv \
    --gottcha2_db db/PP819512_gottcha2.fasta \
    --blast_db db \
    --blast_db_prefix PP819512-nt \
    --stat_index db/tree_index.dense.dbs \
    --stat_dbss db/tree_index.dense.dbss \
    --stat_annotation db/tree_index.dense.dbss.annotation \
    --human_virus_taxlist db/human_viruses_taxlist.txt \
    --experiment_id 12345
```

**When to use:** Wastewater surveillance, clinical samples where you specifically need human virus detection, outbreak investigations focusing on viral pathogens.
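Before launching, it can save a failed run to confirm that the reference files actually exist at the paths you pass in. This helper is not part of NVD2; it is a generic preflight sketch using the quick-start paths above.

```shell
# Preflight check: verify that each reference file exists before launching.
# Adjust the paths to match your own database layout.
check_refs() {
    missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "MISSING: $f" >&2
            missing=1
        fi
    done
    return $missing
}

check_refs \
    db/PP819512_gottcha2.fasta \
    db/tree_index.dense.dbs \
    db/tree_index.dense.dbss \
    db/tree_index.dense.dbss.annotation \
    db/human_viruses_taxlist.txt \
    && echo "All reference files found" \
    || echo "Fix missing files before running the pipeline"
```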
Use this for broad taxonomic profiling when you want to see everything in your sample:
```bash
nextflow run dhoconno/nvd \
    --tools gottcha \
    --samplesheet assets/example_samplesheet.csv \
    --gottcha2_db db/PP819512_gottcha2.fasta \
    --experiment_id 12346
```

**When to use:** Environmental samples, microbiome studies, exploratory analysis of unknown samples, quality control to see overall sample composition.
Run everything for comprehensive analysis - human virus detection, general taxonomic profiling, and data deduplication:
```bash
nextflow run dhoconno/nvd \
    --tools all \
    --samplesheet assets/example_samplesheet.csv \
    --gottcha2_db db/PP819512_gottcha2.fasta \
    --blast_db db \
    --blast_db_prefix PP819512-nt \
    --stat_index db/tree_index.dense.dbs \
    --stat_dbss db/tree_index.dense.dbss \
    --stat_annotation db/tree_index.dense.dbss.annotation \
    --human_virus_taxlist db/human_viruses_taxlist.txt \
    --experiment_id 12347
```

**When to use:** Research projects where you want comprehensive analysis, when compute resources aren't limiting, for samples where you're unsure what you might find.
## Deployment Profiles

Best for development or when you have all tools installed locally:
```bash
nextflow run dhoconno/nvd \
    -profile local \
    --tools nvd \
    --samplesheet my_samples.csv \
    --blast_db /path/to/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /path/to/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /path/to/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /path/to/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /path/to/STAT_db/human_viruses_taxlist.txt \
    --experiment_id 12348
```

**When to use:** Development environments, when you've manually installed all dependencies, debugging tool-specific issues.
Provides consistent environment across different systems:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools all \
    --samplesheet my_samples.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --experiment_id 12349
```

**When to use:** Local workstations, development environments, when you want reproducible results without manual tool installation.
Ideal for high-performance computing clusters that don't support Docker:
```bash
nextflow run dhoconno/nvd \
    -profile apptainer \
    --tools nvd,gottcha \
    --samplesheet large_dataset.csv \
    --gottcha2_db /shared/gottcha2/gottcha_db.species.fna \
    --blast_db /shared/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /shared/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /shared/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /shared/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /shared/STAT_db/human_viruses_taxlist.txt \
    --experiment_id 12350
```

**When to use:** University HPC clusters, government computing facilities, any environment where Docker is restricted but Singularity/Apptainer is available.
Configured specifically for Center for High Throughput Computing environments:
```bash
nextflow run dhoconno/nvd \
    -profile chtc_hpc \
    --tools all \
    --samplesheet wastewater_samples.csv \
    --gottcha2_db /staging/gottcha2/gottcha_db.species.fna \
    --blast_db /staging/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /staging/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /staging/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /staging/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /staging/STAT_db/human_viruses_taxlist.txt \
    --experiment_id 12351 \
    --max_concurrent_downloads 5
```

**When to use:** CHTC infrastructure, SLURM-based clusters with shared storage, when you need high parallelization for large datasets.
## Platform-Specific Configurations

Optimized parameters for high-quality short-read Illumina data:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools nvd,gottcha \
    --samplesheet illumina_wastewater.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --min_gottcha_reads 500 \
    --min_consecutive_bases 150 \
    --experiment_id 12352
```

**Key parameters:**

- `--min_gottcha_reads 500`: Higher threshold for reliable classification on short reads
- `--min_consecutive_bases 150`: Optimized for typical Illumina read lengths

**When to use:** Wastewater surveillance with Illumina sequencing, clinical samples, any high-quality short-read dataset.
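If you're unsure whether your reads justify a given `--min_consecutive_bases` setting, a quick pass over a FASTQ file reports the read count and mean read length. The helper function and example filename are illustrative, not part of the pipeline.

```shell
# Report read count and mean read length of an uncompressed FASTQ file.
# Sequence lines are every 4th line, starting at line 2.
# For gzipped input: gunzip -c reads.fastq.gz | mean_read_length /dev/stdin
mean_read_length() {
    awk 'NR % 4 == 2 { total += length($0); n++ }
         END { if (n > 0) printf "reads=%d mean_length=%.1f\n", n, total / n }' "$1"
}

# Example: mean_read_length sample_R1.fastq
```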
Optimized for longer, more error-prone Nanopore reads:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools nvd,gottcha \
    --samplesheet nanopore_air_samples.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --min_gottcha_reads 100 \
    --min_consecutive_bases 300 \
    --qtrim 'f' \
    --experiment_id 12353
```

**Key parameters:**

- `--min_gottcha_reads 100`: Lower threshold appropriate for longer reads
- `--min_consecutive_bases 300`: Takes advantage of longer Nanopore reads
- `--qtrim 'f'`: Disables quality trimming (less critical for Nanopore)

**When to use:** Air sampling studies, environmental surveillance with portable sequencers, when you need longer reads for better assembly.
Balanced parameters for datasets containing both Illumina and Nanopore data:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools all \
    --samplesheet mixed_platform_samples.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --min_gottcha_reads 250 \
    --experiment_id 12354
```

**When to use:** Comparative studies using multiple sequencing platforms, when combining historical and new data from different instruments.
## Quality and Sensitivity Adjustments

Maximize detection of low-abundance organisms, accepting a higher false-positive risk:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools nvd \
    --samplesheet low_biomass_samples.csv \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --cutoff_percent 0.0001 \
    --tax_stringency 0.5 \
    --entropy 0.7 \
    --min_consecutive_bases 100 \
    --min_gottcha_reads 50 \
    --experiment_id 12355
```

**Key parameters:**

- `--cutoff_percent 0.0001`: Very low threshold for rare organisms
- `--tax_stringency 0.5`: Lower confidence requirements
- `--entropy 0.7`: Allows more repetitive sequences
- `--min_consecutive_bases 100`: Shorter minimum sequences

**When to use:** Low-biomass samples, early outbreak detection, environmental surveillance where you don't want to miss anything.
Minimize false positives for high-confidence results:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools nvd,gottcha \
    --samplesheet high_biomass_samples.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --cutoff_percent 0.01 \
    --tax_stringency 0.9 \
    --entropy 0.95 \
    --min_consecutive_bases 500 \
    --min_gottcha_reads 1000 \
    --experiment_id 12356
```

**Key parameters:**

- `--cutoff_percent 0.01`: Higher threshold for confident detection
- `--tax_stringency 0.9`: Requires high confidence for classification
- `--entropy 0.95`: Excludes repetitive/low-complexity regions
- `--min_consecutive_bases 500`: Requires longer, more reliable sequences

**When to use:** Clinical diagnostics, regulatory reporting, when false positives have serious consequences.
Focus analysis on specific virus families of interest:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools nvd \
    --samplesheet respiratory_samples.csv \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --experiment_id 12357 \
    -params-file custom_virus_families.json
```

Create `custom_virus_families.json` for respiratory virus surveillance:

```json
{
  "human_virus_families": [
    "Coronaviridae",
    "Orthomyxoviridae",
    "Paramyxoviridae",
    "Pneumoviridae",
    "Picornaviridae"
  ]
}
```

**When to use:** Targeted surveillance (respiratory, enteric, etc.), when you want to focus computational resources on specific virus types.
## Data Sources and Input Types

Process locally stored sequencing data:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools gottcha \
    --samplesheet local_fastq_samples.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --experiment_id 12358
```

Sample `local_fastq_samples.csv`:

```csv
sample_id,srr,platform,fastq1,fastq2
sample1,,illumina,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz
sample2,,ont,/path/to/sample2.fastq.gz,
```

**When to use:** Processing local sequencing runs, private data that cannot be shared publicly, when you have fast local storage.
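A malformed samplesheet is one of the most common failure modes. As a sketch (the column layout follows the example above: sample_id, srr, platform, fastq1, fastq2), this check flags rows that are missing a sample ID or provide neither an SRR accession nor a local FASTQ path:

```shell
# Sanity-check a samplesheet: every data row needs a sample_id (column 1)
# plus either an SRR accession (column 2) or a fastq1 path (column 4).
validate_samplesheet() {
    awk -F',' 'NR == 1 { next }
               $1 == "" { print "row " NR ": missing sample_id"; bad = 1 }
               $2 == "" && $4 == "" { print "row " NR ": need srr or fastq1"; bad = 1 }
               END { exit bad }' "$1"
}

# Example: validate_samplesheet local_fastq_samples.csv && echo "samplesheet OK"
```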
Process public datasets from NCBI's Sequence Read Archive:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools nvd \
    --samplesheet sra_samples.csv \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --max_concurrent_downloads 2 \
    --experiment_id 12359
```

Sample `sra_samples.csv`:

```csv
sample_id,srr,platform,fastq1,fastq2
wastewater_1,SRR12345678,,,
wastewater_2,SRR12345679,,,
wastewater_3,SRR12345680,,,
```

**Key parameters:**

- `--max_concurrent_downloads 2`: Respects NCBI rate limits

**When to use:** Meta-analyses of public data, comparative studies, when reproducing published results.
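Typos in accessions only surface after download attempts fail, so a cheap pre-check helps. This helper assumes the same samplesheet layout as above and that run accessions follow the usual SRR/ERR/DRR pattern:

```shell
# Flag column-2 accessions that don't look like SRA run accessions
# (SRR, ERR, or DRR followed by digits).
check_accessions() {
    awk -F',' 'NR > 1 && $2 != "" && $2 !~ /^[SED]RR[0-9]+$/ {
                   print "row " NR ": suspicious accession " $2; bad = 1
               }
               END { exit bad }' "$1"
}

# Example: check_accessions sra_samples.csv && echo "accessions look OK"
```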
Combine your own data with public datasets for comparison:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools all \
    --samplesheet mixed_sources.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --max_concurrent_downloads 3 \
    --experiment_id 12360
```

Sample `mixed_sources.csv`:

```csv
sample_id,srr,platform,fastq1,fastq2
local_sample1,,illumina,/data/local1_R1.fastq.gz,/data/local1_R2.fastq.gz
sra_sample1,SRR12345678,,,
local_sample2,,ont,/data/local2.fastq.gz,
sra_sample2,SRR12345679,,,
```

**When to use:** Benchmarking your samples against public data, temporal studies combining historical and new data.
## Output and Results Management

Organize results in a specific location:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools nvd \
    --samplesheet my_samples.csv \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --results /project/nvd_analysis_2024 \
    --experiment_id 12361
```

**When to use:** Project-specific organization, shared storage systems, when you need results in a particular location for downstream analysis.
Save disk space by removing intermediate files after successful completion:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools gottcha \
    --samplesheet my_samples.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --cleanup true \
    --experiment_id 12362
```

**When to use:** Production runs with limited disk space, when you only need final results, routine surveillance processing.
Restart from the last successful checkpoint:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools all \
    --samplesheet my_samples.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --cleanup false \
    --experiment_id 12363 \
    -resume
```

**When to use:** After system failures, when debugging pipeline issues, for long-running analyses that might be interrupted.
## Privacy and Compliance

Remove human genetic material before sharing data publicly:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools clumpify \
    --samplesheet sensitive_samples.csv \
    --human_read_scrub true \
    --human_filter_db /data/human_filter_db \
    --experiment_id 12364
```

**When to use:** Before uploading to SRA, sharing with external collaborators, when working with clinical samples that may contain human DNA.
Optimize data for long-term storage while preserving analysis capability:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools clumpify,gottcha \
    --samplesheet large_samples.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --human_read_scrub false \
    --experiment_id 12365
```

**When to use:** Archiving large datasets, when storage costs are a concern, for datasets you may want to reanalyze later.
## Enterprise Integration (LabKey LIMS)

Upload results directly to the LabKey Laboratory Information Management System:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools nvd \
    --samplesheet my_samples.csv \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --experiment_id 12366 \
    -params-file labkey_config.json
```

Example `labkey_config.json`:

```json
{
  "labkey": true,
  "labkey_server": "https://your-labkey-server.org",
  "labkey_project_name": "YourProject",
  "labkey_schema": "lists",
  "labkey_webdav": "https://your-labkey-server.org/_webdav/YourProject/@files/",
  "labkey_blast_meta_hits_list": "nvd_blast_hits",
  "labkey_blast_fasta_list": "nvd_fasta_sequences"
}
```

**When to use:** Laboratory workflows, regulatory compliance, when you need structured data management and audit trails.
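The LabKey Python client typically resolves credentials from a `.netrc` file, so it's worth confirming an entry exists for your server host before a long run. The function name and host below are illustrative:

```shell
# Confirm a .netrc entry exists for the LabKey host before launching.
# The hostname in the example is a placeholder; use your own server.
has_netrc_entry() {
    host=$1
    netrc=${2:-$HOME/.netrc}
    [ -f "$netrc" ] && grep -Eq "machine[[:space:]]+$host" "$netrc"
}

# Example: has_netrc_entry your-labkey-server.org || echo "no credentials found"
```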
Complete integration with all workflows and LabKey systems:
```bash
nextflow run dhoconno/nvd \
    -profile apptainer \
    --tools all \
    --samplesheet enterprise_samples.csv \
    --gottcha2_db /shared/gottcha2/gottcha_db.species.fna \
    --blast_db /shared/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /shared/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /shared/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /shared/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /shared/STAT_db/human_viruses_taxlist.txt \
    --experiment_id 12368 \
    -params-file full_enterprise_config.json
```

**When to use:** Production surveillance systems, institutional deployments, when you need comprehensive data management.
## Specialized Use Cases

Optimized for routine wastewater monitoring programs:
```bash
nextflow run dhoconno/nvd \
    -profile chtc_hpc \
    --tools nvd,gottcha \
    --samplesheet wastewater_surveillance.csv \
    --gottcha2_db /staging/gottcha2/gottcha_db.species.fna \
    --blast_db /staging/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /staging/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /staging/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /staging/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /staging/STAT_db/human_viruses_taxlist.txt \
    --cutoff_percent 0.001 \
    --tax_stringency 0.7 \
    --min_gottcha_reads 100 \
    --max_concurrent_downloads 10 \
    --experiment_id 20240001 \
    -params-file wastewater_labkey.json
```

**Key features:** Balanced sensitivity, high throughput, and enterprise integration for public health reporting.
High-specificity analysis for diagnostic applications:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools nvd \
    --samplesheet clinical_samples.csv \
    --blast_db /data/clinical_blast_db \
    --blast_db_prefix clinical_nt \
    --stat_index /data/clinical_STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/clinical_STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/clinical_STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/clinical_STAT_db/human_viruses_taxlist.txt \
    --cutoff_percent 0.01 \
    --tax_stringency 0.9 \
    --entropy 0.95 \
    --min_consecutive_bases 300 \
    --experiment_id 20240002
```

**Key features:** High-specificity parameters, clinical-grade databases, minimal false positives.
Fast turnaround for emergency response:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools nvd \
    --samplesheet outbreak_samples.csv \
    --blast_db /data/outbreak_blast_db \
    --blast_db_prefix outbreak_nt \
    --stat_index /data/outbreak_STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/outbreak_STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/outbreak_STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/outbreak_STAT_db/human_viruses_taxlist.txt \
    --cutoff_percent 0.0001 \
    --tax_stringency 0.5 \
    --max_concurrent_downloads 8 \
    --cleanup false \
    --experiment_id 20240004 \
    -params-file outbreak_labkey.json
```

**Key features:** High sensitivity, fast processing, and preserved intermediate files for follow-up analysis.
## Troubleshooting and Debugging

Generate detailed reports for troubleshooting:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools nvd \
    --samplesheet debug_samples.csv \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --cleanup false \
    --experiment_id 99999 \
    -with-report debug_report.html \
    -with-trace debug_trace.txt \
    -with-timeline debug_timeline.html \
    -with-dag debug_dag.png
```

**Generates:**

- `debug_report.html`: Execution summary and resource usage
- `debug_trace.txt`: Detailed process execution log
- `debug_timeline.html`: Visual timeline of pipeline execution
- `debug_dag.png`: Workflow diagram

**When to use:** When troubleshooting failures, optimizing performance, understanding resource requirements.
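The trace file is tab-separated with a header row, so you can pull columns out by name rather than by position. As a sketch (assuming your trace's default fields include `name` and `peak_rss`), this lists peak memory per task:

```shell
# List each task's name and peak memory, using the trace header to
# locate columns (Nextflow trace files are tab-separated).
trace_memory() {
    awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
                { print $col["name"] "\t" $col["peak_rss"] }' "$1"
}

# Example: trace_memory debug_trace.txt
```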
Test pipeline with minimal data for quick validation:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools nvd \
    --samplesheet single_sample.csv \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --experiment_id 99998
```

Single-sample CSV example:

```csv
sample_id,srr,platform,fastq1,fastq2
test_sample,,illumina,test_R1.fastq.gz,test_R2.fastq.gz
```

**When to use:** Testing new configurations, validating database setups, quick pipeline checks.
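If you don't have a small dataset handy, you can fabricate a minimal paired-end FASTQ set just to exercise the plumbing. These placeholder reads are illustrative only and won't produce meaningful detections:

```shell
# Generate a tiny paired-end FASTQ set for a pipeline smoke test.
# The single fixed read per mate only tests file handling, not sensitivity.
make_test_fastq() {
    prefix=$1
    for mate in R1 R2; do
        printf '@read1/%s\nACGTACGTACGTACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\n' \
            "$mate" > "${prefix}_${mate}.fastq"
        gzip -f "${prefix}_${mate}.fastq"
    done
}

# Example: make_test_fastq test && ls test_R1.fastq.gz test_R2.fastq.gz
```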
Check configuration and workflow structure without running analysis:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools all \
    --samplesheet my_samples.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --experiment_id 99997 \
    -preview
```

**When to use:** Validating complex configurations, checking parameter compatibility, planning resource allocation.
## Configuration Files and Parameters

Create reusable configurations for your environment:
```bash
nextflow run dhoconno/nvd \
    -c custom_nvd.config \
    --tools all \
    --samplesheet my_samples.csv \
    --experiment_id 12369
```

Example `custom_nvd.config`:

```groovy
params {
    gottcha2_db = "/shared/databases/gottcha2/gottcha_db.species.fna"
    blast_db = "/shared/databases/blast_db"
    blast_db_prefix = "core_nt"
    stat_index = "/shared/databases/STAT_db/tree_index.20240830.dbs"
    stat_dbss = "/shared/databases/STAT_db/tree_filter.20240830.dbss"
    stat_annotation = "/shared/databases/STAT_db/tree_filter.20240830.dbss.annotation"
    human_virus_taxlist = "/shared/databases/STAT_db/human_viruses_taxlist.txt"
    results = "/project/results"
    labkey = true
    labkey_server = "https://company-labkey.org"
    labkey_project_name = "ProjectName"
}

profiles {
    company_hpc {
        apptainer.enabled = true
        process.executor = 'slurm'
        process.queue = 'high-memory'
        process.cpus = 32
        process.memory = 128.GB
    }
}
```

Store all parameters in a JSON file for reproducibility:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    -params-file complete_params.json
```

Example `complete_params.json`:

```json
{
  "tools": "all",
  "samplesheet": "comprehensive_samples.csv",
  "experiment_id": 12370,
  "gottcha2_db": "/data/gottcha2/gottcha_db.species.fna",
  "blast_db": "/data/blast_db",
  "blast_db_prefix": "core_nt",
  "stat_index": "/data/STAT_db/tree_index.20240830.dbs",
  "stat_dbss": "/data/STAT_db/tree_filter.20240830.dbss",
  "stat_annotation": "/data/STAT_db/tree_filter.20240830.dbss.annotation",
  "human_virus_taxlist": "/data/STAT_db/human_viruses_taxlist.txt",
  "cutoff_percent": 0.001,
  "tax_stringency": 0.8,
  "entropy": 0.9,
  "min_consecutive_bases": 200,
  "min_gottcha_reads": 250,
  "max_concurrent_downloads": 5,
  "results": "/project/nvd_results",
  "cleanup": true,
  "labkey": true,
  "labkey_server": "https://labkey.example.org",
  "labkey_project_name": "MetagenomicsSurveillance"
}
```

**When to use:** Reproducible research, sharing configurations, complex enterprise setups.
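A malformed params file fails only at launch time, so a quick validation pass is cheap insurance. The required-key list below is a minimal illustration; extend it with the database paths your enabled tools need:

```shell
# Verify a params file is valid JSON and names the core inputs.
# The required-key list here is deliberately minimal.
check_params() {
    python3 - "$1" <<'EOF'
import json, sys
required = ["tools", "samplesheet", "experiment_id"]
params = json.load(open(sys.argv[1]))
missing = [k for k in required if k not in params]
if missing:
    sys.exit("missing keys: " + ", ".join(missing))
print("params file OK")
EOF
}

# Example: check_params complete_params.json
```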
## Useful Nextflow Options

Control computational resource consumption:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools gottcha \
    --samplesheet small_samples.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --experiment_id 12371 \
    -process.cpus 4 \
    -process.memory 8.GB
```

**When to use:** Shared systems, resource-constrained environments, development work.
Improve performance by using fast storage for temporary files:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools all \
    --samplesheet my_samples.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --experiment_id 12372 \
    -work-dir /scratch/nvd_work
```

**When to use:** HPC environments with fast local storage, when your home directory has limited space.
Create detailed analytics for pipeline optimization:
```bash
nextflow run dhoconno/nvd \
    -profile docker \
    --tools all \
    --samplesheet my_samples.csv \
    --gottcha2_db /data/gottcha2/gottcha_db.species.fna \
    --blast_db /data/blast_db \
    --blast_db_prefix core_nt \
    --stat_index /data/STAT_db/tree_index.20240830.dbs \
    --stat_dbss /data/STAT_db/tree_filter.20240830.dbss \
    --stat_annotation /data/STAT_db/tree_filter.20240830.dbss.annotation \
    --human_virus_taxlist /data/STAT_db/human_viruses_taxlist.txt \
    --experiment_id 12373 \
    -with-report execution_report.html \
    -with-trace execution_trace.txt \
    -with-timeline execution_timeline.html \
    -with-dag workflow_dag.png
```

**When to use:** Performance optimization, resource planning, documentation for publications.
- **Unique experiment IDs:** Always ensure your `experiment_id` is unique for each run
- **Absolute paths:** Database paths should be absolute to avoid confusion
- **Samplesheet format:** The CSV format is critical; see `assets/example_samplesheet.csv`
- **LabKey integration:** Ensure the `nvd2` secret is properly configured before using LabKey features
- **Resume capability:** Use `-resume` to restart failed workflows from the last successful checkpoint
- **Container considerations:** Use `-profile apptainer` for HPC environments without Docker
- **Disk space:** Consider using `--cleanup true` when processing large datasets
- **NCBI limits:** Respect rate limits with an appropriate `max_concurrent_downloads` for SRA data
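One simple way to satisfy the unique `experiment_id` requirement is to derive the ID from the launch timestamp. This is a sketch, not a pipeline feature; check that your downstream systems accept 12-digit IDs:

```shell
# Derive a unique experiment_id from the current time (YYYYMMDDHHMM).
# Two runs started in the same minute would collide; append seconds if needed.
experiment_id=$(date +%Y%m%d%H%M)
echo "Using experiment_id=$experiment_id"

# Example: nextflow run dhoconno/nvd ... --experiment_id "$experiment_id"
```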
For additional help and troubleshooting, see the main README.md file and contributor guide.