This project uses the nf-core/viralrecon pipeline to identify differences between an HSV-1 reference strain (Strain 17) and a mutant strain (McKrae). This document covers the setup and execution of the analysis.
.
├── analysis.sh # Main execution script
├── custom.config # Resource configuration (Memory fixes)
├── samplesheet.csv # Input file manifest
├── refs/ # Reference genome (HSV-1 Strain 17, Accession: NC_001806.2)
└── data/ # Input FASTQ files (McKrae Strain from Renner et al. 2021)
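
The contents of samplesheet.csv are not reproduced in this document. The following is a minimal sketch of what it could look like, assuming the standard nf-core/viralrecon Illumina header (sample,fastq_1,fastq_2), the sample name McKrae_Sub (inferred from the output file names), and the FASTQ file names created in the data/ directory. Paths may need to be made absolute depending on where the pipeline is launched.

```shell
# Write a samplesheet describing one paired-end Illumina sample.
# Assumption: header follows the nf-core/viralrecon Illumina format.
# Assumption: the sample name "McKrae_Sub" matches the prefix of the output files.
# Note: this overwrites any existing samplesheet.csv in the current directory.
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2
McKrae_Sub,data/McKrae_Sub_R1.fastq.gz,data/McKrae_Sub_R2.fastq.gz
EOF

# Show the result for a quick visual check
cat samplesheet.csv
```

One row per sample keeps the manifest easy to extend if further strains are added later.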
The reference genome is HSV-1 Strain 17 (accession NC_001806.2), downloaded from NCBI:
mkdir -p refs
cd refs
wget -O wildtype.fasta "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_001806.2&rettype=fasta&retmode=text"
cd ..

The input is a subsampled dataset (the first ~100k reads of each FASTQ file) derived from the full shotgun sequencing run (SRR13801763). This keeps test runs fast while still exercising the variant-calling logic.
Commands to generate the mini-dataset:
mkdir -p data
cd data
# Stream and subsample the top 400,000 lines (100k reads)
curl -L "https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR138/063/SRR13801763/SRR13801763_1.fastq.gz" | zcat | head -n 400000 | gzip > McKrae_Sub_R1.fastq.gz
curl -L "https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR138/063/SRR13801763/SRR13801763_2.fastq.gz" | zcat | head -n 400000 | gzip > McKrae_Sub_R2.fastq.gz
cd ..

Required software:
- Nextflow (see its Installation Guide)
- Singularity: installed from the official Ubuntu universe repository
sudo apt update
sudo apt install singularity-container

The analysis is controlled by analysis.sh. It passes a custom.config file to Nextflow to limit memory usage for local execution (preventing Bowtie2 memory crashes).
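
Before launching, it can help to confirm that the required tools and input files are in place. The following is a minimal sketch of such a check; the function name preflight is hypothetical, and the file list is taken from the project layout and the download commands in this document.

```shell
# Pre-flight check: report missing tools and input files before a run.
# The function name "preflight" is hypothetical, not part of analysis.sh.
preflight() {
    missing=0

    # Tools the run depends on
    for tool in nextflow singularity; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            missing=1
        fi
    done

    # Files the run reads; -s also rejects empty files (e.g. a failed download)
    for file in analysis.sh custom.config samplesheet.csv refs/wildtype.fasta \
                data/McKrae_Sub_R1.fastq.gz data/McKrae_Sub_R2.fastq.gz; do
        if [ ! -s "$file" ]; then
            echo "missing or empty file: $file" >&2
            missing=1
        fi
    done

    return "$missing"
}

if preflight; then
    echo "pre-flight check passed"
else
    echo "pre-flight check failed"
fi
```

Run it from the project root so that the relative paths resolve.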
# 1. Make executable
chmod +x analysis.sh
# 2. Run
./analysis.sh

The analysis.sh script executes the following Nextflow command. Several steps (assembly, Kraken2, Pangolin, Nextclade, Freyja) are skipped to keep the test run fast.
nextflow run nf-core/viralrecon \
    -profile singularity \
    -c custom.config \
    --input samplesheet.csv \
    --outdir ./results_test_run \
    --platform illumina \
    --protocol metagenomic \
    --genome 'refs/wildtype.fasta' \
    --skip_assembly \
    --skip_kraken2 \
    --skip_pangolin \
    --skip_nextclade \
    --skip_freyja \
    --variant_caller bcftools

Outputs are stored in the results_test_run/ directory.
- Quality Control: results_test_run/multiqc/multiqc_report.html
- Variants (SNPs): results_test_run/variants/bcftools/McKrae_Sub.vcf.gz
  Contains the list of mutations distinguishing McKrae from Strain 17.
- Mapping: results_test_run/mapping/bowtie2/McKrae_Sub.bam
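
As a quick check on the variant output, the number of records in the VCF can be counted with standard tools. This is a minimal sketch, not part of the pipeline; the function name count_variants is hypothetical, and the count includes every non-header record in the file, not only SNPs.

```shell
# Count variant records (non-header lines) in a gzip-compressed VCF.
# The function name "count_variants" is hypothetical.
count_variants() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean
    gzip -dc "$1" | grep -vc '^#' || true
}

vcf="results_test_run/variants/bcftools/McKrae_Sub.vcf.gz"
if [ -f "$vcf" ]; then
    echo "variant records: $(count_variants "$vcf")"
else
    echo "VCF not found: $vcf (run the pipeline first)"
fi
```
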
Memory Error: If the pipeline fails at BOWTIE2_BUILD with "Process requirement exceeds available memory", ensure custom.config is present and correctly referenced in the run command.
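
The project's custom.config is not reproduced in this document. As a rough sketch of the kind of override that addresses this error, the following writes an example Nextflow config that lowers the resource request for the BOWTIE2_BUILD process. The file name custom.config.example and the values (4 CPUs, 6 GB) are assumptions for illustration only; choose values below the memory actually available on the machine.

```shell
# Write an example Nextflow config that caps resources for the Bowtie2 index step.
# Assumption: file name and the 4 CPU / 6 GB values are illustrative, not the
# project's real settings. Written to a separate file so custom.config is untouched.
cat > custom.config.example <<'EOF'
process {
    withName: 'BOWTIE2_BUILD' {
        cpus   = 4
        memory = 6.GB
    }
}
EOF

# Show the generated config
cat custom.config.example
```

A config like this is passed to Nextflow with the -c option, as in the run command used by analysis.sh.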