sohaibzafar90/HSV1-variant-analysis-test

HSV1 Reference vs Mutant Analysis

This pipeline analyzes the differences between an HSV-1 reference strain (Strain 17) and a mutant strain (McKrae) using nf-core/viralrecon. This documentation covers the setup and execution of the analysis. The repository is laid out as follows:

.
├── analysis.sh        # Main execution script
├── custom.config      # Resource configuration (Memory fixes)
├── samplesheet.csv    # Input file manifest
├── refs/              # Reference genome (HSV-1 Strain 17, Accession: NC_001806.2)
└── data/              # Input FASTQ files (McKrae Strain from Renner et al. 2021)

Setup & Data Preparation

1. Reference Genome

The reference is HSV-1 Strain 17 (accession NC_001806.2), downloaded from NCBI in FASTA format:

mkdir -p refs
cd refs
wget -O wildtype.fasta "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_001806.2&rettype=fasta&retmode=text"
cd ..
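A quick sanity check on the download can catch an empty file or an HTML error page before the pipeline runs. The sketch below is not part of the original repository, and the function name fasta_summary is hypothetical. It counts FASTA records and sequence bases; a single record is expected for this reference.

```shell
# Hypothetical helper (not in the repository): summarize a FASTA file.
# Prints "<records> <bases>"; header lines are not counted as bases.
fasta_summary() {
    awk '/^>/ { records++; next }
         { gsub(/[ \t\r]/, ""); bases += length($0) }
         END { printf "%d %d\n", records, bases }' "$1"
}

# Usage: expect exactly one record for refs/wildtype.fasta.
if [ -f refs/wildtype.fasta ]; then
    fasta_summary refs/wildtype.fasta
fi
```

If the record count is 0, the file is probably not FASTA and the download should be repeated.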

2. Test Data (McKrae Strain)

A small test dataset (the first ~100k read pairs) was derived from the full shotgun sequencing run (SRR13801763). This keeps test runs fast while still exercising the variant-calling logic.

Commands to generate the mini-dataset:

mkdir -p data
cd data

# Stream each file and keep the first 400,000 lines (4 lines per read = 100k reads).
# curl may report a write error once head exits; this is expected.
curl -L "https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR138/063/SRR13801763/SRR13801763_1.fastq.gz" | zcat | head -n 400000 | gzip > McKrae_Sub_R1.fastq.gz

curl -L "https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR138/063/SRR13801763/SRR13801763_2.fastq.gz" | zcat | head -n 400000 | gzip > McKrae_Sub_R2.fastq.gz
cd ..
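The samplesheet.csv passed to the pipeline is not reproduced in this README. Assuming the standard viralrecon Illumina layout (a sample,fastq_1,fastq_2 header) and the sample name McKrae_Sub that appears in the output file names, it could be generated as in the sketch below. Adjust the paths if your files live elsewhere.

```shell
# Sketch: write a one-sample, paired-end samplesheet for nf-core/viralrecon.
# Assumes the Illumina samplesheet columns sample,fastq_1,fastq_2.
sample="McKrae_Sub"
cat > samplesheet.csv <<EOF
sample,fastq_1,fastq_2
${sample},data/${sample}_R1.fastq.gz,data/${sample}_R2.fastq.gz
EOF

cat samplesheet.csv
```

Run this from the repository root so the relative data/ paths resolve from the directory where Nextflow is launched.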

Running the Pipeline

Prerequisites

  • Nextflow (see the Installation Guide)
  • Singularity, installed from the official Ubuntu universe repository:
sudo apt update
sudo apt install singularity-container
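Before launching, it can help to confirm that both tools are actually on the PATH. A minimal sketch; check_tool is a hypothetical helper name, not something shipped with the repository.

```shell
# Sketch: confirm the required tools are on PATH before launching.
# check_tool is a hypothetical helper, not part of the repository.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

for tool in nextflow singularity; do
    check_tool "$tool" || echo "install $tool before running analysis.sh" >&2
done
```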

Execution

The analysis is controlled by analysis.sh. It uses a custom.config file to optimize memory usage for local execution (preventing Bowtie2 memory crashes).

# 1. Make executable
chmod +x analysis.sh

# 2. Run
./analysis.sh

Analysis Command Details

The analysis.sh script executes the following Nextflow command. Several optional steps (assembly, Kraken2, Pangolin, Nextclade, Freyja) are skipped to keep the test run fast.

nextflow run nf-core/viralrecon \
    -profile singularity \
    -c custom.config \
    --input samplesheet.csv \
    --outdir ./results_test_run \
    --platform illumina \
    --protocol metagenomic \
    --fasta 'refs/wildtype.fasta' \
    --skip_assembly \
    --skip_kraken2 \
    --skip_pangolin \
    --skip_nextclade \
    --skip_freyja \
    --variant_caller bcftools

Results

Outputs are stored in the results_test_run/ directory.

  • Quality Control: results_test_run/multiqc/multiqc_report.html
  • Variants (SNPs): results_test_run/variants/bcftools/McKrae_Sub.vcf.gz
    • Contains the list of mutations distinguishing McKrae from Strain 17.
  • Mapping: results_test_run/mapping/bowtie2/McKrae_Sub.bam
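For a first impression of the call set without extra tools, the variant records in the VCF can be counted directly. A minimal sketch, assuming the output path listed above; count_variants is a hypothetical helper name, and bcftools (if installed) offers richer summaries.

```shell
# Hypothetical helper: count variant records (non-header lines) in a gzipped VCF.
count_variants() {
    gzip -dc < "$1" | grep -vc '^#'
}

vcf="results_test_run/variants/bcftools/McKrae_Sub.vcf.gz"
if [ -f "$vcf" ]; then
    echo "variant records: $(count_variants "$vcf")"
fi
```

Because the input is only the first ~100k reads of the run, coverage is likely uneven, so the count should be treated as a smoke test rather than a complete list of differences.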

Troubleshooting

Memory Error: If the pipeline fails at BOWTIE2_BUILD with "Process requirement exceeds available memory", ensure custom.config is present and correctly referenced in the run command.
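The contents of custom.config are not shown in this README. As an illustration only, a configuration that caps resource requests might look like the sketch below, written here through a shell heredoc. The values are placeholders, not the repository's settings. max_memory, max_cpus and max_time are the resource-cap parameters used by nf-core pipelines built on the older template, so check the documentation for the viralrecon version you run (newer nf-core templates use process.resourceLimits instead).

```shell
# Illustrative only: write an example config that caps per-process resources.
# The limits below are placeholder values; size them to your machine.
cat > custom.config.example <<'EOF'
params {
    max_memory = '6.GB'
    max_cpus   = 4
    max_time   = '12.h'
}
EOF

# Review the file, then pass it to Nextflow with: -c custom.config.example
cat custom.config.example
```

The example is written to custom.config.example so that it does not overwrite an existing custom.config.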
