🧬 PCV2 Epitope Mapping & Prediction Platform


A structure-aware, machine learning–driven bioinformatics pipeline for predicting antibody-accessible epitopes on the Porcine Circovirus Type 2 (PCV2) capsid protein (ORF2).


🌐 Live Application

🔗 https://pcv2.epitope.aiconceptlimited.com.ng/


🎯 Overview

This project implements a research-grade computational framework integrating:

  • Evolutionary sequence analysis
  • Structural biology (PDB-based features)
  • Physicochemical characterization
  • Machine learning (XGBoost)

to identify potential B-cell epitopes on the PCV2 capsid protein.

💡 Applications

  • Epitope discovery
  • Vaccine target identification
  • Viral antigen characterization
  • Immunoinformatics research

🧠 Core Concept

Epitope prediction is treated as a multi-modal biological inference problem:

Sequence → Evolution → Structure → Features → ML → Prediction → Validation

🔬 Data Sources

| Component | Source |
|---|---|
| Protein sequences | NCBI (Entrez API) |
| Reference sequence | UniProt |
| Protein structures | PDB (3R0R, 6EZG) |
| Epitope validation | IEDB |

⚙️ Pipeline Architecture

NCBI Retrieval
     ↓
Sequence Cleaning (capsid-only filtering)
     ↓
Multiple Sequence Alignment (MAFFT)
     ↓
Feature Engineering
  - Conservation
  - Entropy
  - SASA
  - Residue Depth
  - Electrostatics
     ↓
Feature Matrix Construction
     ↓
Epitope Labeling (IEDB)
     ↓
Machine Learning (XGBoost)
     ↓
Prediction
     ↓
3D + Sequence Visualization (Streamlit)

🧬 Feature Engineering

🧬 Evolutionary Features

  • Conservation score (frequency-based)
  • Shannon entropy (sequence variability)
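
The two evolutionary features above can be computed column by column from the multiple sequence alignment. The following is a minimal sketch, not the project's actual code: the function name `column_scores`, the choice to ignore gap characters, and the use of log base 2 are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_scores(alignment, gap="-"):
    """Return a (conservation, entropy) tuple for each alignment column.

    conservation: frequency of the most common non-gap residue (0..1)
    entropy: Shannon entropy in bits over non-gap residue frequencies
    Gap handling (ignoring gaps) is an assumption of this sketch.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != gap]
        if not residues:
            # Column made only of gaps: no information available.
            scores.append((0.0, 0.0))
            continue
        counts = Counter(residues)
        total = len(residues)
        conservation = max(counts.values()) / total
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append((conservation, entropy))
    return scores


if __name__ == "__main__":
    # Toy alignment, not real PCV2 data.
    for i, (cons, ent) in enumerate(column_scores(["MTY", "MSY", "MS-", "MAY"]), 1):
        print(f"column {i}: conservation={cons:.2f} entropy={ent:.2f}")
```

A fully conserved column gives conservation 1 and entropy 0, while variable columns push entropy up and conservation down.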

🧱 Structural Features

  • Solvent Accessible Surface Area (SASA)
  • Residue depth
  • Secondary structure (loop/helix/sheet)
  • Electrostatics

⚗️ Physicochemical Features

  • Hydrophobicity
  • Charge distribution
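
Per-residue physicochemical values are usually looked up from a fixed scale. The README does not say which scale the pipeline uses, so the sketch below assumes the Kyte-Doolittle hydropathy scale and a simple formal-charge rule (K/R = +1, D/E = -1, histidine treated as neutral). The name `physchem_features` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the project may use another).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Formal side-chain charge near neutral pH; histidine is treated as neutral here.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def physchem_features(sequence):
    """Return a (hydrophobicity, charge) tuple per residue.

    Unknown symbols (e.g. 'X') fall back to 0.0 for both values.
    """
    return [
        (KYTE_DOOLITTLE.get(aa, 0.0), CHARGE.get(aa, 0.0))
        for aa in sequence.upper()
    ]


if __name__ == "__main__":
    # Toy tripeptide, not a real PCV2 fragment.
    print(physchem_features("KID"))
```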

🔄 Contextual Features

  • Sliding window (±2 residues)
  • Spatial neighborhood aggregation
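
A ±2 sliding window replaces each residue's value with a summary of itself and its two neighbours on either side. The sketch below uses the mean as the summary and truncates the window at the sequence ends. Both choices, and the name `window_mean`, are assumptions; the pipeline may pad or aggregate differently.

```python
def window_mean(values, half_width=2):
    """Smooth a per-residue feature with a centred window of +/- half_width.

    Windows are truncated at the N- and C-termini rather than padded,
    so terminal residues average over fewer neighbours.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    # Toy feature track (e.g. per-residue SASA), not real data.
    print(window_mean([1, 2, 3, 4, 5]))
```

The same pattern extends to spatial neighbourhoods by replacing the index window with the set of residues within a distance cutoff in the structure.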

🤖 Machine Learning

  • Model: XGBoost Classifier
  • Input: Residue-level feature matrix
  • Output: Probability of epitope per residue

Training Strategy

  • Imbalanced dataset handling
  • Threshold tuning (default: 0.25)
  • Feature importance extraction
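
To illustrate the first two points, here is a minimal sketch of one common way to handle imbalance and apply the decision threshold. It assumes class weighting by the negative-to-positive ratio (the value typically passed to XGBoost's `scale_pos_weight` parameter) and 1-based residue numbering. The README does not state which imbalance strategy the pipeline actually uses, and both function names are hypothetical.

```python
def scale_pos_weight(labels):
    """Ratio of non-epitope to epitope residues.

    This is the value commonly supplied as XGBoost's `scale_pos_weight`
    to up-weight the rare positive class (assumed strategy).
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive (epitope) labels to weight")
    return negatives / positives


def call_epitopes(probabilities, threshold=0.25):
    """Return 1-based residue positions whose probability meets the threshold."""
    return [i for i, p in enumerate(probabilities, start=1) if p >= threshold]


if __name__ == "__main__":
    # Toy labels and probabilities, not project output.
    labels = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
    print("scale_pos_weight:", scale_pos_weight(labels))
    print("predicted positions:", call_epitopes([0.10, 0.30, 0.25, 0.05]))
```

A threshold below 0.5 trades precision for recall, which is a typical choice when positives are scarce.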

📊 Results

| Metric | Value |
|---|---|
| Total residues | ~162–245 |
| Predicted epitopes | ~24 |
| Validated (IEDB overlap) | ~4 |
| ROC-AUC | ~0.70–0.75 |

🧪 Validation Strategy

  • Predictions compared with IEDB experimental epitopes
  • Overlap analysis performed at residue level
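
Residue-level overlap reduces to set operations on residue positions. The sketch below assumes predicted and IEDB positions share the same numbering scheme. The function name `overlap_report` and the example positions are hypothetical.

```python
def overlap_report(predicted, known):
    """Compare predicted epitope positions with experimentally reported ones.

    Both inputs are iterables of residue positions in the same numbering.
    """
    predicted, known = set(predicted), set(known)
    return {
        "validated": sorted(predicted & known),  # predicted and reported
        "novel": sorted(predicted - known),      # predicted only
        "missed": sorted(known - predicted),     # reported only
    }


if __name__ == "__main__":
    # Made-up positions for illustration only.
    report = overlap_report(predicted=[58, 59, 60, 130], known=[59, 60, 61])
    for key, positions in report.items():
        print(f"{key}: {positions}")
```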

Interpretation

  • ✅ Overlapping residues → validated epitopes
  • 🔬 Non-overlapping → novel candidate epitopes

🧬 Biological Insights

Predicted epitopes are enriched in:

  • Surface-exposed regions (high SASA)
  • Loop/coil structures
  • High-entropy (variable) regions

👉 This aligns with known principles of antibody binding.


📁 Project Structure

pcv2_epitope_project/
│
├── data/                  # Metadata, mappings, IEDB data
├── sequences/             # FASTA + alignments
├── structures/            # PDB files (3R0R, 6EZG)
├── features/              # Engineered features
├── results/               # Predictions + evaluation
├── models/                # Trained ML model
├── scripts/               # Feature + analysis scripts
├── pipeline/              # Automation scripts
│
├── dashboard.py           # Streamlit interface
└── run_smart_pipeline.py  # Full pipeline runner

⚙️ Installation

git clone https://github.com/YOUR_USERNAME/pcv2-epitope-platform.git
cd pcv2-epitope-platform

python -m venv pcv2_env
source pcv2_env/bin/activate

pip install -r requirements.txt

▢️ Usage

Run Full Pipeline

python run_smart_pipeline.py

Launch Dashboard

streamlit run dashboard.py

📊 Dashboard Features

  • 📈 Epitope probability plots
  • 🧬 Sequence visualization (UniProt-aligned)
  • 🧊 3D structure mapping (Py3Dmol)
  • 🧪 IEDB validation overlay
  • 📦 Epitope clustering
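
The README does not describe how epitope clustering is done. One simple approach, sketched here as an assumption, is to merge predicted residues that are adjacent in sequence (within a small gap) into contiguous segments. The name `cluster_positions` and the `max_gap` parameter are hypothetical.

```python
def cluster_positions(positions, max_gap=1):
    """Group residue positions into (start, end) segments.

    Two positions fall in the same segment when they differ by at most
    `max_gap`; with max_gap=1 only directly consecutive residues merge.
    """
    clusters = []
    for pos in sorted(set(positions)):
        if clusters and pos - clusters[-1][1] <= max_gap:
            clusters[-1][1] = pos  # extend the current segment
        else:
            clusters.append([pos, pos])  # start a new segment
    return [tuple(segment) for segment in clusters]


if __name__ == "__main__":
    # Made-up predicted positions for illustration only.
    print(cluster_positions([58, 59, 60, 63, 130, 131]))
    print(cluster_positions([58, 59, 60, 63, 130, 131], max_gap=3))
```

A structure-based alternative would cluster on 3D distances instead of sequence position, which can capture conformational epitopes that are discontinuous in sequence.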

⚠️ Limitations

  • Limited experimentally validated epitopes (class imbalance)
  • Predictions are computational (require lab validation)
  • Sequence–structure mapping introduces approximation

🚀 Future Work

  • Graph Neural Networks (GNN)
  • Transformer-based protein models
  • Improved structural alignment
  • REST API deployment
  • Continuous data updates (automated pipeline)

🤝 Collaboration

Open to collaborations in:

  • Bioinformatics
  • Immunoinformatics
  • Vaccine design
  • Structural biology

📜 Disclaimer

This system provides computational predictions and should not replace experimental validation.


👤 Author

Abubakar
Bioinformatics & Computational Biology


⭐ Acknowledgements

  • NCBI (sequence data)
  • RCSB PDB (structural data)
  • IEDB (epitope data)
  • Biopython, XGBoost, Streamlit communities
