A Python framework for running, ensembling, and evaluating medical image segmentation models. The pipeline integrates three models (Swin UNETR, TotalSegmentator, and WholeBodyCT) and provides a toolkit for comparing their outputs, aimed at research and clinical applications.
This project provides two main pipelines:
- Segmentation Pipeline (mspipeline.py): An inference pipeline that takes a NIfTI medical scan as input and generates segmentation masks using one or more integrated models. It supports ensembling methods to combine predictions for improved accuracy.
- Evaluation Pipeline (eval.py): A tool for performing quantitative comparison of segmentation results against ground truth masks. It calculates a wide range of metrics, performs statistical analysis, and generates publication-ready visualizations.
- Multi-Model Integration: Seamlessly run and compare three leading segmentation models:
  - Swin UNETR: A transformer-based model for accurate organ segmentation.
  - TotalSegmentator: A robust model for comprehensive segmentation of over 100 anatomical structures.
  - WholeBodyCT: A specialized model for whole-body CT analysis.
- Ensembling Methods: Combine predictions from multiple models using majority voting to generate a more robust final segmentation.
- Comprehensive Evaluation Suite: Go beyond simple metrics. The evaluation pipeline calculates:
  - Overlap Metrics: Dice Coefficient, Intersection over Union (IoU).
  - Distance Metrics: Hausdorff Distance, Mean Surface Distance.
  - Volume Metrics: Volume Similarity, Relative Volume Difference.
  - Classification Metrics: Sensitivity, Specificity, Precision, F1-Score.
- Advanced Statistical Analysis: Automatically perform paired t-tests and Wilcoxon signed-rank tests to determine the statistical significance of performance differences between models.
- Automated Visualization: Generates a rich set of plots for intuitive analysis, including:
  - Performance heatmaps by organ.
  - Box plots comparing metric distributions.
  - Bar charts for patient-wise performance.
  - Model ranking charts.
- Flexible Configuration: Easily control all aspects of the pipeline—from model selection to preprocessing parameters—through simple configuration files.
- Robust Output Management: Automatically organizes all outputs, including individual organ masks, combined segmentations, evaluation reports (CSV, JSON), and plots.
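The overlap and classification metrics listed above can all be derived from the four confusion counts of a pair of binary masks. The following is a minimal sketch using the standard textbook definitions; the function name `binary_overlap_metrics` is hypothetical, and the actual implementation in eval.py may differ (for example in how it handles empty masks).

```python
import numpy as np


def binary_overlap_metrics(pred, truth):
    """Overlap and classification metrics for two binary masks.

    Hypothetical helper for illustration; not taken from eval.py.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")

    # Confusion counts over all voxels.
    tp = int(np.count_nonzero(pred & truth))
    fp = int(np.count_nonzero(pred & ~truth))
    fn = int(np.count_nonzero(~pred & truth))
    tn = int(np.count_nonzero(~pred & ~truth))

    def ratio(num, den):
        # A ratio with an empty denominator is reported as NaN.
        return num / den if den else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Tiny 2D example; real inputs would be 3D organ masks.
pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
metrics = binary_overlap_metrics(pred, truth)
print(metrics)
```

For binary masks the F1-score is numerically identical to the Dice coefficient, so the sketch does not compute it separately.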
The project is divided into two logical workflows: Inference and Evaluation.
Inference workflow:
Input Image (.nii.gz)
│
▼
Preprocessing (Resampling, Normalization, Cropping)
│
▼
┌──────────────────┬──────────────────────┬──────────────────┐
│ Swin UNETR │ TotalSegmentator │ WholeBodyCT │
└──────────────────┴──────────────────────┴──────────────────┘
│
▼
Postprocessing (Resizing to original shape)
│
▼
Ensemble (Majority Vote)
│
▼
Segmentation Maps (Combined & Individual Organs)
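
The ensemble step in the inference workflow can be illustrated with a voxel-wise majority vote over integer label maps. This is a sketch under assumptions, not the code in mspipeline.py: the function name `majority_vote`, the tie-breaking rule, and the use of the agreement fraction as a confidence value are all hypothetical, and the label maps are assumed to share one label scheme and one shape.

```python
import numpy as np


def majority_vote(label_maps, num_classes):
    """Voxel-wise majority vote over integer label maps of equal shape.

    Hypothetical helper for illustration; not taken from mspipeline.py.
    """
    stacked = np.stack([np.asarray(m) for m in label_maps], axis=0)
    # votes[c] counts how many models assigned class c to each voxel.
    votes = np.stack(
        [(stacked == c).sum(axis=0) for c in range(num_classes)], axis=0
    )
    # argmax returns the first maximum, so ties go to the lowest label.
    fused = votes.argmax(axis=0)
    # Fraction of models that agree with the winning label.
    agreement = votes.max(axis=0) / stacked.shape[0]
    return fused, agreement


# Three toy 1D "segmentations" with labels 0..2.
model_a = np.array([0, 1, 2, 2])
model_b = np.array([0, 1, 1, 2])
model_c = np.array([1, 1, 2, 0])
fused, agreement = majority_vote([model_a, model_b, model_c], num_classes=3)
print(fused.tolist())
print(agreement.tolist())
```

Counting votes per class keeps memory proportional to the number of classes times the volume size, which is acceptable for a handful of organs but may need a chunked approach for label sets with a hundred or more structures.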
Evaluation workflow:
Segmentation Maps + Ground Truth Masks
│
▼
Metric Calculation Engine
(Dice, IoU, Hausdorff, etc.)
│
▼
┌────────────────────┬───────────────────┐
│ Statistical │ Visualization │
│ Analysis (t-test) │ Engine (Plots) │
└────────────────────┴───────────────────┘
│
▼
Reports (CSV, JSON, PNG)
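
The statistical analysis stage compares models on the same patients, which is why paired tests are used. Below is a minimal sketch with SciPy's `ttest_rel` and `wilcoxon`; the Dice values are synthetic numbers made up for illustration, and how eval.py organizes its inputs is not shown here.

```python
from scipy import stats

# Synthetic per-patient Dice scores (illustration only, not real results).
# Index i in both lists refers to the same patient.
dice_model_a = [0.91, 0.88, 0.93, 0.85, 0.90, 0.87, 0.92, 0.89]
dice_model_b = [0.90, 0.86, 0.90, 0.81, 0.85, 0.81, 0.85, 0.81]

# Paired t-test: assumes the per-patient differences are roughly normal.
t_stat, t_p = stats.ttest_rel(dice_model_a, dice_model_b)

# Wilcoxon signed-rank test: non-parametric alternative on the same pairs.
w_stat, w_p = stats.wilcoxon(dice_model_a, dice_model_b)

print(f"paired t-test: t={t_stat:.3f}, p={t_p:.4f}")
print(f"Wilcoxon signed-rank: W={w_stat:.1f}, p={w_p:.4f}")
```

With only a few patients the Wilcoxon test has limited power, so reporting both tests alongside the effect size (the mean difference) is usually more informative than a p-value alone.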
- Clone the repository:
  git clone https://github.com/your-username/medical-segmentation-pipeline.git
  cd medical-segmentation-pipeline
- Create and activate a virtual environment (recommended):
  python3 -m venv venv
  source venv/bin/activate
- Install the required packages. The pipeline relies on several specialized libraries; install them using pip:
  pip install -q "monai-weekly[nibabel, tqdm]" torch totalsegmentator pandas seaborn scipy scikit-learn
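
After installation, a quick check that the packages are importable can save a failed run later. The sketch below uses only the standard library; the list of import names is an assumption derived from the pip command above (note that scikit-learn is imported as `sklearn`), and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Import names assumed to correspond to the packages installed above.
REQUIRED_MODULES = [
    "monai", "nibabel", "tqdm", "torch", "totalsegmentator",
    "pandas", "seaborn", "scipy", "sklearn",
]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
print("Missing modules:", missing or "none")
```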
The pipeline can be used via its Python API for both segmentation and evaluation.
The EnhancedMedicalSegmentationPipeline class in mspipeline.py is the main entry point for running inference.
Example Python Script:
import mspipeline
# 1. Define the configuration
config = {
    'output_base_dir': './output',
    'swin_model_path': '/path/to/models/swin_unetr_btcv/models/model.pt',
    'wbct_model_path': '/path/to/models/wholebody_ct/models/model.pt',
    'use_swin': True,
    'use_totalsegmentator': True,
    'use_wholebody_ct': True,
    'enable_profiling': True
}
# 2. Initialize the pipeline
pipeline = mspipeline.EnhancedMedicalSegmentationPipeline(config)
# 3. Process a single patient image
image_path = '/path/to/data/raw/img0001.nii.gz'
patient_id = 'patient_001'
result = pipeline.process_patient(image_path, patient_id)
# 4. (Optional) Process a batch of images
# image_paths = ['/path/to/img0001.nii.gz', '/path/to/img0002.nii.gz']
# batch_results = pipeline.process_batch(image_paths)
print("Processing complete.")
pipeline.cleanup()

The SegmentationEvaluator class in eval.py is used to compare segmentation outputs against a ground truth dataset.
Example Python Script:
import eval
from eval import ModelType
# 1. Define the main configuration
config = {
    'evaluation_output_dir': "./evaluation_results",
    'ground_truth_dir': "/path/to/data/truth/",
    'create_visualizations': True,
    'enable_statistical_tests': True,
    'enable_parallel_processing': True,
}
# 2. Define the models and their prediction directories
models_config = {
    'swin': {
        'pred_dir': "./output/swin_comparison",
        'gt_dir': "/path/to/data/truth/",
        'model_type': ModelType.SWIN_UNETR
    },
    'ts': {
        'pred_dir': "./output/ts_comparison",
        'gt_dir': "/path/to/data/truth/",
        'model_type': ModelType.TOTALSEGMENTATOR
    },
    'wbct': {
        'pred_dir': "./output/wbct_comparison",
        'gt_dir': "/path/to/data/truth/",
        'model_type': ModelType.WHOLEBODY_CT
    }
}
# 3. Initialize and run the evaluator
evaluator = eval.SegmentationEvaluator(config)
evaluator.run_evaluation(models_config)
print("Evaluation complete. Results are in the 'evaluation_results' directory.")

The pipeline generates a structured set of outputs for easy access and analysis.
output/
├── swin_comparison/ # Combined masks for Swin UNETR
│ └── 0001_swin_segmentation.nii.gz
├── ts_comparison/ # Combined masks for TotalSegmentator
│ └── 0001_ts_segmentation.nii.gz
├── wbct_comparison/ # Combined masks for WholeBodyCT (BTCV mapped)
│ └── 0001_wbct_segmentation.nii.gz
└── combined_results/ # Ensembled results
├── patient_001_ensemble_majority.nii.gz
└── patient_001_confidence.nii.gz
evaluation_results/
├── exports/
│ ├── summary_statistics.csv
│ ├── detailed_results.csv
│ ├── model_comparison.csv
│ ├── model_ranking.csv
│ └── comprehensive_report.json
├── visualizations/
│ ├── performance_heatmap.png
│ ├── model_ranking.png
│ ├── patient_performance.png
│ └── statistical_comparison.png
└── evaluation.log
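
A short script can confirm that an evaluation run produced the files shown in the tree above before they are used downstream. This is a sketch that assumes the layout is exactly as listed; `missing_outputs` is a hypothetical helper and is not part of eval.py.

```python
from pathlib import Path

# Relative paths copied from the evaluation_results/ tree above.
EXPECTED_OUTPUTS = [
    "exports/summary_statistics.csv",
    "exports/detailed_results.csv",
    "exports/model_comparison.csv",
    "exports/model_ranking.csv",
    "exports/comprehensive_report.json",
    "visualizations/performance_heatmap.png",
    "visualizations/model_ranking.png",
    "visualizations/patient_performance.png",
    "visualizations/statistical_comparison.png",
    "evaluation.log",
]


def missing_outputs(results_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected relative paths that are not files under results_dir."""
    root = Path(results_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


not_found = missing_outputs("./evaluation_results")
print("Missing outputs:", not_found or "none")
```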
The evaluation pipeline automatically generates several informative plots.
Example figures (images not included here): Performance Heatmap (Dice by Organ), Model Ranking (Mean Overall Dice), Metric Distribution (Box Plots), and Statistical Significance.
If you use this pipeline in your research, please cite it as follows:
@misc{Anan2025MedicalSegmentation,
author = {Anan, Ahmed},
title = {Advanced Medical Segmentation \& Evaluation Pipeline},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/your-username/medical-segmentation-pipeline}}
}

Contributions are welcome! Please feel free to submit a pull request, open an issue, or suggest new features.
- Fork the repository.
- Create a new branch (git checkout -b feature/your-feature).
- Commit your changes (git commit -am 'Add some feature').
- Push to the branch (git push origin feature/your-feature).
- Create a new Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- This work heavily utilizes the MONAI framework.
- Integration of the TotalSegmentator model by Wasserthal et al.
- The BTCV dataset used for training Swin UNETR is from the Multi-Atlas Labeling Beyond the Cranial Vault Challenge.