Medical Segmentation & Evaluation Pipeline

A Python framework for running, ensembling, and evaluating medical image segmentation models. The pipeline integrates three models and provides a toolkit for comparative analysis, aimed at research and clinical applications.

Overview

This project provides two main pipelines:

- Segmentation Pipeline (`mspipeline.py`): An inference pipeline that takes a NIfTI medical scan as input and generates segmentation masks using one or more of the integrated models. It can ensemble the models' predictions into a single segmentation.
- Evaluation Pipeline (`eval.py`): A tool for quantitative comparison of segmentation results against ground truth masks. It calculates overlap, distance, volume, and classification metrics, runs statistical tests, and generates plots.

Key Features

- Multi-Model Integration: Run and compare three segmentation models:
  - Swin UNETR: A transformer-based model for organ segmentation.
  - TotalSegmentator: A model that segments over 100 anatomical structures.
  - WholeBodyCT: A model specialized for whole-body CT analysis.
- Ensembling Methods: Combine predictions from multiple models using majority voting to produce a single final segmentation.
- Comprehensive Evaluation Suite: The evaluation pipeline calculates:
  - Overlap Metrics: Dice Coefficient, Intersection over Union (IoU).
  - Distance Metrics: Hausdorff Distance, Mean Surface Distance.
  - Volume Metrics: Volume Similarity, Relative Volume Difference.
  - Classification Metrics: Sensitivity, Specificity, Precision, F1-Score.
- Advanced Statistical Analysis: Automatically performs paired t-tests and Wilcoxon signed-rank tests to assess whether performance differences between models are statistically significant.
- Automated Visualization: Generates plots for analysis, including:
  - Performance heatmaps by organ.
  - Box plots comparing metric distributions.
  - Bar charts for patient-wise performance.
  - Model ranking charts.
- Flexible Configuration: Control the pipeline, from model selection to preprocessing parameters, through configuration files.
- Robust Output Management: Automatically organizes all outputs, including individual organ masks, combined segmentations, evaluation reports (CSV, JSON), and plots.
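
The overlap and classification metrics listed above all derive from the same four voxel counts (true/false positives and negatives). The following is a minimal NumPy sketch of those definitions for a single binary organ mask. The function name `binary_overlap_metrics` and the NaN convention for undefined ratios are assumptions made for illustration; they are not taken from `eval.py`, which may handle edge cases differently.

```python
import numpy as np


def binary_overlap_metrics(pred, truth):
    """Overlap and classification metrics for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")

    # Voxel-wise confusion counts.
    tp = int(np.count_nonzero(pred & truth))
    fp = int(np.count_nonzero(pred & ~truth))
    fn = int(np.count_nonzero(~pred & truth))
    tn = int(np.count_nonzero(~pred & ~truth))

    def ratio(num, den):
        # Assumption: report NaN when a metric is undefined (zero denominator).
        return num / den if den else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Toy 2x4 masks: 2 true positives, 1 false positive, 1 false negative.
pred = np.array([[1, 1, 0, 0], [0, 1, 0, 0]])
truth = np.array([[1, 0, 0, 0], [0, 1, 1, 0]])
metrics = binary_overlap_metrics(pred, truth)
print(metrics)
```

For binary masks the F1-Score is algebraically identical to the Dice Coefficient, so the sketch does not compute it separately.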

Workflow

The project is divided into two logical workflows: Inference and Evaluation.

Inference Workflow

Input Image (.nii.gz)
       │
       ▼
Preprocessing (Resampling, Normalization, Cropping)
       │
       ▼
┌──────────────────┬──────────────────────┬──────────────────┐
│ Swin UNETR       │   TotalSegmentator   │   WholeBodyCT    │
└──────────────────┴──────────────────────┴──────────────────┘
       │
       ▼
Postprocessing (Resizing to original shape)
       │
       ▼
Ensemble (Majority Vote)
       │
       ▼
Segmentation Maps (Combined & Individual Organs)
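
The ensemble step in the diagram above can be illustrated with a per-voxel majority vote over integer label maps. This is only a sketch under stated assumptions: the label maps are already resized to the same shape and mapped to a shared label scheme, the function name `majority_vote` is hypothetical, and ties are broken in favor of the lowest class index. The tie-breaking rule used in `mspipeline.py` is not documented here and may differ.

```python
import numpy as np


def majority_vote(label_maps, num_classes):
    """Per-voxel majority vote over integer label maps of identical shape."""
    stacked = np.stack([np.asarray(m) for m in label_maps], axis=0)
    # votes[c] counts how many models assigned class c to each voxel.
    votes = np.stack(
        [(stacked == c).sum(axis=0) for c in range(num_classes)], axis=0
    )
    # Assumption: ties go to the lowest class index
    # (np.argmax returns the first maximum).
    winner = votes.argmax(axis=0)
    # Fraction of models that agree with the winning label.
    agreement = votes.max(axis=0) / stacked.shape[0]
    return winner.astype(stacked.dtype), agreement


# Three toy 2x3 label maps with classes 0 (background), 1, and 2.
model_a = np.array([[0, 1, 1], [2, 2, 0]])
model_b = np.array([[0, 1, 2], [2, 0, 0]])
model_c = np.array([[0, 0, 2], [2, 1, 0]])
ensemble, agreement = majority_vote([model_a, model_b, model_c], num_classes=3)
print(ensemble)
print(agreement)
```

The `agreement` array is one plausible way to derive a per-voxel confidence value; whether the pipeline's confidence output is computed this way is not stated in this document.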

Evaluation Workflow

Segmentation Maps + Ground Truth Masks
              │
              ▼
    Metric Calculation Engine
(Dice, IoU, Hausdorff, etc.)
              │
              ▼
┌────────────────────┬───────────────────┐
│ Statistical        │   Visualization   │
│ Analysis (t-test)  │   Engine (Plots)  │
└────────────────────┴───────────────────┘
              │
              ▼
      Reports (CSV, JSON, PNG)
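
The statistical analysis step compares models on the same set of patients, which is why paired tests apply. A minimal sketch using `scipy.stats` is shown below; the Dice values are made up for illustration, and how `eval.py` actually selects metrics, pairs patients, or corrects for multiple comparisons is not covered here.

```python
from scipy import stats

# Hypothetical per-patient Dice scores for the same six patients (illustrative only).
dice_model_a = [0.91, 0.88, 0.93, 0.85, 0.90, 0.87]
dice_model_b = [0.90, 0.86, 0.90, 0.81, 0.85, 0.81]

# Paired t-test: assumes the per-patient differences are roughly normal.
t_result = stats.ttest_rel(dice_model_a, dice_model_b)

# Wilcoxon signed-rank test: non-parametric alternative on the same pairs.
w_result = stats.wilcoxon(dice_model_a, dice_model_b)

print(f"paired t-test: t={t_result.statistic:.3f}, p={t_result.pvalue:.4f}")
print(f"Wilcoxon:      W={w_result.statistic:.1f}, p={w_result.pvalue:.4f}")
```

Both tests require the two score lists to be in the same patient order. With small cohorts the Wilcoxon test is often the safer choice because it does not rely on the normality assumption.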

Installation

Clone the repository:

git clone https://github.com/your-username/medical-segmentation-pipeline.git
cd medical-segmentation-pipeline

Create and activate a virtual environment (recommended):
```
python3 -m venv venv
source venv/bin/activate
```
Install the required packages with pip:
```
pip install -q "monai-weekly[nibabel, tqdm]" torch totalsegmentator pandas seaborn scipy scikit-learn
```

Usage

The pipeline can be used via its Python API for both segmentation and evaluation.

1. Segmentation Pipeline

The EnhancedMedicalSegmentationPipeline class in mspipeline.py is the main entry point for running inference.

Example Python Script:

import mspipeline

# 1. Define the configuration
config = {
    'output_base_dir': './output',
    'swin_model_path': '/path/to/models/swin_unetr_btcv/models/model.pt',
    'wbct_model_path': '/path/to/models/wholebody_ct/models/model.pt',
    'use_swin': True,
    'use_totalsegmentator': True,
    'use_wholebody_ct': True,
    'enable_profiling': True
}

# 2. Initialize the pipeline
pipeline = mspipeline.EnhancedMedicalSegmentationPipeline(config)

# 3. Process a single patient image
image_path = '/path/to/data/raw/img0001.nii.gz'
patient_id = 'patient_001'
result = pipeline.process_patient(image_path, patient_id)

# 4. (Optional) Process a batch of images
# image_paths = ['/path/to/img0001.nii.gz', '/path/to/img0002.nii.gz']
# batch_results = pipeline.process_batch(image_paths)

print("Processing complete.")
pipeline.cleanup()

2. Evaluation Pipeline

The SegmentationEvaluator class in eval.py is used to compare segmentation outputs against a ground truth dataset.

Example Python Script:

import eval
from eval import ModelType

# 1. Define the main configuration
config = {
    'evaluation_output_dir': "./evaluation_results",
    'ground_truth_dir': "/path/to/data/truth/",
    'create_visualizations': True,
    'enable_statistical_tests': True,
    'enable_parallel_processing': True,
}

# 2. Define the models and their prediction directories
models_config = {
    'swin': {
        'pred_dir': "./output/swin_comparison",
        'gt_dir': "/path/to/data/truth/",
        'model_type': ModelType.SWIN_UNETR
    },
    'ts': {
        'pred_dir': "./output/ts_comparison",
        'gt_dir': "/path/to/data/truth/",
        'model_type': ModelType.TOTALSEGMENTATOR
    },
    'wbct': {
        'pred_dir': "./output/wbct_comparison",
        'gt_dir': "/path/to/data/truth/",
        'model_type': ModelType.WHOLEBODY_CT
    }
}

# 3. Initialize and run the evaluator
evaluator = eval.SegmentationEvaluator(config)
evaluator.run_evaluation(models_config)

print("Evaluation complete. Results are in the 'evaluation_results' directory.")

Output Structure

The pipeline generates a structured set of outputs for easy access and analysis.

Segmentation Outputs (`/output`)

output/
├── swin_comparison/         # Combined masks for Swin UNETR
│   └── 0001_swin_segmentation.nii.gz
├── ts_comparison/           # Combined masks for TotalSegmentator
│   └── 0001_ts_segmentation.nii.gz
├── wbct_comparison/         # Combined masks for WholeBodyCT (BTCV mapped)
│   └── 0001_wbct_segmentation.nii.gz
└── combined_results/        # Ensembled results
    ├── patient_001_ensemble_majority.nii.gz
    └── patient_001_confidence.nii.gz
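
When post-processing these outputs, it can help to group the per-model masks by case. The sketch below parses the file names shown in the tree above using only the standard library. The regular expression is inferred from those example names (for instance `0001_swin_segmentation.nii.gz`) and is an assumption; actual file names produced by the pipeline may follow a different scheme, and `group_by_case` is a hypothetical helper, not part of `mspipeline.py`.

```python
import re
from collections import defaultdict

# Pattern inferred from the example tree, e.g. "0001_swin_segmentation.nii.gz".
PER_MODEL_NAME = re.compile(
    r"^(?P<case>\d+)_(?P<model>[a-z]+)_segmentation\.nii\.gz$"
)


def group_by_case(filenames):
    """Map each case ID to {model_tag: filename}; non-matching names are skipped."""
    grouped = defaultdict(dict)
    for name in filenames:
        match = PER_MODEL_NAME.match(name)
        if match:
            grouped[match["case"]][match["model"]] = name
    return dict(grouped)


names = [
    "0001_swin_segmentation.nii.gz",
    "0001_ts_segmentation.nii.gz",
    "0001_wbct_segmentation.nii.gz",
    "patient_001_confidence.nii.gz",  # ensemble output, different naming scheme
]
cases = group_by_case(names)
print(cases)
```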

Evaluation Outputs (`/evaluation_results`)

evaluation_results/
├── exports/
│   ├── summary_statistics.csv
│   ├── detailed_results.csv
│   ├── model_comparison.csv
│   ├── model_ranking.csv
│   └── comprehensive_report.json
├── visualizations/
│   ├── performance_heatmap.png
│   ├── model_ranking.png
│   ├── patient_performance.png
│   └── statistical_comparison.png
└── evaluation.log

Example Visualizations

The evaluation pipeline automatically generates several informative plots.

- Performance Heatmap (Dice by Organ)
- Model Ranking (Mean Overall Dice)
- Metric Distribution (Box Plots)
- Statistical Significance

Citation

If you use this pipeline in your research, please cite it as follows:

@misc{Anan2025MedicalSegmentation,
  author = {Anan, Ahmed},
  title  = {Advanced Medical Segmentation \& Evaluation Pipeline},
  year   = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/your-username/medical-segmentation-pipeline}}
}

Contributing

Contributions are welcome! Please feel free to submit a pull request, open an issue, or suggest new features.

1. Fork the repository.
2. Create a new branch (`git checkout -b feature/your-feature`).
3. Commit your changes (`git commit -am 'Add some feature'`).
4. Push to the branch (`git push origin feature/your-feature`).
5. Create a new Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

- This work relies heavily on the MONAI framework.
- It integrates the TotalSegmentator model by Wasserthal et al.
- The BTCV dataset used for training Swin UNETR comes from the Multi-Atlas Labeling Beyond the Cranial Vault Challenge.
