# IDS Evaluation Framework

A comprehensive, modular, and configurable framework for evaluating machine-learning-based Intrusion Detection Systems (IDS).

## Features
- Modular Plugin Architecture: Easily extend the framework with custom IDS models, metrics, and adversarial attacks
- Flexible Data Pipeline: Load, preprocess, and split datasets with configurable preprocessing steps and feature selection
- Multiple Evaluation Modes: Support for intra-dataset, cross-dataset, and k-fold cross-validation evaluation
- Comprehensive Metrics: Built-in static metrics (accuracy, F1, precision, recall, ROC-AUC, etc.) and runtime metrics (CPU, RAM, training time)
- Adversarial Robustness Testing: Evaluate model robustness against adversarial attacks (FGSM, noise perturbation, junk data injection)
- Reproducible Results: Hash-based output organization ensures consistent experiment tracking
- Flexible Deployment: Run natively with Python or via Docker
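To make the robustness-testing idea above concrete, here is a minimal, hypothetical sketch of a noise-perturbation check (not the framework's actual implementation): perturb test features with bounded random noise and compare accuracy before and after. The `perturb`, `accuracy`, and toy `model` names are illustrative assumptions.

```python
# Illustrative sketch of noise-perturbation robustness testing;
# the framework's real attack plugins are more elaborate.
import random

def perturb(features, epsilon=0.1, rng=random.Random(0)):
    """Add uniform noise in [-epsilon, epsilon] to every feature value."""
    return [[x + rng.uniform(-epsilon, epsilon) for x in row] for row in features]

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

# Toy threshold "model": flag a flow as an attack if its first feature > 0.5
model = lambda x: int(x[0] > 0.5)
X = [[0.9], [0.1], [0.6], [0.2]]
y = [1, 0, 1, 0]

clean = accuracy(model, X, y)
noisy = accuracy(model, perturb(X, epsilon=0.3), y)
print(clean, noisy)  # the drop from clean to noisy is the robustness gap
```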
## Requirements

- Python 3.13+
- uv (recommended) or pip

## Installation
```bash
# Install dependencies (uv should be in your $PATH)
uv sync

# Verify installation
uv run ids-eval version
```

- Official Docker images (stable releases): https://hub.docker.com/r/niklassandhu/ids-eval-framework
- Supported architectures for the Docker images: arm64 (Raspberry Pi, Apple Silicon, ...) and amd64 (AMD, Intel)
```bash
# Configure environment variables
cp .env.example .env
# Edit .env to set your data paths

# Run via Docker Compose
docker compose run --rm ids-eval version
```

A pre-built Docker image is available on Docker Hub: niklassandhu/ids-eval-framework:latest
Copy the example configuration and adjust it to your needs:

```bash
cp run_config/example.config.yml run_config/my_config.yml
```

Run the data preparation pipeline:

```bash
uv run ids-eval dataset <run_config>
```

Execute the evaluation pipeline:

```bash
uv run ids-eval evaluate <run_config>
```

The framework provides two main commands:
| Command | Description |
|---|---|
| `ids-eval dataset <config.yml>` | Run dataset pipeline |
| `ids-eval evaluate <config.yml>` | Run evaluation pipeline |
| Flag | Description |
|---|---|
| `--train-only` | Only train models; skip the testing phase |
| `--force-train` | Force retraining, ignoring saved models |
| `--force-model` | Load saved models without config-hash validation |
| `--clear-checkpoints` | Clear evaluation checkpoints before running |
```bash
make dataset CONFIG=<config.yml>          # Run dataset pipeline
make evaluate CONFIG=<config.yml>         # Run evaluation pipeline
make docker-dataset CONFIG=<config.yml>   # Run dataset pipeline via Docker
make docker-evaluate CONFIG=<config.yml>  # Run evaluation via Docker
make help                                 # Show all available targets
```

## Configuration

The framework uses YAML configuration files. See run_config/example.config.yml for a fully documented example.
- `general`: Run name, paths, random seed
- `data_manager`: Dataset loading, preprocessing, feature selection, train/test split
- `evaluation`: IDS models, metrics, adversarial attacks
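The three sections above might look roughly like the following sketch. The section names come from this README; the individual keys shown are assumptions for illustration only — consult run_config/example.config.yml for the real schema.

```yaml
# Hypothetical sketch; the actual keys are documented in run_config/example.config.yml.
general:
  run_name: my_experiment      # assumed key names throughout
  seed: 42
data_manager:
  dataset_path: data/train.csv
  test_split: 0.2
evaluation:
  models: [random_forest]
  metrics: [accuracy, f1]
```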
All outputs are organized in hash-based directories for reproducibility:

```
out/
├── processed_datasets/<hash>/      # Preprocessed datasets
├── saved_models/<hash>/            # Trained models
└── reports/<hash>/                 # Evaluation reports
    ├── config.yaml                 # Configuration used
    ├── dataset_report.yaml         # Dataset statistics
    ├── ids_report.yaml             # Detailed evaluation results
    └── evaluation_summary.yaml     # Aggregated summary
```
The configuration hash is displayed at startup:

```
Your config hash is: a1b2c3d4
```
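The framework's exact hashing scheme is internal, but a typical way to derive such an identifier — shown here purely as an assumption — is to hash a canonical serialization of the config and truncate it, so identical configs always map to the same output directory:

```python
# Hypothetical illustration, not the framework's actual implementation.
import hashlib
import json

def config_hash(config: dict) -> str:
    # Canonical serialization: key order must not affect the hash
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:8]

h = config_hash({"general": {"seed": 42}, "evaluation": {"models": ["rf"]}})
print(h)  # an 8-hex-character identifier
```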
The framework supports four types of plugins:

| Plugin Type | Directory | Base Class |
|---|---|---|
| IDS Models | `plugin_ids/` | `AbstractIDSConnector` |
| Static Metrics | `plugin_static_metric/` | `AbstractStaticMetric` |
| Runtime Metrics | `plugin_runtime_metric/` | `AbstractRuntimeMetric` |
| Adversarial Attacks | `plugin_adversarial/` | `AbstractAdversarialAttack` |
See the existing plugins in each directory for implementation examples.
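As a rough sketch of what a static-metric plugin might look like: the real `AbstractStaticMetric` lives in the framework's `plugin_static_metric/` directory, so the stand-in base class and the method names (`name`, `compute`) below are assumptions, not the framework's actual interface.

```python
# Illustrative sketch only; check plugin_static_metric/ for the real base class.
from abc import ABC, abstractmethod

class AbstractStaticMetric(ABC):  # stand-in for the framework's base class
    @abstractmethod
    def name(self) -> str: ...
    @abstractmethod
    def compute(self, y_true, y_pred) -> float: ...

class DetectionRate(AbstractStaticMetric):
    """Fraction of attack samples (label 1) that were flagged as attacks."""
    def name(self) -> str:
        return "detection_rate"
    def compute(self, y_true, y_pred) -> float:
        attacks = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
        if not attacks:
            return 0.0
        return sum(1 for _, p in attacks if p == 1) / len(attacks)

# Example: 2 of the 3 attack samples were detected
print(DetectionRate().compute([1, 0, 1, 1], [1, 0, 0, 1]))
```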
```bash
make setup    # Install dependencies
make test     # Run tests
make lint     # Check code style
make format   # Format code
```

See LICENSE for details.
