artefactory/woodtapper

User-friendly Python toolbox for interpreting and manipulating decision tree ensembles from scikit-learn

🪵 Key Features

WoodTapper is a Python toolbox that provides:

  • Rule extraction from tree-based ensembles: Generates a final estimator composed of a sequence of simple rules based on features and thresholds.

  • Example-based explanations: Connects predictions to a small set of representative samples, returning the most similar examples along with their target values.

Detailed information about the modules can be found here.

WoodTapper is supported by a peer-reviewed publication:

    WoodTapper: a Python package for explaining decision tree ensembles, Sakho et al. (2026) 📄

🛠 Installation

From PyPI:

pip install woodtapper

Warning (scikit-learn already installed): If you install woodtapper in an environment where scikit-learn is already present, the prebuilt PyPI wheel may not be compatible with your existing scikit-learn binary. In that case, reinstall woodtapper from source so it is compiled against the scikit-learn version in your environment:

pip uninstall -y woodtapper
pip install -U pip setuptools wheel
pip install -U Cython pybind11
pip install --no-binary=woodtapper --no-build-isolation woodtapper
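Before choosing between the prebuilt wheel and a source build, it can help to check which versions are actually present in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and is not part of woodtapper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions relevant to the compatibility warning above.
for name in ("scikit-learn", "woodtapper"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If scikit-learn is reported as already installed, the source build shown above is the safer route.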

From source:

git clone https://github.com/artefactory/woodtapper.git
cd woodtapper
pip install -e ".[dev,docs]"

Warning: On Windows, a C/C++ compiler must be installed before you install woodtapper.

🌿 WoodTapper RulesExtraction module

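from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from woodtapper.extract_rules import SirusClassifier
from woodtapper.extract_rules.visualization import show_rules

# Illustrative data only: substitute your own labeled classification dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

sirus = SirusClassifier(n_estimators=1000, max_depth=2,
                        quantile=10, p0=0.01, random_state=0)
sirus.fit(X_train, y_train)
y_pred_sirus = sirus.predict(X_test)
show_rules(sirus, max_rules=10)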
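To give an intuition for what "simple rules based on features and thresholds" means, here is a toy, pure-Python sketch of a rule-based estimator. It is not WoodTapper's implementation: the `Rule` class, its fields, and the choice to average the rule outputs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: 'if x[feature] <= threshold then p_if else p_else'."""
    feature: int      # index of the feature tested by the rule
    threshold: float  # threshold the feature value is compared against
    p_if: float       # estimated P(class 1) when the condition holds
    p_else: float     # estimated P(class 1) otherwise

    def predict_proba(self, x: Sequence[float]) -> float:
        return self.p_if if x[self.feature] <= self.threshold else self.p_else


def predict_proba(rules: Sequence[Rule], x: Sequence[float]) -> float:
    """Aggregate the rules by averaging their outputs (an assumption of this sketch)."""
    return sum(rule.predict_proba(x) for rule in rules) / len(rules)


# Two hand-written rules standing in for rules extracted from a forest.
rules = [
    Rule(feature=0, threshold=2.5, p_if=0.75, p_else=0.25),
    Rule(feature=1, threshold=1.0, p_if=0.25, p_else=0.5),
]
print(predict_proba(rules, [1.0, 3.0]))
```

Each rule can be read on its own as a short if/else statement, which is what makes the resulting estimator easy to inspect.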

🌱 WoodTapper ExampleExplanation module

from woodtapper.example_sampling import RandomForestClassifierExplained

rf_explained = RandomForestClassifierExplained(n_estimators=100)
rf_explained.fit(X_train, y_train)

# Get the 5 most similar samples (and target) for each test sample
Xy_explain = rf_explained.explanation(X_test)
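One common way to define "most similar samples" with a forest is to count how often two samples land in the same leaf across the trees. The sketch below illustrates that idea in pure Python with hand-written stumps. Whether WoodTapper uses exactly this similarity is an assumption here, and the names `leaf_ids` and `most_similar` are hypothetical.

```python
def leaf_ids(trees, x):
    """Return the leaf reached by sample `x` in each tree."""
    return [tree(x) for tree in trees]


def most_similar(trees, X_train, y_train, x, k=5):
    """Return the k training samples sharing the most leaves with `x`.

    Each result is a tuple (sample, target, similarity), where similarity is
    the fraction of trees in which the training sample and `x` share a leaf.
    """
    target_leaves = leaf_ids(trees, x)
    scored = []
    for i, xi in enumerate(X_train):
        shared = sum(a == b for a, b in zip(leaf_ids(trees, xi), target_leaves))
        scored.append((shared / len(trees), i))
    # Highest similarity first; ties are broken by training index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(X_train[i], y_train[i], score) for score, i in scored[:k]]


# Toy "forest": three decision stumps, each mapping a sample to a leaf id (0 or 1).
trees = [
    lambda x: int(x[0] > 2.0),
    lambda x: int(x[1] > 1.0),
    lambda x: int(x[0] > 4.0),
]
X_train = [[1.0, 0.5], [3.0, 2.0], [5.0, 0.0], [1.5, 3.0]]
y_train = [0, 1, 1, 0]

for sample, target, similarity in most_similar(trees, X_train, y_train, [1.2, 0.8], k=2):
    print(sample, target, round(similarity, 2))
```

Returning the targets alongside the neighbours lets a reader compare a prediction with the labels of the training points the forest treats as closest.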

🙏 Acknowledgements

This work was done through a partnership between the Artefact Research Center and the Laboratoire de Probabilités, Statistique et Modélisation (LPSM) of Sorbonne University.

   

📜 Citation

If you find the code useful, please consider citing us:

@article{Sakho2026,
  doi = {10.21105/joss.10112},
  url = {https://doi.org/10.21105/joss.10112},
  year = {2026},
  publisher = {The Open Journal},
  volume = {11},
  number = {121},
  pages = {10112},
  author = {Sakho, Abdoulaye and Aouad, Jad and Gauthier, Carl-Erik and Malherbe, Emmanuel and Scornet, Erwan},
  title = {WoodTapper: a Python package for explaining decision tree ensembles},
  journal = {Journal of Open Source Software}
}

For the SIRUS methodology, consider citing:

@article{benard2021sirus,
  title={Sirus: Stable and interpretable rule set for classification},
  author={Benard, Clement and Biau, Gerard and Da Veiga, Sebastien and Scornet, Erwan},
  year={2021}
}