Reimplementation of the BAP algorithm from `Bidirectional Active Processing.md`, as specified in our Electronics paper (available both online and locally). Single-file implementations in Python and Julia with TOML configuration.
- Python 3.11
- Packages: numpy, pandas, scikit-learn, tomli (for Python < 3.11)
Use the project venv when present (dependencies are installed there):

```shell
source .venv311/bin/activate   # macOS / Linux
python bap.py -c examples/configs/config_iris.toml
```

Or run without activating:

```shell
.venv311/bin/python bap.py -c examples/configs/config_iris.toml
```

To create or refresh the venv:

```shell
python3.11 -m venv .venv311
.venv311/bin/python -m pip install -r requirements.txt
```

Configs point at CSVs under `computing/machine learning datasets` (not copies inside other app repos).
- Julia 1.8+
- Packages: CSV, DataFrames, ScikitLearn, TOML (ScikitLearn requires Python with scikit-learn)

To instantiate the Julia environment (run from the repository root):

```shell
julia -e 'using Pkg; Pkg.activate("./julia"); Pkg.instantiate()'
```
Example: Fisher Iris dataset

```shell
# Python (after: source .venv311/bin/activate)
python bap.py -c examples/configs/config_iris.toml

# Or with flags
python bap.py --train "../../machine learning datasets/default/fisher_iris.csv" --testing split --split 0.8,0.2 --classifier dt -t 0.95 -n 10 -m 5
```

Example: MNIST

```shell
# Python
python bap.py -c examples/configs/config_mnist.toml

# Or with flags
python bap.py --train "../../machine learning datasets/default/mnist_train.csv" --test "../../machine learning datasets/default/mnist_test.csv" --testing fixed --classifier knn --k 5 -n 5 -m 100
```

Julia:

```shell
julia julia/bap.jl -c examples/configs/config_iris.toml
julia julia/bap.jl -c examples/configs/config_mnist.toml
julia julia/bap.jl --train "../../machine learning datasets/default/fisher_iris.csv" -c examples/configs/config_iris.toml  # override train path
```

All options can be set via TOML config (preferred) or CLI flags. TOML overrides defaults; flags override TOML.
| Parameter | TOML | Definition |
|---|---|---|
| `train` | `train = "path.csv"` | Training CSV (or single dataset to split) |
| `test` | `test = "path.csv"` | Test CSV (for `testing.fixed`) |
| `testing` | `[testing.fixed]` or `[testing.split]` | How to obtain test set |
| `testing.fixed` | `[testing.fixed]` + `test = "..."` | Use separate test file |
| `testing.split` | `[testing.split]` + `split = [0.8, 0.2]` | Split train ratio : test ratio |
| `classifier` | `classifier = "dt"` | `dt`, `knn`, or `svm` |
| `parameters` | `[parameters]` + `k = 5` | Classifier hyperparameters (e.g. `k` for KNN) |
| `distance` | `distance = "euclidean"` | Distance metric for KNN |
| `goal.t` | `[goal]` + `t = 0.95` | Accuracy threshold (0–1) |
| `direction` | `[direction.forward]` or `[direction.backward]` | Forward (additive) or backward (subtractive) |
| `splits` | `splits = 1` | Number of train/test splits |
| `n` | `n = 10` | Iterations per split |
| `m` | `m = 5` | Cases added/removed per iteration |
| `sampling` | `[sampling.stratified]` or `[sampling.random]` | Sampling method |
| `seed` | `seed = 42` | PRNG seed |
| `output_dir` | `output_dir = "results"` | Output directory |
| Flag | Description |
|---|---|
| `-c, --config` | TOML config file |
| `--train` | Training CSV |
| `--test` | Test CSV (for fixed) |
| `--testing` | `fixed`, `split`, or `cv` |
| `--split` | Train,test ratio, e.g. `0.8,0.2` |
| `--classifier` | `dt`, `knn`, or `svm` |
| `--k` | K for KNN (default 3) |
| `--distance` | Metric for KNN |
| `-t, --threshold` | Accuracy threshold |
| `--direction` | `forward` or `backward` |
| `--splits` | Number of splits |
| `-n, --iterations` | Iterations per split |
| `-m` | Cases per iteration |
| `--sampling` | `random` or `stratified` |
| `--seed` | Random seed |
| `-o, --output-dir` | Output directory |
CSVs must have a class column whose header matches `class`, `label`, or `target` (case-insensitive). All other columns are features (column order may follow your benchmark file, e.g. `fisher_iris.csv`).

- `fisher_iris.csv`: `class` column
- `mnist_train.csv`, `mnist_test.csv`: `label` column
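The class-column rule above can be sketched as a small helper; `split_columns` and `CLASS_HEADERS` are illustrative names, not `bap.py`'s actual loader:

```python
# Sketch of the class-column rule: the first header matching class, label,
# or target (case-insensitive) is the class column; the rest are features.
CLASS_HEADERS = {"class", "label", "target"}

def split_columns(header):
    """Return (class_column, feature_columns) for a CSV header row."""
    for name in header:
        if name.strip().lower() in CLASS_HEADERS:
            return name, [c for c in header if c != name]
    raise ValueError("no class/label/target column found")

print(split_columns(["sepal length", "sepal width", "Class"]))
```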
| Code | Classifier |
|---|---|
| `dt` | Decision Tree |
| `knn` | K-Nearest Neighbors |
| `svm` | Support Vector Machine (RBF) |
| `hb_vis` | Hyperblock (VisCanvas-style) |
| `hb_dv` | Hyperblock (DV-style, interval-based) |
Results are written to `{output_dir}/bap_{timestamp}/` (default `results/bap_YYYYMMDD_HHMMSS/`).

Exported case and hyperblock CSVs use the same tabular shape as the shared datasets (e.g. `computing/machine learning datasets/default/fisher_iris.csv`):

- One header: `class` (lowercase), holding the case label for data rows.
- One column per attribute (same names as the training CSV). No separate `*_min`/`*_max` columns and no extra ID column.
- `split_N/converged_exp_{id}_seed{seed}.csv` – converged training cases: `class` holds the dataset label (e.g. `Setosa`).
- `split_N/converged_exp_{id}_seed{seed}_hyperblocks.csv` – when using `hb_vis` or `hb_dv`: two rows per hyperblock. Each row has the same attribute columns; values are the box minimum (`…__bottom`) and maximum (`…__top`) corners. The `class` cell encodes label, HB id, and edge, e.g. `Setosa__HB0__bottom` / `Setosa__HB0__top`. Any `__` in the dataset label is replaced by `_` so the suffix pattern stays parseable.

Other output:

- `config.txt` – Settings used
- `statistics.csv` – Aggregate statistics (mean/min/max cases, convergence rate, etc.)

Every converged result CSV has a matching `_hyperblocks.csv` in the same directory when using hyperblock classifiers.
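The two-rows-per-hyperblock shape can be sketched with the standard library; the attribute names and example hyperblock are illustrative, and the real exporter in `bap.py` may be organized differently:

```python
# Sketch of the hyperblock export shape: one __bottom and one __top row
# per hyperblock, sharing the dataset's attribute columns.
import csv, io

def hyperblock_rows(attributes, hyperblocks):
    """hyperblocks: list of (label, mins, maxs), one min/max per attribute."""
    rows = []
    for hb_id, (label, mins, maxs) in enumerate(hyperblocks):
        safe = label.replace("__", "_")  # keep the __ suffix pattern parseable
        for edge, values in (("bottom", mins), ("top", maxs)):
            rows.append({"class": f"{safe}__HB{hb_id}__{edge}",
                         **dict(zip(attributes, values))})
    return rows

attrs = ["sepal_length", "sepal_width"]
rows = hyperblock_rows(attrs, [("Setosa", [4.3, 2.3], [5.8, 4.4])])
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["class", *attrs])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```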
- Set PRNG seed
- For each split: load data (fixed test or split)
- For each iteration: start with empty set (forward) or full set (backward)
- While accuracy < threshold and cases remain: add/remove `m` cases via sampling
- Record converged subsets and compute statistics
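The sampling step above (drawing `m` cases either uniformly or spread across classes) might look like this; `sample_m` and its signature are illustrative, not `bap.py`'s actual API:

```python
# Sketch of the "add m cases via sampling" step: random draws m unused
# indices uniformly; stratified round-robins the draws across classes.
import random
from collections import defaultdict

def sample_m(labels, used, m, method="stratified", rng=None):
    rng = rng or random.Random(42)
    unused = [i for i in range(len(labels)) if i not in used]
    if method == "random":
        return rng.sample(unused, min(m, len(unused)))
    by_class = defaultdict(list)
    for i in unused:
        by_class[labels[i]].append(i)
    picked = []
    # Round-robin over classes until m cases are picked or none remain.
    while len(picked) < m and any(by_class.values()):
        for idxs in by_class.values():
            if idxs and len(picked) < m:
                picked.append(idxs.pop(rng.randrange(len(idxs))))
    return picked

labels = ["a"] * 6 + ["b"] * 6
print(sorted(sample_m(labels, set(), 4)))
```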
This repository originally exposed BAP behavior through `rebuild.py`; that interface has now been folded into the TOML/CLI model used by `bap.py` and `julia/bap.jl`.
- Legacy `--data` maps to `train`
- Legacy `--test-data` maps to `test` with `testing.fixed`
- Legacy train/test split flags map to `testing.split` with `split = [train_ratio, test_ratio]`
- Legacy `--action {additive,subtractive}` maps to `direction {forward,backward}`
- Legacy `--iterations` maps to `n`
- Legacy `--threshold` maps to `goal.t`
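The legacy mapping above can be expressed as a small translation table; the dicts and `translate` helper mirror the bullet list, not an inspection of `rebuild.py` itself:

```python
# Sketch of the legacy-to-current option mapping.
LEGACY_MAP = {
    "--data": "train",
    "--test-data": "test",  # implies testing.fixed
    "--iterations": "n",
    "--threshold": "goal.t",
}
ACTION_MAP = {"additive": "forward", "subtractive": "backward"}

def translate(legacy_args):
    """Map a {legacy_flag: value} dict onto current option names."""
    out = {}
    for flag, value in legacy_args.items():
        if flag == "--action":
            out["direction"] = ACTION_MAP[value]
        elif flag in LEGACY_MAP:
            out[LEGACY_MAP[flag]] = value
    if "test" in out:
        out["testing"] = "fixed"
    return out

print(translate({"--data": "iris.csv", "--action": "additive",
                 "--iterations": 10}))
```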
Core BAP input fields are: `train`, `test`/`testing`, `classifier`, `parameters`, `distance`, `goal.t`, `direction`, `splits`, `n`, `m`, `sampling`, `seed`.
- Run configuration export (`config.txt`)
- Converged case-set CSV artifacts per successful iteration
- Aggregate statistics including convergence rate and sureness-related measures
- Initialize PRNG with `seed`.
- Repeat for each split:
  - Build train/test partitions (`testing.fixed` or `testing.split`).
  - Repeat `n` times:
    - Start with empty set (`direction.forward`) or full train set (`direction.backward`).
    - Train/test until threshold `goal.t` is met, adding/removing `m` cases each step.
    - Mark the iteration as failed if no cases remain before the threshold is met.
    - Persist the converged set and increment the seed on success.
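The steps above can be sketched end-to-end for `direction.forward`, using a tiny nearest-centroid stand-in for the classifier (the real code uses the scikit-learn models and sampling options described earlier; `bap_forward` and `accuracy` are illustrative names):

```python
# End-to-end sketch of the forward loop: grow a subset m cases at a time,
# retrain, and stop once test accuracy reaches the threshold t.
import random

def accuracy(train, test):
    """Nearest-centroid accuracy of the `train` cases evaluated on `test`."""
    cents = {}
    for x, y in train:
        cents.setdefault(y, []).append(x)
    cents = {y: sum(v) / len(v) for y, v in cents.items()}
    hits = sum(1 for x, y in test
               if min(cents, key=lambda c: abs(cents[c] - x)) == y)
    return hits / len(test)

def bap_forward(train, test, t, m, rng):
    """Return a converged subset, or None if the pool runs out first."""
    pool, subset = list(train), []
    while pool:
        for _ in range(min(m, len(pool))):
            subset.append(pool.pop(rng.randrange(len(pool))))
        # Only evaluate once both classes are represented.
        if len({y for _, y in subset}) > 1 and accuracy(subset, test) >= t:
            return subset
    return None

rng = random.Random(42)
train = ([(i / 10, "a") for i in range(10)]
         + [(1 + i / 10, "b") for i in range(10)])
test = [(0.2, "a"), (0.4, "a"), (1.5, "b"), (1.8, "b")]
result = bap_forward(train, test, t=0.95, m=5, rng=rng)
print(len(result) if result else "failed")
```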