AvaAvarai/IterativeSurenessTester

Bidirectional Active Processing (BAP) Implementation

Reimplementation of the BAP algorithm from Bidirectional Active Processing.md, as specified in our Electronics paper (available both online and locally). Single-file implementations in Python and Julia with TOML configuration.

Requirements

Python

  • Python 3.11
  • numpy, pandas, scikit-learn
  • tomli (only needed for Python < 3.11; 3.11+ ships the built-in tomllib)

Use the project venv when present (dependencies are installed there):

source .venv311/bin/activate   # macOS / Linux
python bap.py -c examples/configs/config_iris.toml

Or run without activating:

.venv311/bin/python bap.py -c examples/configs/config_iris.toml

To create or refresh the venv:

python3.11 -m venv .venv311
.venv311/bin/python -m pip install -r requirements.txt

Configs point at CSVs under computing/machine learning datasets (not copies inside other app repos).

Julia

  • Julia 1.8+
  • Packages: CSV, DataFrames, ScikitLearn, TOML
    (ScikitLearn requires Python with scikit-learn)

To install the packages (run from the repository root):

julia -e 'using Pkg; Pkg.activate("./julia"); Pkg.instantiate()'

Quick Start

Single CSV (train/test split)

Example: Fisher Iris dataset

# Python (after: source .venv311/bin/activate)
python bap.py -c examples/configs/config_iris.toml

# Or with flags
python bap.py --train "../../machine learning datasets/default/fisher_iris.csv" --testing split --split 0.8,0.2 --classifier dt -t 0.95 -n 10 -m 5

Separate train and test CSVs

Example: MNIST

# Python
python bap.py -c examples/configs/config_mnist.toml

# Or with flags
python bap.py --train "../../machine learning datasets/default/mnist_train.csv" --test "../../machine learning datasets/default/mnist_test.csv" --testing fixed --classifier knn --k 5 -n 5 -m 100

Julia

julia julia/bap.jl -c examples/configs/config_iris.toml
julia julia/bap.jl -c examples/configs/config_mnist.toml
julia julia/bap.jl --train "../../machine learning datasets/default/fisher_iris.csv" -c examples/configs/config_iris.toml  # override train path

Configuration

All options can be set via TOML config (preferred) or CLI flags. TOML overrides defaults; flags override TOML.

TOML structure

| Parameter | TOML | Definition |
| --- | --- | --- |
| train | train = "path.csv" | Training CSV (or single dataset to split) |
| test | test = "path.csv" | Test CSV (for testing.fixed) |
| testing | [testing.fixed] or [testing.split] | How to obtain the test set |
| testing.fixed | [testing.fixed] + test = "..." | Use a separate test file |
| testing.split | [testing.split] + split = [0.8, 0.2] | Split train ratio : test ratio |
| classifier | classifier = "dt" | dt, knn, or svm |
| parameters | [parameters] + k = 5 | Classifier hyperparameters (e.g. k for KNN) |
| distance | distance = "euclidean" | Distance metric for KNN |
| goal.t | [goal] + t = 0.95 | Accuracy threshold (0–1) |
| direction | [direction.forward] or [direction.backward] | Forward (additive) or backward (subtractive) |
| splits | splits = 1 | Number of train/test splits |
| n | n = 10 | Iterations per split |
| m | m = 5 | Cases added/removed per iteration |
| sampling | [sampling.stratified] or [sampling.random] | Sampling method |
| seed | seed = 42 | PRNG seed |
| output_dir | output_dir = "results" | Output directory |
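Assembling the rows above, a complete config might look like the following sketch (field values are illustrative, loosely mirroring the Iris example; the shipped examples/configs files may differ in detail):

```toml
train = "../../machine learning datasets/default/fisher_iris.csv"
classifier = "dt"
splits = 1
n = 10
m = 5
seed = 42
output_dir = "results"

[testing.split]
split = [0.8, 0.2]

[goal]
t = 0.95

[direction.forward]

[sampling.stratified]
```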

CLI flags (Python)

| Flag | Description |
| --- | --- |
| -c, --config | TOML config file |
| --train | Training CSV |
| --test | Test CSV (for fixed) |
| --testing | fixed \| split \| cv |
| --split | Train,test ratio, e.g. 0.8,0.2 |
| --classifier | dt \| knn \| svm |
| --k | K for KNN (default 3) |
| --distance | Metric for KNN |
| -t, --threshold | Accuracy threshold |
| --direction | forward \| backward |
| --splits | Number of splits |
| -n, --iterations | Iterations per split |
| -m | Cases per iteration |
| --sampling | random \| stratified |
| --seed | Random seed |
| -o, --output-dir | Output directory |

Data format

CSVs must have a class column whose header matches class, label, or target (case-insensitive).
All other columns are features (column order may follow your benchmark file, e.g. fisher_iris.csv).

  • fisher_iris.csv: class column
  • mnist_train.csv, mnist_test.csv: label column
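The class-column rule above can be sketched as a small helper; `find_class_column` is a hypothetical name, not part of bap.py:

```python
# Hypothetical helper illustrating the class-column rule: the header must
# match "class", "label", or "target", case-insensitively.
CLASS_HEADERS = {"class", "label", "target"}

def find_class_column(headers: list[str]) -> str:
    for name in headers:
        if name.strip().lower() in CLASS_HEADERS:
            return name
    raise ValueError(f"no class column among {headers!r}")

find_class_column(["sepal_length", "sepal_width", "Class"])  # -> "Class"
find_class_column(["pixel0", "pixel1", "label"])             # -> "label"
```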

Classifiers

| Code | Classifier |
| --- | --- |
| dt | Decision Tree |
| knn | K-Nearest Neighbors |
| svm | Support Vector Machine (RBF) |
| hb_vis | Hyperblock (VisCanvas-style) |
| hb_dv | Hyperblock (DV-style, interval-based) |

Output

Results are written to {output_dir}/bap_{timestamp}/ (default results/bap_YYYYMMDD_HHMMSS/).

Exported case and hyperblock CSVs use the same tabular shape as the shared datasets (e.g. computing/machine learning datasets/default/fisher_iris.csv):

  • One header class (lowercase), case label for data rows.
  • One column per attribute (same names as the training CSV). No separate *_min / *_max columns and no extra ID column.
  • split_N/converged_exp_{id}_seed{seed}.csv – converged training cases: class holds the dataset label (e.g. Setosa).
  • split_N/converged_exp_{id}_seed{seed}_hyperblocks.csv – when using hb_vis or hb_dv: two rows per hyperblock. Each row has the same attribute columns; values are the box minimum (…__bottom) and maximum (…__top) corners. The class cell encodes label, HB id, and edge, e.g. Setosa__HB0__bottom / Setosa__HB0__top. Any __ in the dataset label is replaced by _ so the suffix pattern stays parseable.
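Because any `__` inside a dataset label is rewritten to `_`, the class cell splits cleanly from the right. A minimal parser sketch (`parse_hb_cell` is a hypothetical helper, not part of the tool):

```python
# Hypothetical parser for the hyperblock class-cell encoding above,
# e.g. "Setosa__HB0__bottom" -> (label, hyperblock id, edge).
def parse_hb_cell(cell: str) -> tuple[str, str, str]:
    label, hb_id, edge = cell.rsplit("__", 2)  # split only the last two "__"
    return label, hb_id, edge

parse_hb_cell("Setosa__HB0__bottom")  # -> ("Setosa", "HB0", "bottom")
```

Splitting from the right is what makes the sanitized labels safe: a label such as `Iris_setosa` never collides with the `__HBid__edge` suffix.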

Other output:

  • config.txt – Settings used
  • statistics.csv – Aggregate statistics (mean/min/max cases, convergence rate, etc.)

Every converged result CSV has a matching _hyperblocks.csv in the same directory when using hyperblock classifiers.

Algorithm (summary)

  1. Set PRNG seed
  2. For each split: load data (fixed test or split)
  3. For each iteration: start with empty set (forward) or full set (backward)
  4. While accuracy < threshold and cases remain: add/remove m cases via sampling
  5. Record converged subsets and compute statistics
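The loop in steps 3–4 (forward direction) can be sketched in pure Python; `evaluate` stands in for training the configured classifier and scoring it on the test set, so this is a simplification rather than the actual implementation:

```python
import random

def bap_forward(cases, evaluate, t=0.95, m=5, seed=42):
    """Forward (additive) BAP sketch: grow the training subset by m random
    cases per step until evaluate(subset) reaches the accuracy goal t.
    Returns the converged subset, or None if the pool runs out first."""
    rng = random.Random(seed)                     # step 1: seeded PRNG
    pool, subset = list(cases), []                # step 3: start empty
    while pool:                                   # step 4: add m cases per pass
        batch = [pool.pop(rng.randrange(len(pool)))
                 for _ in range(min(m, len(pool)))]
        subset.extend(batch)
        if evaluate(subset) >= t:
            return subset                         # converged
    return None                                   # failed: goal never reached

# Toy evaluator where accuracy grows with subset size.
converged = bap_forward(range(100), lambda s: len(s) / 100, t=0.5, m=5)
```

The backward direction mirrors this: start from the full train set and remove m cases per step while accuracy stays at or above the threshold.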

Bidirectional Processing Definition Notes

This repository originally exposed BAP behavior through rebuild.py; that interface has now been folded into the TOML/CLI model used by bap.py and julia/bap.jl.

Parameter mapping

  • Legacy --data maps to train
  • Legacy --test-data maps to test with testing.fixed
  • Legacy train/test split flags map to testing.split with split = [train_ratio, test_ratio]
  • Legacy --action {additive,subtractive} maps to direction {forward,backward}
  • Legacy --iterations maps to n
  • Legacy --threshold maps to goal.t
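The mapping above could be applied mechanically; `translate_legacy` is a hypothetical sketch of such a translator, not code shipped in this repository:

```python
# Hypothetical translator from the legacy rebuild.py flags to the new keys.
def translate_legacy(args: dict) -> dict:
    cfg = {}
    if "data" in args:
        cfg["train"] = args["data"]
    if "test_data" in args:
        cfg["test"] = args["test_data"]
        cfg["testing"] = {"fixed": {}}            # --test-data implies fixed test
    if "action" in args:
        direction = {"additive": "forward",
                     "subtractive": "backward"}[args["action"]]
        cfg["direction"] = {direction: {}}
    if "iterations" in args:
        cfg["n"] = args["iterations"]
    if "threshold" in args:
        cfg["goal"] = {"t": args["threshold"]}
    return cfg

translate_legacy({"data": "iris.csv", "action": "additive", "iterations": 10})
```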

Formal algorithm input

Core BAP input fields are:

  • train, test/testing, classifier, parameters, distance
  • goal.t, direction, splits, n, m
  • sampling, seed

Formal algorithm output

  • Run configuration export (config.txt)
  • Converged case-set CSV artifacts per successful iteration
  • Aggregate statistics including convergence rate and sureness-related measures

Procedure (expanded)

  1. Initialize PRNG with seed.
  2. Repeat for each split:
    • Build train/test partitions (testing.fixed or testing.split).
  3. Repeat n times:
    • Start with empty set (direction.forward) or full train set (direction.backward).
    • Train/test until threshold goal.t is met, adding/removing m cases each step.
    • Mark iteration as failed if no cases remain before meeting threshold.
    • Persist converged set and increment seed on success.

About

A first sureness-measure test tool using Supervised Iterative Learning (an alternative to Active Learning) to find the stability point of supervised ML classifiers.
