Add learn-weights CLI step with PU learning and Optuna tuning#55

Merged
iankchristie merged 2 commits into main from ichristi/learn_weights on May 21, 2026
@iankchristie iankchristie commented May 20, 2026

Collaborator

Introduces a new reVeal learn-weights pipeline step that trains a PUExtraTrees model on a normalized grid using point labels as positive samples. Derives feature importance weights for use with score-weighted.

  • Add reVeal/pu/ — PUExtraTree/PUExtraTrees adapted from https://github.com/jonathanwilton/PUExtraTrees, with modifications from /projects/largeload/.../models/PUExtraTrees/ (deterministic seeding, joblib parallelism)
  • Add reVeal/learn_weights.py — data preparation logic adapted from /projects/largeload/.../models/prepare_data.py (DataHandler), training and metrics from /projects/largeload/.../models/models.py (ModelTrainer)
  • Add Optuna-based hyperparameter tuning for class_prior — adapted from ModelTrainer.tune_hyperparameters() and ModelTrainer.objective() in /projects/largeload/.../models/models.py
  • Add CLI command (reVeal/cli/learn_weights.py), config (reVeal/config/learn_weights.py), and tests — new code following existing reVeal CLI patterns (score_weighted, normalize)
  • Add joblib, scipy, optuna to pyproject.toml and environment.yml
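
The data preparation described above (point labels become positives, everything else stays unlabeled) can be sketched roughly as follows. This is a minimal illustration, not the code in reVeal/learn_weights.py: the function name `prepare_pu_data`, the dict-based grid, and the feature names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def prepare_pu_data(
    grid: Dict[str, Dict[str, float]],
    positive_cells: Set[str],
    feature_names: List[str],
) -> Tuple[List[List[float]], List[int], List[str]]:
    """Build a feature matrix and PU labels from a normalized grid.

    Cells in ``positive_cells`` get label 1 (positive). Every other cell
    gets label 0, which means *unlabeled*, not negative.
    """
    # Fail early if a label points at a cell that is not in the grid.
    unknown = positive_cells.difference(grid)
    if unknown:
        raise ValueError(f"Labels reference unknown cells: {sorted(unknown)}")

    # Sort cell ids so row order is deterministic across runs.
    cell_ids = sorted(grid)
    X = [[grid[cell][name] for name in feature_names] for cell in cell_ids]
    y = [1 if cell in positive_cells else 0 for cell in cell_ids]
    return X, y, cell_ids


# Hypothetical normalized grid: three cells, two features scaled to [0, 1].
grid = {
    "cell_a": {"slope": 0.1, "dist_to_tx": 0.9},
    "cell_b": {"slope": 0.4, "dist_to_tx": 0.2},
    "cell_c": {"slope": 0.8, "dist_to_tx": 0.5},
}
X, y, cell_ids = prepare_pu_data(grid, {"cell_b"}, ["slope", "dist_to_tx"])
```

Treating label 0 as "unlabeled" rather than "negative" is the point of the PU setup: the model has to account for positives hiding among the unlabeled cells, which is where `class_prior` comes in.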

Tested by running the new CLI tool like so:

python -m reVeal.cli.cli learn-weights -c /projects/largeload/geospatial/runs/test_scenarios/learned_weights_2026-05-19_agg64/config_learn_weights.json

You can use the learned_weights_2026-05-19_agg64/ directory to test this code.

Next Steps:
This works well for generating the score weights. In practice, however, many of these features are highly correlated, and there are others we want to exclude manually for other reasons. We want a few feature engineering tools that let users iterate on the score outputs. These will include:

  1. An exclude list
  2. Visualization features (dendrogram and correlation matrix) to help users engineer features before they go into the score_weighted step.
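
As a rough idea of what the two follow-up tools could look like, here is a small standard-library sketch. None of it exists in this PR: `apply_exclude_list`, `correlated_pairs`, the 0.9 threshold, and the feature names are hypothetical, and renormalizing the remaining weights to sum to 1 is an assumption about what score_weighted would expect.

```python
import math
from typing import Dict, Iterable, List, Tuple


def apply_exclude_list(
    weights: Dict[str, float], exclude: Iterable[str]
) -> Dict[str, float]:
    """Drop excluded features and rescale the rest to sum to 1."""
    excluded = set(exclude)
    kept = {name: w for name, w in weights.items() if name not in excluded}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("No positive weight left after exclusion")
    return {name: w / total for name, w in kept.items()}


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def correlated_pairs(
    columns: Dict[str, List[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """List feature pairs whose absolute correlation meets the threshold."""
    names = sorted(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs


# Hypothetical features: elevation is exactly 2x slope, so the pair is flagged.
columns = {
    "slope": [0.1, 0.4, 0.8, 0.3],
    "elevation": [0.2, 0.8, 1.6, 0.6],
    "dist_to_road": [0.9, 0.1, 0.5, 0.7],
}
flagged = correlated_pairs(columns)

learned = {"slope": 0.5, "elevation": 0.3, "dist_to_road": 0.2}
adjusted = apply_exclude_list(learned, ["elevation"])
```

A user could review `flagged`, add one feature of each pair to the exclude list, and rerun scoring with `adjusted`. The dendrogram view would sit on top of the same correlation values.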

@iankchristie iankchristie requested a review from ppinchuk as a code owner May 20, 2026 19:49
Copilot AI review requested due to automatic review settings May 20, 2026 19:49

Copilot AI left a comment

Contributor

Pull request overview

Adds a new reVeal learn-weights pipeline step that uses PU (positive–unlabeled) ExtraTrees training to derive feature-importance-based weights and emit a score-weighted-compatible configuration.

Changes:

  • Introduces a PU ExtraTrees/ExtraTree implementation (reVeal/pu/*) with optional joblib parallelism and deterministic seeding.
  • Adds core learning + Optuna tuning workflow (reVeal/learn_weights.py) and exposes it via a new CLI command/config (reVeal/cli/learn_weights.py, reVeal/config/learn_weights.py).
  • Adds tests for data prep, training, tuning, and config generation (tests/test_learn_weights.py) and updates dependencies (pyproject.toml, environment.yml).
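
For readers unfamiliar with the tuning step, the shape of a seeded search over class_prior might look like the sketch below. A plain random search stands in for Optuna here so the example runs with only the standard library. `tune_class_prior`, `toy_objective`, the search bounds, and the trial count are hypothetical and do not come from this PR; the real objective would train a PU model and return a validation metric.

```python
import random
from typing import Callable, Tuple


def tune_class_prior(
    objective: Callable[[float], float],
    low: float = 0.05,
    high: float = 0.5,
    n_trials: int = 50,
    seed: int = 42,
) -> Tuple[float, float]:
    """Return the (class_prior, score) pair with the highest score.

    A dedicated Random instance keeps the search reproducible without
    touching the global random state.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    rng = random.Random(seed)
    best_prior, best_score = low, float("-inf")
    for _ in range(n_trials):
        prior = rng.uniform(low, high)  # candidate class_prior
        score = objective(prior)        # higher is better
        if score > best_score:
            best_prior, best_score = prior, score
    return best_prior, best_score


def toy_objective(class_prior: float) -> float:
    """Stand-in for 'train a PU model and score it'; peaks at 0.2."""
    return -((class_prior - 0.2) ** 2)


best_prior, best_score = tune_class_prior(toy_objective)
```

With Optuna, the loop would typically be replaced by a study created with `optuna.create_study(direction="maximize")`, and the candidate would come from `trial.suggest_float("class_prior", low, high)` inside the objective.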

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 20 comments.

| File | Description |
| --- | --- |
| tests/test_learn_weights.py | New test coverage for the learn-weights pipeline helpers and Optuna tuning. |
| reVeal/pu/trees.py | Adds PUExtraTrees forest implementation plus grid/cutpoint utilities. |
| reVeal/pu/tree.py | Adds PUExtraTree decision tree implementation used by the forest. |
| reVeal/pu/__init__.py | Exposes PUExtraTrees from the new reVeal.pu package. |
| reVeal/learn_weights.py | Implements PU data preparation, training, Optuna tuning, and weight/config generation. |
| reVeal/config/learn_weights.py | Adds pydantic config model for learn-weights inputs. |
| reVeal/cli/learn_weights.py | New CLI command wiring to run learn-weights and write outputs. |
| reVeal/cli/cli.py | Registers the new learn-weights command in the main CLI. |
| pyproject.toml | Adds new runtime dependencies needed by learn-weights/PU trees (but missing scikit-learn). |
| environment.yml | Adds conda dependencies for the new functionality (but missing scikit-learn). |

Comment thread reVeal/learn_weights.py Outdated
Comment thread reVeal/learn_weights.py
Comment thread reVeal/learn_weights.py
Comment thread reVeal/learn_weights.py Outdated
Comment thread reVeal/learn_weights.py Outdated
Comment thread environment.yml
Comment thread reVeal/pu/trees.py
Comment thread reVeal/pu/tree.py
Comment thread reVeal/pu/trees.py
Comment thread reVeal/pu/tree.py
@iankchristie iankchristie force-pushed the ichristi/learn_weights branch from 66ba519 to 687f961 Compare May 20, 2026 20:04
@iankchristie iankchristie changed the title Add learn-weights CLI step with PU learning and Optuna tuning DNR: Add learn-weights CLI step with PU learning and Optuna tuning May 20, 2026
@iankchristie iankchristie force-pushed the ichristi/learn_weights branch 2 times, most recently from 6d2e29f to a602ddc Compare May 20, 2026 22:08
@iankchristie iankchristie force-pushed the ichristi/learn_weights branch from a602ddc to 2374e7b Compare May 20, 2026 22:11
@codecov-commenter


Codecov Report

❌ Patch coverage is 64.18605% with 77 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.62%. Comparing base (dc210e6) to head (2374e7b).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| reVeal/learn_weights.py | 72.72% | 35 Missing and 4 partials ⚠️ |
| reVeal/cli/learn_weights.py | 26.92% | 38 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (64.18%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #55      +/-   ##
==========================================
- Coverage   84.50%   81.62%   -2.89%     
==========================================
  Files          17       20       +3     
  Lines        1304     1518     +214     
  Branches      180      200      +20     
==========================================
+ Hits         1102     1239     +137     
- Misses        163      236      +73     
- Partials       39       43       +4     
| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | 81.62% <64.18%> (-2.89%) | ⬇️ |
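
The figures in this report can be cross-checked with a few lines of arithmetic. The hit, miss, partial, and line counts below are copied from the diff above. The total of 215 patch lines is not stated in the report; it is inferred from 77 missing lines at 64.18605% patch coverage, so treat it as an assumption.

```python
def coverage_pct(hits: int, total: int) -> float:
    """Coverage as a percentage of lines hit."""
    return 100.0 * hits / total


# Project totals from the coverage diff (base = main, head = this PR).
base_hits, base_misses, base_partials, base_lines = 1102, 163, 39, 1304
head_hits, head_misses, head_partials, head_lines = 1239, 236, 43, 1518

base_cov = coverage_pct(base_hits, base_lines)
head_cov = coverage_pct(head_hits, head_lines)
delta = head_cov - base_cov

# Patch coverage: missing and partial lines both count as not covered.
patch_missing = 35 + 4 + 38  # per-file rows in the table above
patch_lines = 215            # assumption: inferred, not stated in the report
patch_cov = coverage_pct(patch_lines - patch_missing, patch_lines)

print(f"base={base_cov:.4f}% head={head_cov:.4f}% delta={delta:.4f}")
print(f"patch={patch_cov:.5f}% (target 80.00%)")
```

Because partial lines count against patch coverage, covering the 38 untested lines in reVeal/cli/learn_weights.py would be the most direct way to move the patch figure toward the 80% target.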

@iankchristie iankchristie changed the title DNR: Add learn-weights CLI step with PU learning and Optuna tuning Add learn-weights CLI step with PU learning and Optuna tuning May 20, 2026

@ppinchuk ppinchuk left a comment

Collaborator

Thanks for adding tests and keeping the environment lockfile updated!

@iankchristie iankchristie merged commit 8cd3c1e into main May 21, 2026
10 checks passed
@iankchristie iankchristie deleted the ichristi/learn_weights branch May 21, 2026 19:33