Add learn-weights CLI step with PU learning and Optuna tuning #55
Pull request overview
Adds a new reVeal learn-weights pipeline step that uses PU (positive–unlabeled) ExtraTrees training to derive feature-importance-based weights and emit a score-weighted-compatible configuration.
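The sketch below illustrates one way to turn feature importances into weights. It normalizes a mapping of importances so the kept features sum to 1, optionally drops excluded features, and wraps the result in a config-like dict. The function names and the `attributes`/`attribute`/`weight` keys are hypothetical. The actual score-weighted config schema in reVeal may differ, and this summary does not show how `reVeal/learn_weights.py` computes its importances.

```python
import json


def importances_to_weights(importances, exclude=()):
    """Normalize non-negative feature importances so the kept ones sum to 1.

    `importances` maps feature name -> importance (e.g. a forest's mean
    impurity decrease). Features listed in `exclude` are dropped first.
    """
    kept = {name: value for name, value in importances.items() if name not in exclude}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("kept importances must have a positive sum")
    return {name: value / total for name, value in kept.items()}


def build_weights_config(weights, decimals=4):
    """Wrap weights in a hypothetical score-weighted-style config dict.

    The key names are placeholders, not reVeal's actual schema.
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {
        "attributes": [
            {"attribute": name, "weight": round(weight, decimals)}
            for name, weight in ranked
        ]
    }


if __name__ == "__main__":
    importances = {"feat_a": 0.1, "feat_b": 0.6, "feat_c": 0.3}
    weights = importances_to_weights(importances, exclude={"feat_a"})
    print(json.dumps(build_weights_config(weights), indent=2))
```

Rounding for display means the written weights may not sum to exactly 1. A consumer that requires an exact sum should renormalize after reading the weights back in.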
Changes:
- Introduces a PU ExtraTrees/ExtraTree implementation (`reVeal/pu/*`) with optional joblib parallelism and deterministic seeding.
- Adds the core learning + Optuna tuning workflow (`reVeal/learn_weights.py`) and exposes it via a new CLI command/config (`reVeal/cli/learn_weights.py`, `reVeal/config/learn_weights.py`).
- Adds tests for data prep, training, tuning, and config generation (`tests/test_learn_weights.py`) and updates dependencies (`pyproject.toml`, `environment.yml`).
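Deterministic seeding combined with parallel training usually means deriving an independent seed for each tree from one master seed. Each tree's random stream then does not depend on which worker fits it. The minimal sketch below shows that idea with NumPy's `SeedSequence.spawn`. `fit_one_tree` is a hypothetical stand-in for real tree fitting, and a thread pool substitutes for the joblib backend the PR uses. Whether `reVeal/pu/` derives its seeds exactly this way is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fit_one_tree(seed_seq, X):
    # Hypothetical stand-in for fitting one tree: draw a random feature
    # ordering from this tree's own generator.
    rng = np.random.default_rng(seed_seq)
    return rng.permutation(X.shape[1])


def fit_forest(X, n_estimators=8, random_state=0, n_jobs=1):
    # Spawn one independent child seed per tree from a single master seed, so
    # tree i always gets the same random stream regardless of the worker count.
    child_seeds = np.random.SeedSequence(random_state).spawn(n_estimators)
    if n_jobs == 1:
        return [fit_one_tree(s, X) for s in child_seeds]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() preserves input order, so results line up with child_seeds.
        return list(pool.map(lambda s: fit_one_tree(s, X), child_seeds))
```

The same pattern carries over to joblib: pass one child seed to each `delayed(...)` call. `Parallel` returns results in submission order, so the forest is identical for any `n_jobs`.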
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| tests/test_learn_weights.py | New test coverage for the learn-weights pipeline helpers and Optuna tuning. |
| reVeal/pu/trees.py | Adds PUExtraTrees forest implementation plus grid/cutpoint utilities. |
| reVeal/pu/tree.py | Adds PUExtraTree decision tree implementation used by the forest. |
| `reVeal/pu/__init__.py` | Exposes PUExtraTrees from the new reVeal.pu package. |
| reVeal/learn_weights.py | Implements PU data preparation, training, Optuna tuning, and weight/config generation. |
| reVeal/config/learn_weights.py | Adds pydantic config model for learn-weights inputs. |
| reVeal/cli/learn_weights.py | New CLI command wiring to run learn-weights and write outputs. |
| reVeal/cli/cli.py | Registers the new learn-weights command in the main CLI. |
| pyproject.toml | Adds new runtime dependencies needed by learn-weights/PU trees (but missing scikit-learn). |
| environment.yml | Adds conda dependencies for the new functionality (but missing scikit-learn). |
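Some context on the cutpoint utilities: Extremely Randomized Trees (Geurts et al., 2006) do not search for an optimal threshold. At each node they draw one random cutpoint per candidate feature, uniformly between that feature's minimum and maximum among the node's samples, and keep whichever candidate scores best. The minimal sketch below covers only the candidate-drawing step and assumes a NumPy `Generator`. This summary does not show how `reVeal/pu/trees.py` builds its grid of cutpoints, or which PU criterion scores the candidates.

```python
import numpy as np


def draw_random_splits(X_node, max_features, rng):
    """Draw Extra-Trees-style candidate splits for the samples at one node.

    Returns a list of (feature_index, cutpoint) pairs. Constant features are
    skipped because no cutpoint can separate their samples.
    """
    n_features = X_node.shape[1]
    n_candidates = min(max_features, n_features)
    features = rng.choice(n_features, size=n_candidates, replace=False)
    splits = []
    for j in features:
        lo, hi = X_node[:, j].min(), X_node[:, j].max()
        if lo == hi:
            continue
        # Uniform in [lo, hi): samples with x <= cut go left, the rest go
        # right, so both children are non-empty.
        splits.append((int(j), float(rng.uniform(lo, hi))))
    return splits
```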
Introduces a new `reVeal learn-weights` pipeline step that trains a PUExtraTrees model on a normalized grid using point labels as positive samples. Derives feature importance weights for use with `score-weighted`.

- Add `reVeal/pu/`: PUExtraTree/PUExtraTrees adapted from https://github.com/jonathanwilton/PUExtraTrees, with modifications from /projects/largeload/.../models/PUExtraTrees/ (deterministic seeding, joblib parallelism)
- Add `reVeal/learn_weights.py`: data preparation logic adapted from /projects/largeload/.../models/prepare_data.py (DataHandler); training and metrics from /projects/largeload/.../models/models.py (ModelTrainer)
- Add Optuna-based hyperparameter tuning for `class_prior`, adapted from `ModelTrainer.tune_hyperparameters()` and `ModelTrainer.objective()` in /projects/largeload/.../models/models.py
- Add CLI command (`reVeal/cli/learn_weights.py`), config (`reVeal/config/learn_weights.py`), and tests; this is new code following existing reVeal CLI patterns (`score_weighted`, `normalize`)
- Add joblib, scipy, and optuna to pyproject.toml and environment.yml
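The class prior matters because PU learning substitutes unlabeled data for the missing negatives. Two standard estimators take the class prior π as an input: the unbiased PU risk (uPU, du Plessis et al.) and its non-negative correction (nnPU, Kiryo et al., 2017). A mis-set `class_prior` therefore shifts the objective the model minimizes, which is one motivation for tuning it. The minimal NumPy sketch below implements both estimators with the sigmoid loss. The loss choice and function names are assumptions. The adapted PUExtraTrees code may use a different loss or apply the risk as a split criterion rather than to a scoring function.

```python
import numpy as np


def sigmoid_loss(scores, y):
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)); near 0 when sign(z) == y."""
    return 1.0 / (1.0 + np.exp(y * scores))


def pu_risk(scores_pos, scores_unl, class_prior, non_negative=True):
    """Empirical PU risk of a scoring function from positive and unlabeled scores.

    R = pi * mean_P l(g, +1) + [mean_U l(g, -1) - pi * mean_P l(g, -1)]
    The bracket estimates the negative-class risk from unlabeled data; nnPU
    (non_negative=True) clips it at 0, uPU (non_negative=False) does not.
    """
    risk_pos = class_prior * sigmoid_loss(scores_pos, +1).mean()
    risk_neg = (
        sigmoid_loss(scores_unl, -1).mean()
        - class_prior * sigmoid_loss(scores_pos, -1).mean()
    )
    if non_negative:
        risk_neg = max(0.0, risk_neg)
    return float(risk_pos + risk_neg)
```

Suppose the prior is set too high for the data, for example 0.9 when most unlabeled scores look negative. The uPU estimate can then become negative, which is impossible for a true risk. nnPU's clipping exists to block exactly this case.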
Codecov Report
❌ Patch coverage is 64.18%.
❌ Your patch status has failed because the patch coverage (64.18%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

```diff
@@            Coverage Diff             @@
##             main      #55      +/-   ##
==========================================
- Coverage   84.50%   81.62%   -2.89%
==========================================
  Files          17       20       +3
  Lines        1304     1518     +214
  Branches      180      200      +20
==========================================
+ Hits         1102     1239     +137
- Misses        163      236      +73
- Partials       39       43       +4
```
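The project-level percentages in the diff follow Codecov's ratio, hits / (hits + misses + partials). Partially covered lines therefore count against coverage, and by default Codecov rounds percentages down to two decimals. The small check below reproduces the 84.50% and 81.62% figures from the table. The helper name is illustrative. The 64.18% patch figure cannot be derived from these totals because it covers only the diff's changed lines, which the table does not break out.

```python
import math


def codecov_ratio(hits, misses, partials, precision=2):
    """Coverage percentage as Codecov reports it, rounded down."""
    total = hits + misses + partials  # matches the "Lines" row
    percent = 100.0 * hits / total
    scale = 10 ** precision
    return math.floor(percent * scale) / scale


print(codecov_ratio(1102, 163, 39))  # main: 84.5
print(codecov_ratio(1239, 236, 43))  # #55: 81.62
```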
ppinchuk left a comment
Thanks for adding tests and keeping the environment lockfile updated!
Introduces a new `reVeal learn-weights` pipeline step that trains a PUExtraTrees model on a normalized grid using point labels as positive samples. Derives feature importance weights for use with `score-weighted`.
Tested by running the new CLI tool like so:
You can use the learned_weights_2026-05-19_agg64/ directory to test this code.
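The labelling setup described above works as follows. Grid cells that contain at least one labelled point become positives. Every other cell stays unlabeled and is not treated as a confirmed negative. The minimal NumPy sketch below assumes a regular, axis-aligned grid indexed row-major, with row 0 starting at `y0`. The function name and grid parameters are hypothetical. The real data preparation in `reVeal/learn_weights.py` may work from the grid's actual geometries rather than from this index arithmetic.

```python
import numpy as np


def label_grid_cells(point_xy, x0, y0, cell_size, n_cols, n_rows):
    """Return a flat 0/1 label array over a regular grid (row-major).

    1 = positive (cell contains at least one labelled point),
    0 = unlabeled (NOT assumed negative in PU learning).
    Points falling outside the grid are ignored.
    """
    cols = np.floor((point_xy[:, 0] - x0) / cell_size).astype(int)
    rows = np.floor((point_xy[:, 1] - y0) / cell_size).astype(int)
    inside = (cols >= 0) & (cols < n_cols) & (rows >= 0) & (rows < n_rows)
    labels = np.zeros(n_rows * n_cols, dtype=np.int8)
    labels[rows[inside] * n_cols + cols[inside]] = 1
    return labels
```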
Next Steps:
This works well for generating the score weights. However, in practice many of these features are highly correlated, or we want to exclude them manually for other purposes. We want a few feature engineering tools that will allow users to iterate on the score outputs. This will include: