Replace HPO framework NNI with Optuna and update CLI #852
Merged
Add comprehensive CLI commands enabling users to operate all PyPOTS functionalities from the command line.
New commands:
- train: Train models from YAML/JSON config files with CLI overrides
- predict: Run inference with saved .pypots models, using the config to reconstruct the correct model architecture
- evaluate: Evaluate predictions against ground truth with task-specific metrics (MSE, MAE, RMSE, accuracy, F1, etc.)
- data: Convert/split/describe datasets (H5, CSV, NumPy, Pickle)
- model: List/describe/inspect/generate-config for 100+ models across 6 task types
- tune: Improved HPO wrapper with NNI integration and config files
- info: Show environment, version, device, and model count information
- benchmark: Compare multiple models on the same dataset with metrics
Architecture:
- Config-first design: YAML/JSON configs as primary input, CLI args override
- Lazy model imports via importlib for fast CLI startup
- Dynamic model registry using __all__ from each task module
- Parameter filtering via inspect.signature for safe model instantiation
- All commands extend the existing BaseCommand ABC pattern
Also adds unit tests for all 8 commands (16 test cases total).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
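The registry ideas in the Architecture list (lazy imports via importlib, model names read from each task module's `__all__`) can be sketched in a few lines. This is a hypothetical illustration, not the PR's implementation: `ModelRegistry` and its methods are invented names, and the demo maps standard-library modules instead of PyPOTS task packages so that it runs without PyPOTS installed.

```python
import importlib
from types import ModuleType
from typing import Any, Dict, List


class ModelRegistry:
    """Hypothetical registry: task name -> module path, models read from __all__."""

    def __init__(self, task_modules: Dict[str, str]) -> None:
        # Only module paths are stored here; nothing is imported yet,
        # so building the registry costs almost nothing at CLI startup.
        self._task_modules = dict(task_modules)
        self._loaded: Dict[str, ModuleType] = {}

    def tasks(self) -> List[str]:
        return sorted(self._task_modules)

    def _module(self, task: str) -> ModuleType:
        # Import a task module the first time it is needed, then reuse it.
        if task not in self._loaded:
            self._loaded[task] = importlib.import_module(self._task_modules[task])
        return self._loaded[task]

    def list_models(self, task: str) -> List[str]:
        # Whatever the module exports via __all__ is treated as its model list.
        return sorted(getattr(self._module(task), "__all__", []))

    def get_model(self, task: str, name: str) -> Any:
        module = self._module(task)
        if name not in getattr(module, "__all__", []):
            raise KeyError(f"{name!r} is not exported by {module.__name__}")
        return getattr(module, name)


if __name__ == "__main__":
    # Standard-library modules stand in for task packages so this runs anywhere.
    registry = ModelRegistry({"json": "json", "csv": "csv"})
    print(registry.tasks())
    print(registry.list_models("json"))
    print(registry.get_model("json", "dumps")({"a": 1}))
```

In PyPOTS the mapping would presumably point at task packages such as `pypots.imputation`. Because nothing is imported until `list_models` or `get_model` is called, commands that never touch a task pay no import cost for it.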
Replace the deprecated Microsoft NNI framework with Optuna for all hyperparameter optimization functionality.
Key changes:
- pypots/base.py: Remove NNI import/reporting; add optional optuna_trial parameter to BaseNNModel for in-training pruning support
- pypots/cli/tune.py: Complete rewrite using Optuna study.optimize() with an in-process objective function (no more separate trial processes)
- pypots/cli/hpo.py: Removed (NNI trial runner no longer needed)
- 3 model files (usgan, vader, crli): Replace NNI reporting with the Optuna trial.report() + should_prune() pattern
- requirements.txt: Add optuna dependency
- Docs: Update NNI references to Optuna in README.md, README_zh.md, docs/index.rst
The new Optuna config format uses int/float/categorical search space types with low/high/choices params, and supports TPE/Random/CmaEs/Grid samplers plus MedianPruner/PercentilePruner/HyperbandPruner pruners.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
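The int/float/categorical config entries with low/high/choices map naturally onto Optuna's `trial.suggest_int`, `trial.suggest_float`, and `trial.suggest_categorical`, and the pruning pattern is `trial.report()` followed by `trial.should_prune()`. A minimal sketch of both; the config keys `type` and `log`, the function names, and the toy loss are assumptions rather than the PR's actual schema. A stub trial stands in for `optuna.trial.Trial` so the example runs without Optuna.

```python
from typing import Any, Dict

try:
    from optuna import TrialPruned  # raised to tell Optuna that a trial was pruned
except ImportError:  # fallback so this sketch also runs without Optuna installed
    class TrialPruned(Exception):
        pass


def suggest_params(trial, search_space: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Translate an int/float/categorical search-space config into trial suggestions."""
    params = {}
    for name, spec in search_space.items():
        kind = spec["type"]
        if kind == "int":
            params[name] = trial.suggest_int(name, spec["low"], spec["high"])
        elif kind == "float":
            params[name] = trial.suggest_float(name, spec["low"], spec["high"], log=spec.get("log", False))
        elif kind == "categorical":
            params[name] = trial.suggest_categorical(name, spec["choices"])
        else:
            raise ValueError(f"unknown search-space type {kind!r} for {name!r}")
    return params


def objective(trial, search_space, n_epochs: int = 5) -> float:
    params = suggest_params(trial, search_space)
    val_loss = float("inf")
    for epoch in range(n_epochs):
        # Toy stand-in for one training epoch (needs "lr" and "n_layers" in the space).
        val_loss = params["lr"] * 100 / (epoch + 1) + params["n_layers"]
        trial.report(val_loss, step=epoch)  # intermediate value seen by the pruner
        if trial.should_prune():
            raise TrialPruned()
    return val_loss


class StubTrial:
    """Minimal stand-in for optuna.trial.Trial: always picks the lower bound / first choice."""

    def __init__(self):
        self.reports = []

    def suggest_int(self, name, low, high):
        return low

    def suggest_float(self, name, low, high, log=False):
        return low

    def suggest_categorical(self, name, choices):
        return choices[0]

    def report(self, value, step):
        self.reports.append((step, value))

    def should_prune(self):
        return False


SEARCH_SPACE = {
    "n_layers": {"type": "int", "low": 1, "high": 4},
    "lr": {"type": "float", "low": 1e-4, "high": 1e-2, "log": True},
    "d_model": {"type": "categorical", "choices": [64, 128, 256]},
}

if __name__ == "__main__":
    trial = StubTrial()
    print(objective(trial, SEARCH_SPACE, n_epochs=3), trial.reports)
```

With Optuna installed, the same functions could be driven by a real study, for example `study = optuna.create_study(direction="minimize", sampler=optuna.samplers.TPESampler(seed=0), pruner=optuna.pruners.MedianPruner())` followed by `study.optimize(lambda trial: objective(trial, SEARCH_SPACE), n_trials=20)`.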
- Add 'data list' action to list 260+ benchmark datasets from TSDB
- Add 'data load' action to download, preprocess, and save benchmark datasets as train/val/test H5 splits via benchpots
- Support dataset-specific params: --subset, --rate, --n_steps, --pattern
- Use inspect.signature() to filter kwargs for each preprocess function
- Add tests for list and load actions (physionet_2012)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
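Filtering each preprocess function's keyword arguments with `inspect.signature()` means dataset-specific options such as `--subset` or `--n_steps` are only passed where they are accepted. A minimal sketch of that filtering; `filter_kwargs` and the two preprocess functions are hypothetical stand-ins, not benchpots' actual functions or signatures, and treating unset options as `None` is an assumption.

```python
import inspect
from typing import Any, Callable, Dict


def filter_kwargs(func: Callable, options: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the options that were actually given and that `func` accepts."""
    # Options the user did not pass are assumed to arrive as None; dropping them
    # lets the preprocess function's own defaults apply.
    given = {k: v for k, v in options.items() if v is not None}
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return given  # func takes **kwargs, so every option is acceptable
    accepted = {
        name for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in given.items() if k in accepted}


# Hypothetical preprocess functions with different parameter sets.
def preprocess_with_subset(subset, rate, pattern="point"):
    return {"subset": subset, "rate": rate, "pattern": pattern}


def preprocess_with_window(rate, n_steps, pattern="point"):
    return {"rate": rate, "n_steps": n_steps, "pattern": pattern}


if __name__ == "__main__":
    cli_options = {"subset": "set-a", "rate": 0.1, "n_steps": 24, "pattern": None}
    for fn in (preprocess_with_subset, preprocess_with_window):
        print(fn.__name__, fn(**filter_kwargs(fn, cli_options)))
```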
- Migrate all 11 command files from the argparse class-based pattern to Click decorators (@click.command, @click.group, @click.option)
- Convert data and model commands to Click groups with subcommands
- Remove BaseCommand ABC class; keep execute_command() and check_if_under_root_dir() as module-level functions in base.py
- Rename merge_config_with_args to merge_config_with_overrides (now takes a dict)
- Rewrite pypots_cli.py entry point as a Click group with cli.add_command()
- Update all 11 test files to use Click's CliRunner
- Add 'click' to requirements.txt
- Fix info.py: replace NNI with Optuna in the optional dependencies list
- Net reduction: 1075 lines (3007 removed, 1932 added)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
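The commit only states that the renamed `merge_config_with_overrides` takes a dict. Under the config-first design (file values as the base, command-line values on top), one plausible behavior is sketched below. The name `merge_with_overrides` and its rules (skip `None`, merge nested dicts recursively, never mutate the input) are assumptions for illustration, not the PR's implementation.

```python
import copy
from typing import Any, Dict


def merge_with_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config in which non-None overrides take precedence over file values."""
    merged = copy.deepcopy(config)  # leave the loaded config untouched
    for key, value in overrides.items():
        if value is None:
            continue  # assumed meaning: option not given on the command line
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_with_overrides(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    file_config = {"model": {"name": "SAITS", "n_layers": 2}, "epochs": 10}
    cli_overrides = {"epochs": 20, "model": {"n_layers": 4}, "device": None}
    print(merge_with_overrides(file_config, cli_overrides))
```

Whether the real helper merges nested sections or replaces them wholesale is not stated in the commit; the recursive merge here is simply one reasonable choice.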
1. CLI startup: ~7s → <100ms (97x faster)
- Make pypots/__init__.py use lazy imports via __getattr__
- Implement LazyGroup in pypots_cli.py for deferred command loading
- Move heavy imports (torch, numpy, transformers, tsdb) from module
level to inside command functions across all 11 CLI modules
2. HPO reproducibility: Reset random seed before each Optuna trial
- Call set_random_seed(seed) at the start of each trial's objective()
- Ensures identical hyperparams produce identical model initialization
3. Model file metadata: Enrich .pypots files with:
- model_class: The model class name (e.g., 'SAITS')
- hyperparameters: All JSON-serializable model constructor parameters
- save_timestamp: ISO 8601 timestamp of when the model was saved
- Version compatibility warnings on load when versions differ
- Enhanced 'model inspect' CLI command to display all metadata
- Full backward compatibility with older .pypots files
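Item 3 above can be illustrated with the standard library alone. This sketch is not the `.pypots` file format: it only builds a metadata dictionary with the listed fields and shows a version check on load. `collect_hyperparameters`, `build_metadata`, `check_version`, `ToyModel`, and the `version` key are hypothetical, and reading constructor arguments back from same-named attributes is an assumption about how models store them.

```python
import inspect
import json
import warnings
from datetime import datetime, timezone
from typing import Any, Dict

_SKIP_KINDS = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)


def collect_hyperparameters(model: Any) -> Dict[str, Any]:
    """Read constructor arguments back from same-named attributes; keep JSON-safe ones."""
    hparams = {}
    for name, param in inspect.signature(type(model).__init__).parameters.items():
        if name == "self" or param.kind in _SKIP_KINDS or not hasattr(model, name):
            continue
        value = getattr(model, name)
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            continue  # e.g. device handles, optimizers, tensors
        hparams[name] = value
    return hparams


def build_metadata(model: Any, version: str) -> Dict[str, Any]:
    return {
        "model_class": type(model).__name__,
        "hyperparameters": collect_hyperparameters(model),
        "save_timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
        "version": version,  # hypothetical key name
    }


def check_version(metadata: Dict[str, Any], current_version: str) -> None:
    saved = metadata.get("version")
    # Files without metadata load silently, in the spirit of backward compatibility.
    if saved is not None and saved != current_version:
        warnings.warn(f"model saved with version {saved}, loading with {current_version}")


class ToyModel:
    def __init__(self, n_steps: int, n_features: int, device: Any = None):
        self.n_steps = n_steps
        self.n_features = n_features
        self.device = device if device is not None else object()  # not JSON-serializable


if __name__ == "__main__":
    meta = build_metadata(ToyModel(n_steps=48, n_features=35), version="1.0")
    print(json.dumps(meta, indent=2))
    check_version(meta, current_version="1.1")  # emits a UserWarning
```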
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s-pypots integration
Add three key CLI capabilities that bridge the gap between the ai4ts CSV data format and the PyPOTS H5 model input format:
- data prepare: Converts CSV files (with SAMPLE_ID, features, CLAF_TARGET) to PyPOTS-compatible H5 with proper 3D arrays (X, X_ori, y). Supports batch mode (--train/--val/--test) and single-file mode. Handles SAMPLE_ID grouping, artificial missing-value injection for val/test sets, and label extraction.
- data describe: Enhanced to accept CSV files in addition to H5, with a --json flag for machine-readable output. Shows n_samples, n_steps, n_features, missing_rate, labels, and per-feature missing rates.
- recommend: New command that suggests model hyperparameters based on data properties. Accepts CSV or H5 files, auto-detects data dimensions, and generates ready-to-use YAML config files with appropriate hyperparameters for all 5 supported models (SAITS, TimesNet, TEFN, CRLI, TimeMixer).
Updated pypots-tsa and ai4ts-skills with a unified CLI-only workflow. Added 10 new tests (all passing).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
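The core of `data prepare` as described (group CSV rows by `SAMPLE_ID` into `[n_samples][n_steps][n_features]`, take one `CLAF_TARGET` label per sample, and hide some observed values to derive `X` from `X_ori`) can be sketched in plain Python. Assumptions: every sample has the same number of rows, rows are already time-ordered, empty cells mean missing, and the label is constant within a sample. The function names and the `rate`/`seed` parameters are hypothetical, and the real command writes H5 files rather than returning lists.

```python
import csv
import io
import math
import random
from typing import Dict, List


def csv_to_arrays(text: str, id_col: str = "SAMPLE_ID", label_col: str = "CLAF_TARGET"):
    """Group rows by sample ID into X_ori[sample][step][feature] plus one label per sample."""
    reader = csv.DictReader(io.StringIO(text))
    features = [c for c in reader.fieldnames if c not in (id_col, label_col)]
    groups: Dict[str, List[dict]] = {}
    for row in reader:
        groups.setdefault(row[id_col], []).append(row)  # dicts keep first-seen order
    X_ori, y = [], []
    for rows in groups.values():
        # Empty cells are treated as naturally missing values (NaN).
        X_ori.append([[float(r[f]) if r[f] != "" else math.nan for f in features] for r in rows])
        y.append(int(rows[0][label_col]))  # assumes the label is constant per sample
    return X_ori, y, features


def inject_missing(X_ori, rate: float, seed: int = 0):
    """Return a copy of X_ori with a `rate` fraction of observed values set to NaN."""
    rng = random.Random(seed)
    X = [[list(step) for step in sample] for sample in X_ori]
    observed = [
        (i, t, f)
        for i, sample in enumerate(X)
        for t, step in enumerate(sample)
        for f, value in enumerate(step)
        if not math.isnan(value)
    ]
    for i, t, f in rng.sample(observed, int(len(observed) * rate)):
        X[i][t][f] = math.nan
    return X


SAMPLE_CSV = """SAMPLE_ID,f1,f2,CLAF_TARGET
a,1.0,2.0,0
a,1.5,,0
b,3.0,4.0,1
b,3.5,4.5,1
"""

if __name__ == "__main__":
    X_ori, y, features = csv_to_arrays(SAMPLE_CSV)
    X = inject_missing(X_ori, rate=0.3, seed=1)
    print(features, y)
    print(X_ori)
    print(X)
```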
…tegration
- Add 'data profile' subcommand: analyzes CSV datasets and outputs DataProfile JSON with sample statistics, schema mapping, timestamp info, and a recommended windowing strategy
- Add 'data reconstruct' subcommand: reverses the windowing transformation using the window registry to reconstruct original-shape data from model predictions (strips padding, reassembles by sample ID)
- Enhance 'data prepare': integrates the ai4ts pipeline for variable-length sample handling, with automatic strategy selection (pad_only/direct/sliding_window) and window registry generation
- Add CLI tests for profile, profile --json, prepare with registry, and end-to-end reconstruct (tests 10-13)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
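The `data reconstruct` description (strip padding, reassemble by sample ID using a window registry) can be made concrete with a small round trip. The PR's registry schema is not shown, so this sketch assumes a minimal one: each entry records the sample ID, the window's start step, and how many of its steps are real rather than padding. `make_windows` and `reconstruct` are hypothetical helpers, and averaging overlapping sliding-window predictions is one design choice among several (keeping the first or last prediction would also work).

```python
from typing import Dict, List, Sequence

Row = List[float]


def make_windows(samples: Dict[str, Sequence[Row]], n_steps: int, stride: int, pad_value: float = 0.0):
    """Cut each non-empty sample into n_steps-long windows and record where each came from."""
    windows: List[List[Row]] = []
    registry: List[dict] = []
    for sample_id, series in samples.items():
        n_features = len(series[0])
        starts = list(range(0, max(len(series) - n_steps, 0) + 1, stride))
        if starts[-1] + n_steps < len(series):
            starts.append(len(series) - n_steps)  # make sure the tail is covered
        for start in starts:
            chunk = [list(row) for row in series[start:start + n_steps]]
            valid = len(chunk)
            # Short samples are padded up to n_steps; the registry remembers the real length.
            chunk += [[pad_value] * n_features for _ in range(n_steps - valid)]
            windows.append(chunk)
            registry.append({"sample_id": sample_id, "start": start, "length": valid})
    return windows, registry


def reconstruct(windows: Sequence[Sequence[Row]], registry: Sequence[dict]) -> Dict[str, List[Row]]:
    """Strip padding and reassemble each sample, averaging steps covered by several windows."""
    sums: Dict[str, Dict[int, Row]] = {}
    counts: Dict[str, Dict[int, int]] = {}
    for window, entry in zip(windows, registry):
        acc = sums.setdefault(entry["sample_id"], {})
        cnt = counts.setdefault(entry["sample_id"], {})
        for offset in range(entry["length"]):  # rows past `length` are padding
            t = entry["start"] + offset
            row = window[offset]
            acc[t] = [a + b for a, b in zip(acc[t], row)] if t in acc else list(row)
            cnt[t] = cnt.get(t, 0) + 1
    return {
        sid: [[v / counts[sid][t] for v in acc[t]] for t in sorted(acc)]
        for sid, acc in sums.items()
    }


if __name__ == "__main__":
    data = {
        "s1": [[float(t), float(-t)] for t in range(6)],  # longer than one window
        "s2": [[9.0, 9.5], [8.0, 8.5]],  # shorter than one window, gets padded
    }
    windows, registry = make_windows(data, n_steps=3, stride=2)
    print(registry)
    print(reconstruct(windows, registry) == data)
```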
…ng_mask
- Auto-compute indicating_mask when ground truth X_ori has NaN (naturally missing values)
- Evaluate only on artificially masked positions (observed in X_ori, missing in X)
- Replace NaN in targets with 0 at non-evaluated positions so that metric assertions pass
- Add an informative log message showing the number of evaluated vs. excluded positions
- Include test_set in recommend config auto-discovery (previously only train/val sets were discovered)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
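The masking rule in this commit can be written out explicitly. This is a plain-Python sketch over flat lists (real data would be `[n_samples, n_steps, n_features]` arrays); `compute_indicating_mask`, `mae_with_mask`, and `evaluate_imputation` are hypothetical names, not PyPOTS's metric functions. It shows why zero-filling is safe: positions that are naturally missing in `X_ori` always get mask 0, so the fill value never enters the score.

```python
import math
from typing import List, Sequence


def compute_indicating_mask(X: Sequence[float], X_ori: Sequence[float]) -> List[int]:
    """1 where a value is observed in X_ori but missing in X (artificially masked), else 0."""
    return [int(not math.isnan(o) and math.isnan(x)) for x, o in zip(X, X_ori)]


def mae_with_mask(predictions: Sequence[float], targets: Sequence[float], mask: Sequence[int]) -> float:
    """Stand-in for a masked metric that refuses NaN inputs."""
    assert not any(math.isnan(v) for v in predictions), "predictions contain NaN"
    assert not any(math.isnan(v) for v in targets), "targets contain NaN"
    return sum(abs(p - t) * m for p, t, m in zip(predictions, targets, mask)) / sum(mask)


def evaluate_imputation(predictions: Sequence[float], X: Sequence[float], X_ori: Sequence[float]) -> float:
    mask = compute_indicating_mask(X, X_ori)
    n_eval = sum(mask)
    if n_eval == 0:
        raise ValueError("no artificially masked positions to evaluate")
    # Naturally missing targets have mask 0, so their fill value never affects the score;
    # 0.0 is used only so the NaN check in the metric passes.
    targets = [0.0 if math.isnan(o) else o for o in X_ori]
    print(f"Evaluating {n_eval} artificially masked positions, excluding {len(mask) - n_eval}")
    return mae_with_mask(predictions, targets, mask)


if __name__ == "__main__":
    nan = math.nan
    X_ori = [1.0, nan, 3.0, 4.0]  # position 1 is naturally missing
    X = [1.0, nan, nan, nan]  # positions 2 and 3 were hidden on purpose
    predictions = [1.0, 0.0, 2.5, 5.0]
    print(evaluate_imputation(predictions, X, X_ori))
```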
…er calls
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
# Conflicts:
# README_zh.md
Coverage Report for CI Build 25324056397
Warning: Build has drifted. This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Coverage decreased (-0.3%) to 79.831%
Coverage Regressions: 6 previously-covered lines in 2 files lost coverage.
💛 - Coveralls