Feature: Flexible Spatio-Temporal Splits and Dynamic Multi-Group Visualization #103
m9o8 wants to merge 13 commits into 4Freye:main from …
… and add supporting utilities and documentation, especially group plotting
Code Review
This pull request introduces spatio-temporal cross-validation capabilities to the PanelSplit class, enabling users to perform holdouts based on both time periods and spatial groups. Key updates include the addition of groups and group_splitter parameters, logic to intersect temporal and spatial splits, and enhanced plotting functionality to visualize grouped data. Review feedback suggests optimizing index intersection logic using np.intersect1d for better performance and updating the plotting utility to support X and y parameters to prevent errors with certain scikit-learn splitters.
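The review's suggestion amounts to intersecting the row indices selected by a temporal cut with those selected by a group (spatial) cut. Below is a minimal NumPy sketch on a hypothetical toy panel. It assumes each fold is an array of positional row indices, which may not match how `PanelSplit` stores folds internally.

```python
import numpy as np

# Toy panel (hypothetical): 3 entities observed over 4 periods, stored entity by entity.
periods = np.tile(np.arange(4), 3)  # [0 1 2 3 0 1 2 3 0 1 2 3]
entities = np.repeat(np.array(["A", "B", "C"]), 4)
rows = np.arange(len(periods))

# Temporal cut (illustrative): train on periods 0-1, test on period 2.
temporal_train = rows[periods < 2]
temporal_test = rows[periods == 2]

# Spatial cut (illustrative): hold out entity "C" completely.
spatial_train = rows[entities != "C"]
spatial_test = rows[entities == "C"]

# Combined fold: keep only rows that satisfy both cuts.
# np.intersect1d returns the sorted unique values present in both arrays.
train_idx = np.intersect1d(temporal_train, spatial_train)
test_idx = np.intersect1d(temporal_test, spatial_test)
```

`np.intersect1d` is vectorized (it sorts and compares internally), so it avoids Python-level membership checks for every row.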
Pull request overview
This PR adds first-class spatio-temporal cross-validation support to PanelSplit by combining temporal TimeSeriesSplit cuts with scikit-learn-style group splitters, plus updates documentation/tests and enhances split visualization to handle grouped (spatial) scenarios.
Changes:
- Extend `PanelSplit` with `groups` + `group_splitter` and compute combined spatio-temporal folds (with lazy `X`/`y` support for stratified group splitters).
- Add a `check_groups()` utility to normalize multi-column group inputs into a 1D composite identifier.
- Update `plot_splits()` to render grouped spatio-temporal folds as multiple subplots; add README + notebook examples and new spatial CV tests.
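Collapsing multi-column groups into one identifier per row can be illustrated with pandas. `flatten_groups` below is a hypothetical stand-in for `check_groups()`, written with `groupby(...).ngroup()`; the PR's actual implementation may differ (for example in the dataframe library it relies on or the type of identifier it returns).

```python
import numpy as np
import pandas as pd


def flatten_groups(groups) -> np.ndarray:
    """Hypothetical helper: return one integer group id per row.

    Accepts a 1D sequence, a Series, a 2D array, or a DataFrame. Every distinct
    combination of column values (e.g. state + city) gets its own id.
    """
    if isinstance(groups, pd.Series):
        groups = groups.to_frame()
    elif not isinstance(groups, pd.DataFrame):
        arr = np.asarray(groups)
        groups = pd.DataFrame(arr.reshape(len(arr), -1))
    # ngroup() numbers each distinct key combination; sort=False keeps first-seen order.
    return groups.groupby(list(groups.columns), sort=False).ngroup().to_numpy()


df = pd.DataFrame({
    "state": ["IL", "IL", "MO", "MO"],
    "city": ["Springfield", "Chicago", "Springfield", "Springfield"],
})
ids = flatten_groups(df[["state", "city"]])
```

Using `city` alone would merge the Illinois and Missouri Springfields into one group; combining both columns keeps them apart.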
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `panelsplit/cross_validation.py` | Adds `groups`/`group_splitter` to generate spatio-temporal splits and lazy runtime splitting with `X`/`y`. |
| `panelsplit/utils/validation.py` | Adds `check_groups()` to flatten/normalize group inputs, including multi-column inputs. |
| `panelsplit/plot.py` | Adds grouped subplot visualization for spatio-temporal folds and updates the plotting API. |
| `tests/test_spatial_cv.py` | Introduces tests for grouped and stratified grouped splitting behavior. |
| `README.md` | Documents spatio-temporal cross-validation usage with `group_splitter` and lazy `X`/`y` splitting. |
| `examples/An introduction to PanelSplit.ipynb` | Adds a spatio-temporal splitting example and updated outputs. |
Determine indices once, outside the loop
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…panelsplit into ENH-spatio-temporal-splits
…cific splitters requiring X and y to stratify
… removed redundant function parameter
Corrected usage of use groups
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Thanks for this issue suggestion and PR, Moritz! I will look at this in the next few days and get back to you soon.
Feature: Flexible Spatio-Temporal Splits and Dynamic Multi-Group Visualization
Description
This PR resolves #102 and adds spatio-temporal cross-validation to `PanelSplit`. Instead of a hard-coded number of spatial splits, it accepts any compatible scikit-learn group splitter and evaluates it lazily at split time. The plotting utilities have also been updated to show grouped folds.
Core Changes
- `group_splitter` parameter: replaces the static `n_group_splits` setting. Any compatible scikit-learn group splitter (e.g., `GroupKFold`, `StratifiedGroupKFold`) can be passed directly to `PanelSplit`.
- `X`/`y` dependency support: `PanelSplit.split(X, y)` now runs the group splitter lazily at split time. Stratified splitters such as `StratifiedGroupKFold` need the labels `y` to balance classes across folds, so the group folds cannot be precomputed in `__init__`.
- … `narwhals`: a flattening mechanism in `utils/validation.py` collapses multi-column group inputs (e.g., `groups=df[["state", "city"]]`) into a single composite identifier per row.
- … (`plot.py`): `plot_spatiotemporal_splits(max_groups=2)`.

Why this matters
These changes make it possible to test true "cold-start" generalization: the model is evaluated on future periods for locations/entities it never saw during training, mirroring a realistic deployment. Because test groups are held out entirely, no information about a test entity can leak in through that entity's own training rows.
Verification
- … `StratifiedGroupKFold`.
- … `uv run pytest`.