Feature: Flexible Spatio-Temporal Splits and Dynamic Multi-Group Visualization #103

Open
m9o8 wants to merge 13 commits into 4Freye:main from m9o8:ENH-spatio-temporal-splits

m9o8 (Contributor) commented Apr 5, 2026

Feature: Flexible Spatio-Temporal Splits and Dynamic Multi-Group Visualization

Description

This PR resolves #102 and adds spatio-temporal cross-validation to PanelSplit. Instead of a hard-coded number of group splits, PanelSplit now accepts any scikit-learn group splitter and evaluates it lazily at split time. The plotting code has also been updated to visualize grouped splits.

Core Changes

  • Extensible group_splitter parameter: Replaces the static n_group_splits constraint. Any scikit-learn-compatible group splitter (e.g., GroupKFold, StratifiedGroupKFold) can now be passed directly to PanelSplit.
  • Runtime X / y support: PanelSplit.split(X, y) now evaluates the group splitter lazily, at split time. Stratified splitters need the labels y to balance classes across folds, and those labels are not available in __init__, so the splits cannot be precomputed there.
  • Flattened composite groups via narwhals: A new helper in utils/validation.py collapses multi-column group inputs (e.g., groups=df[["state", "city"]]) into a single 1D composite group identifier.
  • Enhanced subplot dispatching (plot.py):
    • Grouped splits are automatically drawn as separate subplots, one timeline per spatial group, via plot_spatiotemporal_splits(max_groups=2).
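
For readers who want to see the idea behind the group_splitter bullets in isolation, here is a minimal sketch written against plain scikit-learn and NumPy rather than the PanelSplit API. The function name spatiotemporal_folds and its parameters are hypothetical and are not taken from this PR; the sketch only assumes that each fold is the intersection of a temporal split and a group split.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, TimeSeriesSplit


def spatiotemporal_folds(periods, groups, n_time_splits=2, group_splitter=None):
    """Yield (train_idx, test_idx): train on earlier periods of the training
    groups, test on later periods of the held-out groups.

    Hypothetical helper for illustration; not the PanelSplit implementation.
    """
    periods = np.asarray(periods)
    groups = np.asarray(groups)
    if group_splitter is None:
        group_splitter = GroupKFold(n_splits=2)

    unique_periods = np.unique(periods)  # sorted distinct time periods
    rows = np.arange(len(periods))

    time_splitter = TimeSeriesSplit(n_splits=n_time_splits)
    for t_train, t_test in time_splitter.split(unique_periods):
        # Row-level masks for the temporal train/test windows.
        in_train_time = np.isin(periods, unique_periods[t_train])
        in_test_time = np.isin(periods, unique_periods[t_test])

        for g_train, g_test in group_splitter.split(rows, groups=groups):
            # Row-level masks for the training and held-out groups.
            in_train_group = np.zeros(len(rows), dtype=bool)
            in_train_group[g_train] = True
            in_test_group = np.zeros(len(rows), dtype=bool)
            in_test_group[g_test] = True

            # A row is used only if it satisfies both constraints.
            yield (rows[in_train_time & in_train_group],
                   rows[in_test_time & in_test_group])
```

Rows that fall in the training window but belong to a held-out group (and vice versa) are dropped from the fold, which is the price of enforcing both constraints at once.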

Why this matters

These enhancements make it possible to test true "cold-start" validation: forecasting future periods for locations or entities that were never observed during training. Holding out whole groups as well as future periods guards against leakage through spatial correlation between training and test entities.

Verification

  • Tested lazy evaluation of y at split time with StratifiedGroupKFold.
  • Verified that intersecting the temporal and spatial masks enforces both constraints in every fold.
  • All 130 core tests pass (uv run pytest), so the change is backward compatible.

Copilot AI review requested due to automatic review settings April 5, 2026 22:31

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces spatio-temporal cross-validation capabilities to the PanelSplit class, enabling users to perform holdouts based on both time periods and spatial groups. Key updates include the addition of groups and group_splitter parameters, logic to intersect temporal and spatial splits, and enhanced plotting functionality to visualize grouped data. Review feedback suggests optimizing index intersection logic using np.intersect1d for better performance and updating the plotting utility to support X and y parameters to prevent errors with certain scikit-learn splitters.
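
As an illustration of the np.intersect1d suggestion, the sketch below intersects two index arrays. The array names and values are made up for the example and do not come from the reviewed code.

```python
import numpy as np

# Hypothetical row positions: rows inside the temporal training window,
# and rows belonging to the training groups.
time_train_idx = np.array([0, 1, 2, 3, 4, 5])
group_train_idx = np.array([0, 2, 4, 6, 8])

# assume_unique=True skips de-duplication; it is only safe when neither
# input contains repeated values, which holds for row positions.
train_idx = np.intersect1d(time_train_idx, group_train_idx, assume_unique=True)
print(train_idx)  # [0 2 4]
```

np.intersect1d returns a sorted array, so the result can be used directly as positional indices.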

Copilot AI left a comment

Pull request overview

This PR adds first-class spatio-temporal cross-validation support to PanelSplit by combining temporal TimeSeriesSplit cuts with scikit-learn-style group splitters, plus updates documentation/tests and enhances split visualization to handle grouped (spatial) scenarios.

Changes:

  • Extend PanelSplit with groups + group_splitter and compute combined spatio-temporal folds (with lazy X/y support for stratified group splitters).
  • Add check_groups() utility to normalize multi-column group inputs into a 1D composite identifier.
  • Update plot_splits() to render grouped spatio-temporal folds as multiple subplots; add README + notebook examples and new spatial CV tests.
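
A rough sketch of what normalizing multi-column groups into a 1D identifier can look like. This uses pandas directly instead of narwhals, and flatten_groups is a hypothetical stand-in, not the check_groups() implementation from this PR.

```python
import pandas as pd


def flatten_groups(groups):
    """Return one 1D label per row (hypothetical stand-in for check_groups()).

    A DataFrame is collapsed to one integer code per distinct combination
    of column values; anything else is returned as a 1D array unchanged.
    """
    if isinstance(groups, pd.DataFrame):
        keys = list(groups.columns)
        return groups.groupby(keys, sort=False).ngroup().to_numpy()
    return pd.Series(groups).to_numpy()


df = pd.DataFrame(
    {
        "state": ["CA", "CA", "NY", "CA"],
        "city": ["LA", "SF", "NYC", "LA"],
    }
)
labels = flatten_groups(df[["state", "city"]])
```

Rows 0 and 3 share the same (state, city) pair and so receive the same label, which is what a scikit-learn group splitter needs in its groups argument. Rows with missing values in a key column would need separate handling, since groupby drops them by default.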

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| panelsplit/cross_validation.py | Adds groups/group_splitter to generate spatio-temporal splits and lazy runtime splitting with X/y. |
| panelsplit/utils/validation.py | Adds check_groups() to flatten/normalize group inputs, including multi-column inputs. |
| panelsplit/plot.py | Adds grouped subplot visualization for spatio-temporal folds and updates plotting API. |
| tests/test_spatial_cv.py | Introduces tests for grouped and stratified grouped splitting behavior. |
| README.md | Documents spatio-temporal cross-validation usage with group_splitter and lazy X/y splitting. |
| examples/An introduction to PanelSplit.ipynb | Adds a spatio-temporal splitting example and updated outputs. |

m9o8 and others added 9 commits April 6, 2026 00:37
Determin indices outside the loop once

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…cific splitters requiring X and y to stratify
Corrected usage of use groups

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

4Freye (Owner) commented Apr 7, 2026

Thanks for this issue suggestion and PR, Moritz! I will look at this in the next few days and get back to you soon.

Development

Successfully merging this pull request may close these issues.

Feature Request: Spatio-Temporal Aware Cross-Validation for Panel Data