
GH-16779: update estimators to support sklearn 1.6+ #16780

Open
zazulam wants to merge 1 commit into h2oai:master from zazulam:sklearn-update

zazulam commented Mar 10, 2026

Fixes #16779

Summary

h2o.sklearn wrappers need compatibility updates for newer scikit-learn APIs, especially around estimator type semantics and tags behavior in scikit-learn 1.6+. scikit-learn revamped estimator tags in 1.6.0 (December 2024) and introduced __sklearn_tags__ as the preferred API (with Tags objects). This affects third-party estimator wrappers and type-dispatch behavior (is_classifier, is_regressor, clone/check tooling).

For generic H2O sklearn wrappers (e.g. H2OGradientBoostingEstimator(estimator_type='classifier')), the classifier/regressor semantics can drift in sklearn integration paths (notably clone followed by tags/type checks) unless the wrapper params and _estimator_type are propagated consistently.
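To make the drift concrete, here is a minimal sketch of the failure mode, not the actual h2o code. `LossyWrapper` and `clone_like` are hypothetical names; `clone_like` only approximates what `sklearn.base.clone` does for objects exposing `get_params` (rebuild the estimator from its reported params).

```python
class LossyWrapper:
    """Hypothetical wrapper whose get_params omits estimator_type."""

    def __init__(self, estimator_type=None, ntrees=50):
        self.estimator_type = estimator_type
        self.ntrees = ntrees
        self._estimator_type = estimator_type

    def get_params(self, deep=True):
        # estimator_type is not reported here, so a params-based copy cannot see it.
        return {"ntrees": self.ntrees}


def clone_like(estimator):
    """Rebuild an estimator from its params (rough approximation of a clone)."""
    return type(estimator)(**estimator.get_params(deep=False))


original = LossyWrapper(estimator_type="classifier", ntrees=10)
copied = clone_like(original)

print(original._estimator_type)  # classifier
print(copied._estimator_type)    # None: the classifier semantics were lost
```

The copy keeps `ntrees` but silently falls back to the default `estimator_type`, which is the kind of drift the fix is meant to prevent.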

Proposed / implemented fix

In h2o-py/h2o/sklearn/wrapper.py:

  • Keep explicit _estimator_type precedence in classifier/regressor checks.
  • Include estimator_type and init_connection_args in sklearn parameter flow (get_params/set_params) so clone semantics are preserved.
  • Keep these params reserved from forwarding to underlying H2O estimators.
  • Implement __sklearn_tags__ handling for newer sklearn tags API when available.
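The four points above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in wrapper.py: `SketchWrapper`, `RESERVED`, and `params_for_backend` are hypothetical names, and `__sklearn_tags__` assumes scikit-learn 1.6+, where `sklearn.utils` exposes `Tags` and `TargetTags`. A fuller version would likely also populate the classifier or regressor sub-tags.

```python
RESERVED = ("estimator_type", "init_connection_args")  # never sent to the backend


class SketchWrapper:
    """Hypothetical stand-in for a generic H2O sklearn wrapper."""

    def __init__(self, estimator_type=None, init_connection_args=None, **backend_params):
        self.estimator_type = estimator_type
        self.init_connection_args = init_connection_args
        self._backend_params = dict(backend_params)

    @property
    def _estimator_type(self):
        # The explicitly requested type takes precedence in type checks.
        return self.estimator_type

    def get_params(self, deep=True):
        # Report the wrapper-level params too, so a clone can carry them over.
        params = dict(self._backend_params)
        params.update(estimator_type=self.estimator_type,
                      init_connection_args=self.init_connection_args)
        return params

    def set_params(self, **params):
        for name, value in params.items():
            if name in RESERVED:
                setattr(self, name, value)
            else:
                self._backend_params[name] = value
        return self

    def params_for_backend(self):
        # Reserved wrapper params are filtered out before forwarding.
        return {k: v for k, v in self._backend_params.items() if k not in RESERVED}

    def __sklearn_tags__(self):
        # Assumption: only called on scikit-learn 1.6+, which provides these classes.
        from sklearn.utils import Tags, TargetTags
        return Tags(estimator_type=self.estimator_type,
                    target_tags=TargetTags(required=True))


est = SketchWrapper(estimator_type="classifier", ntrees=10)
rebuilt = type(est)(**est.get_params(deep=False))  # clone-style rebuild
```

Here `rebuilt` still reports `_estimator_type == "classifier"`, while `params_for_backend()` contains only `ntrees`.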

In h2o-py/tests/testdir_sklearn/pyunit_sklearn_api.py:

  • Add regression tests for:
    • classifier/regressor identification
    • clone preserving estimator_type semantics
    • sklearn tags estimator_type behavior (when available)
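The shape of these regression checks can be sketched as follows. This is not the content of pyunit_sklearn_api.py: `FakeWrapper` and the helper functions are hypothetical, `SimpleNamespace` stands in for sklearn's Tags object, and the real tests would exercise the h2o.sklearn wrappers with scikit-learn's own `clone`, `is_classifier`, and `is_regressor`.

```python
from types import SimpleNamespace


class FakeWrapper:
    """Hypothetical stand-in for a wrapper under test."""

    def __init__(self, estimator_type=None):
        self.estimator_type = estimator_type
        self._estimator_type = estimator_type

    def get_params(self, deep=True):
        return {"estimator_type": self.estimator_type}

    def __sklearn_tags__(self):
        # SimpleNamespace mimics only the estimator_type field of a Tags object.
        return SimpleNamespace(estimator_type=self.estimator_type)


def identifies_as(estimator, expected):
    """Check classifier/regressor identification via _estimator_type."""
    return getattr(estimator, "_estimator_type", None) == expected


def rebuild(estimator):
    """Clone-style rebuild from the reported params."""
    return type(estimator)(**estimator.get_params(deep=False))


def tags_type(estimator):
    """Return the tags estimator_type, or None when tags are unavailable."""
    tags_fn = getattr(estimator, "__sklearn_tags__", None)
    return tags_fn().estimator_type if tags_fn is not None else None


def test_classifier_semantics_survive_rebuild():
    clf = FakeWrapper(estimator_type="classifier")
    assert identifies_as(clf, "classifier")
    assert not identifies_as(clf, "regressor")
    assert identifies_as(rebuild(clf), "classifier")
    assert tags_type(rebuild(clf)) == "classifier"
```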

Validation run

  • testdir_sklearn/pyunit_sklearn_api.py → PASS
  • testdir_sklearn/pyunit_sklearn_params.py → PASS

Dependency notes

  • No H2O package dependency changes are required for this fix itself.
  • Compatibility validation was performed with the latest scikit-learn 1.x release available in the test environment: 1.6.1.
  • Local environment note: running older sklearn wheels with incompatible NumPy can cause ABI errors; this was an environment concern, not a required project dependency change.

Signed-off-by: zazulam <m.zazula@gmail.com>
@shania-m

@maurever can we get more eyes on this?

zazulam commented Mar 31, 2026

@valenad1 any chance this can get some reviews?

@shania-m

@valenad1 this is a blocker for us. We would greatly appreciate a review. Thank you!

zazulam commented Apr 15, 2026

@maurever @valenad1 Sorry for being insistent, but we have teams that need this update so that their dependency upgrades can be free of known vulnerabilities.

@maurever
Contributor

Hi @zazulam. I'm sorry we haven't been able to respond as quickly as you need. We are an open-source project with currently limited capacity to address community issues. We have registered your suggestion, and it is on our backlog, but unfortunately, we can't promise a specific timeline for when it will be resolved. Thank you for understanding.

zazulam commented Apr 16, 2026

@maurever I totally understand and appreciate all of your team's efforts. Thank you for adding it to your backlog 🙏

Development

Successfully merging this pull request may close these issues.

Python sklearn wrappers: fix estimator_type/clone/tag compatibility for scikit-learn 1.6+