Skip to content

GH-16757: Fix top_n_features in shap_summary_plot in R#16790

Open
tomasfryda wants to merge 1 commit intorel-3.46.0from
tomf_GH-16757_shap_summary_plot_top_n_features
Open

GH-16757: Fix top_n_features in shap_summary_plot in R#16790
tomasfryda wants to merge 1 commit intorel-3.46.0from
tomf_GH-16757_shap_summary_plot_top_n_features

Conversation

@tomasfryda
Copy link
Copy Markdown
Contributor

Copilot AI review requested due to automatic review settings March 26, 2026 13:24
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an off-by-one issue in the R h2o.shap_summary_plot feature-selection logic so top_n_features renders exactly the requested number of features, and adds regression tests in both R and Python.

Changes:

  • Fix top_n_features slicing in h2o.shap_summary_plot (R) and drop unused factor levels before selecting features.
  • Add an R unit test covering typical and corner-case top_n_features values (including 0 erroring and -1 meaning “all”).
  • Add a Python unit test intended to validate top_n_features behavior via plotted y-axis labels.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
h2o-r/tests/testdir_misc/explain/runit_shap_summary_plot_top_n_features.R Adds R regression tests for top_n_features sizing and corner cases.
h2o-r/h2o-package/R/explain.R Fixes off-by-one feature selection and adds top_n_features == 0 validation.
h2o-py/tests/testdir_misc/explain/pyunit_shap_summary_plot_top_n_features.py Adds Python regression test that inspects rendered tick labels to count displayed features.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread h2o-r/h2o-package/R/explain.R
Comment thread h2o-r/h2o-package/R/explain.R
@tomasfryda tomasfryda requested a review from maurever March 27, 2026 15:38
@tomasfryda tomasfryda self-assigned this Mar 27, 2026
@tomasfryda tomasfryda added this to the 3.46.0.11 milestone Mar 27, 2026
@tomasfryda tomasfryda requested a review from valenad1 April 14, 2026 07:29
ytick_labels_default = [t.get_text() for t in ax_default.get_yticklabels()]
n_default = len([l for l in ytick_labels_default if l.strip()])
matplotlib.pyplot.close()
assert n_default <= 20, f"Default top_n_features=20 showed {n_default} features"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the n_default is less than 20? Shouldn't it be exactly 20? Also, please save the value 20 as a variable defined from data.

Suggested change
assert n_default <= 20, f"Default top_n_features=20 showed {n_default} features"
assert n_default == 20, f"Default top_n_features=20 showed {n_default} features"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the n_default is less than 20?

yes

Shouldn't it be exactly 20?

no

Also, please save the value 20 as a variable defined from data.

This limit is not data dependent.

Image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants