
VOTE SLEP 23: Callback API#103

Open
jeremiedbb wants to merge 1 commit intoscikit-learn:mainfrom
jeremiedbb:vote-slep023-callbacks

Conversation

@jeremiedbb
Member

This PR collects the votes for SLEP023: Callback API.

This SLEP is being implemented in the callbacks feature branch. scikit-learn/scikit-learn#33322 keeps an updated diff against main. It currently contains the framework and a callback for ProgressBars.
On top of that, there's ongoing work to:

According to our governance model, the vote will be open for a month (until April 20), and the motion is accepted if 2/3 of the cast votes are in favor.

@scikit-learn/core-devs

Member

@StefanieSenger StefanieSenger left a comment


This is awesome! 🚀

@lorentzenchr
Member

@jeremiedbb Before I vote, I have 4 remaining questions:

  1. Progress bars
    The SLEP gives the example

    callback = ProgressBar()
    clf = LogisticRegression()
    clf.set_callbacks(callback)
    clf.fit(X, y)

    What will the progress bar show? Linear models do not know the number of iterations in advance. They stop iterating if a stopping criterion reaches a small tolerance or if max_iter is reached; in the latter case they did not converge (= bad).
    It is clear that for other estimators like random forests, gradient boosted trees or grid search, the number of iterations is fixed and progress bars are most natural to add.

  2. Early stopping
    What happens if there is nothing to stop, no iteration? For example

    LinearRegression().set_callbacks(EarlyStopping())
    FixedThresholdClassifier().set_callbacks(EarlyStopping())
  3. Performance
    The SLEP mentions a possible performance regression. If no callbacks are registered, the performance regression should be negligible. Is this assumption correct?
    The SLEP also mentions compiled code/Cython. Are callbacks expected to be implemented in the compiled part of the code base or do they stay within Python land?

  4. Downstream packages
    What is the impact on downstream packages? What does "scikit-learn compatible estimator" mean? Does it include callbacks?
    One example is LightGBM, they already have callbacks, see https://lightgbm.readthedocs.io/en/stable/Python-Intro.html#early-stopping.

@jeremiedbb
Member Author

jeremiedbb commented Mar 22, 2026

@lorentzenchr

  1. Progress bar

What will the progress bar show? Linear models do not know the number of iterations in advance.

The maximum number of iterations is known in advance, so what you see is progress toward that maximum. When the estimator stops early after reaching its convergence criterion, the progress bar is directly marked as 100% finished (jumping from its current completion to 100%).

There may be estimators where the maximum number of iterations itself is not known in advance. In that case the progress bar is displayed as an indeterminate progress bar.
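This behavior can be pictured with a toy plain-Python sketch (the `fit_with_progress` function and `report` callable are illustrative stand-ins, not the SLEP's actual hooks): progress is reported against the known `max_iter`, and an early stop jumps the bar straight to 100%.

```python
def fit_with_progress(max_iter, converges_at, report):
    """Toy iterative fit: report progress toward max_iter; if convergence
    is reached early, the bar jumps straight to 100%."""
    for i in range(1, max_iter + 1):
        report(i / max_iter)           # progress toward the known maximum
        if i >= converges_at:          # convergence criterion reached early
            report(1.0)                # mark as 100% finished
            return i
    return max_iter                    # hit max_iter without converging

fractions = []
fit_with_progress(max_iter=1000, converges_at=150, report=fractions.append)
print(fractions[-1])  # 1.0: the bar jumped from 15% to 100% on convergence
```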

  2. EarlyStopping

What happens if there is nothing to stop, no iteration?

Then the early stopping callback has no effect when set on such estimators. The implementation of callback support in estimators is such that some iteration loops are interruptible at the end of loop steps. When there are no such loops, there's nothing to interrupt.
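One way to picture this (all names here are illustrative, not the SLEP's API): an iterative estimator consults the callback at the end of each loop step, while a closed-form estimator has no loop and therefore never consults it at all.

```python
class StopAfter:
    """Toy early-stopping callback: requests a stop after n iterations."""
    def __init__(self, n):
        self.n = n
        self.calls = 0

    def should_stop(self, iteration):
        self.calls += 1
        return iteration >= self.n

def iterative_fit(max_iter, callback=None):
    # The loop is interruptible: the callback is consulted at each step's end.
    for i in range(1, max_iter + 1):
        if callback is not None and callback.should_stop(i):
            return i
    return max_iter

def closed_form_fit(callback=None):
    # No iteration loop, so the callback is never consulted.
    return "solved directly"

cb = StopAfter(5)
stopped_at = iterative_fit(100, cb)   # loop interrupted at iteration 5
unused = StopAfter(5)
closed_form_fit(unused)               # unused.calls stays 0: nothing to stop
```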

  3. Performance

If no callbacks are registered, the performance regression should be negligible. Is this assumption correct?

Callback support in estimators consists of two things: creating callback contexts and calling callback hooks:

  • Calling callback hooks when there's no callback is just a function call, so much less than 1 µs. This is clearly negligible.
  • The callback context is not a big object: < 10 attributes that are ints, strs, or references to objects. Creating a context takes around 1-2 µs. For the example of a logistic regression with 1000 iterations, that represents ~2 ms.

Of course that overhead represents a larger proportion of total fit time on very small datasets, but I believe that 2 µs per iteration is very small, even for small datasets.
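The arithmetic behind those numbers, with the 2 µs-per-iteration figure treated as an assumption, plus a quick `timeit` check that a no-op hook call is far below 1 µs:

```python
import timeit

# Assumed per-iteration cost of creating a callback context (taking the
# upper end of the 1-2 µs estimate), applied to 1000 iterations.
per_iter_overhead_us = 2
n_iter = 1000
total_ms = per_iter_overhead_us * n_iter / 1000
print(total_ms)  # 2.0 (milliseconds of total context-creation overhead)

# A hook call when no callback is registered reduces to a plain function call.
def hook():
    pass

per_call_us = timeit.timeit(hook, number=100_000) / 100_000 * 1e6
# typically a few hundredths of a µs on current hardware
```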

The SLEP also mentions compiled code/Cython. Are callbacks expected to be implemented in the compiled part of the code base or do they stay within Python land?

I expect that we'll implement callback support in some Cython parts, in particular when the full iterative loop is Cython (e.g. plain_sgd). In that case, to not degrade performance at all, we'll make sure that we don't acquire the GIL and don't create these contexts if no callback is registered on the estimator. (That's why propagated callbacks like progress bars have a parameter specifying how deep to propagate them, for instance.)

  4. Downstream packages

What is the impact on downstream packages?

They can add support for callbacks in their estimators or not. If they don't, using their estimators in our meta-estimators won't break. It's just that progress bars for instance will only display progress on the meta-estimator. On the other hand using our estimator in their meta-estimators won't break either. In that case callbacks will work but won't be able to provide their full capabilities.

What does "scikit-learn compatible estimator" mean? Does it include callbacks?

I don't think we're making callbacks mandatory to be called "compatible". We're making sure that composing elements that support callbacks with elements that don't doesn't crash. At the same time, we are going to include a developer test suite in the callback module for third-party devs to test their custom estimators and/or custom callbacks, to make sure that they are compatible with the scikit-learn callback API.

One example is LightGBM, they already have callbacks, see https://lightgbm.readthedocs.io/en/stable/Python-Intro.html#early-stopping.

Estimators that support the scikit-learn callback API won't automatically support LightGBM (or lightning or tensorflow, ...) callbacks. Whether they'll add support for scikit-learn callbacks in their estimators, I can't tell. But if they show interest, we can help them implement it.

Hope that I answered your questions. Don't hesitate to ask again if I wasn't clear enough in some of my answers :)

Member

@jjerphan jjerphan left a comment


Thank you for this SLEP and associated work, @jeremiedbb et al.!

@jeremiedbb jeremiedbb moved this to In progress in Labs Mar 23, 2026
@jeremiedbb jeremiedbb added this to Labs Mar 23, 2026
@ogrisel
Member

ogrisel commented Mar 24, 2026

For the record, I think the current design looks good, and I am confident that it strikes a good balance between simplicity and expressive power, but I want to wait a bit to follow and review some of the linked PRs before casting my vote in a week or two.

@lorentzenchr
Member

@jeremiedbb Thanks for your detailed and yet concise answers. I don't think there is a show stopper. Still, I'm interested, so I'll ask further, mostly about implementation details.

  1. Performance

If no callbacks are registered, the performance regression should be negligible. Is this assumption correct?

Callback support in estimators consists of two things: creating callback contexts and calling callback hooks:

  • Calling callback hooks when there's no callback is just a function call, so much less than 1 µs. This is clearly negligible.

  • The callback context is not a big object: < 10 attributes that are ints, strs, or references to objects. Creating a context takes around 1-2 µs. For the example of a logistic regression with 1000 iterations, that represents ~2 ms.

Of course that overhead represents a larger proportion of total fit time on very small datasets, but I believe that 2 µs per iteration is very small, even for small datasets.

The SLEP also mentions compiled code/Cython. Are callbacks expected to be implemented in the compiled part of the code base or do they stay within Python land?

I expect that we'll implement callback support in some Cython parts, in particular when the full iterative loop is Cython (e.g. plain_sgd). In that case, to not degrade performance at all, we'll make sure that we don't acquire the GIL and don't create these contexts if no callback is registered on the estimator. (That's why propagated callbacks like progress bars have a parameter specifying how deep to propagate them, for instance.)

Would it make sense to apply the logic "don't create a callback context if no callback is registered" to all cases, not just Cython code?
You already mentioned plain_sgd; the loop reads

with nogil:
    for epoch in range(max_iter):
        for i in range(n_samples):

I guess the callback would be registered inside the epoch/max_iter loop but outside the i/n_samples loop. How do you avoid acquiring the GIL again (which is done for the validation_score_cb)?

@jeremiedbb
Member Author

jeremiedbb commented Mar 26, 2026

@lorentzenchr

Would it make sense to apply the logic "don't create a callback context if no callback is registered" to all cases not just Cython code?

It depends:

  • for regular estimators (i.e. not meta-estimators), yes.
  • for meta-estimators, no, because their sub-estimators can themselves have callbacks registered on them, so they need to be aware of the whole context tree to provide their full capabilities.

I was initially going to create it for every kind of estimator anyway, to avoid making callback support more verbose by adding branches everywhere, but I think that with a little bit of magic we can skip it seamlessly without adding verbosity. I will open a dedicated issue to figure out which option to choose. This is an implementation detail though and can be decided later.
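A toy sketch of the context-tree idea (the class and names below are illustrative, not the SLEP's actual implementation): the meta-estimator creates a context even without callbacks of its own, so contexts of sub-estimators can attach to it and see where they sit in the tree.

```python
class ToyCallbackContext:
    """Illustrative context-tree node, not the SLEP's actual class."""
    def __init__(self, name, callbacks=(), parent=None):
        self.name = name
        self.callbacks = list(callbacks)
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        # Walk up to the root to show this context's position in the tree.
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

# The meta-estimator creates a root context even with no callbacks of its
# own, so that callbacks registered on sub-estimators can use the full tree.
root = ToyCallbackContext("GridSearchCV")
sub = ToyCallbackContext("LogisticRegression",
                         callbacks=["progress_bar"], parent=root)
print(sub.path())  # GridSearchCV/LogisticRegression
```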

I guess the callback would be registered inside the epoch/max_iter loop but outside the i/n_samples loop. How do avoid to acquire the gil again (which is is done for the validation_score_cb)?

plain_sgd will have a new callback_context argument. I would first add a check at the very beginning, before the with nogil block, and then acquire the GIL conditionally:

cdef int has_callbacks = len(callback_context._callbacks) > 0
with nogil:
    for epoch in range(max_iter):
        if has_callbacks:
            with gil:
                subcontext = callback_context.subcontext(...)
                subcontext.call_on_fit_task_begin(...)

        for i in range(n_samples):
            # no callback calls here: we don't go deeper than iterations involving the full dataset

        if has_callbacks:
            with gil:
                subcontext.call_on_fit_task_end(...)

(I edited the snippet to make it more realistic and accurate)

Member

@lorentzenchr lorentzenchr left a comment


+1

@betatim
Member

betatim commented Apr 1, 2026

I've posted a few questions in scikit-learn/scikit-learn#33322 - it felt like the right place.

@betatim
Member

betatim commented Apr 1, 2026

One additional question is about the callback context manager. We have something like the following:

class MyEstimator:
  ...
  @with_callbacks
  def fit(self, X, y):
    callback_ctx = self._init_callback_context(max_subtasks=self.max_iter)

For built-in estimators the with_callbacks decorator is hidden away inside an existing decorator, but it is there as well.

What I am wondering is why we have this duplication. Naively it seems something like the following should also work:

class MyEstimator:
  ...
  def fit(self, X, y):
    with CallbackContext(max_subtasks=self.max_iter) as callback_ctx:
      # do stuff with callback_ctx

What am I missing?

@jeremiedbb
Member Author

What am I missing?

One level of indentation. In scikit-learn this means in particular that PRs implementing callback support would have a huge unrelated diff, because the whole body of the fit method would be re-indented.

The long-term preferred solution is the public vs private fit, where the public fit does the boilerplate and calls the private fit that is only about the optimization problem. That's orthogonal to the callbacks though, and in the meantime the decorator is the least disruptive option.

Note that we can still offer the context manager option if we get the request.
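The equivalence between the two styles can be sketched in plain Python (the decorator and context class here are toy stand-ins, not the actual implementation): the decorator enters the context around fit, giving the same enter/exit behavior without re-indenting the method body.

```python
import functools

class ToyCallbackContext:
    """Records enter/exit events so the two styles can be compared."""
    events = []

    def __enter__(self):
        ToyCallbackContext.events.append("enter")
        return self

    def __exit__(self, *exc):
        ToyCallbackContext.events.append("exit")
        return False

def with_callbacks(fit):
    # Decorator form: same semantics as a with-statement inside fit, but
    # the body of fit keeps its original indentation.
    @functools.wraps(fit)
    def wrapper(self, X, y):
        with ToyCallbackContext():
            return fit(self, X, y)
    return wrapper

class MyEstimator:
    @with_callbacks
    def fit(self, X, y):
        ToyCallbackContext.events.append("fit body")
        return self

MyEstimator().fit(None, None)
print(ToyCallbackContext.events)  # ['enter', 'fit body', 'exit']
```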

@jeremiedbb
Member Author

Hi everyone,

I opened a PR (#104) to change a small detail in the SLEP, because we found a performance concern that we wanted to tackle right away. The goal is to avoid unnecessary pickling of estimators, which requires adding one extra argument to the callback hooks.

I know that it's not best practice to change the SLEP during the voting period, but I believe that it's a very minor change that should not impact the decision. Please tell me if you think otherwise.

(Note that PR #104 can be merged after the vote as a small amendment to the SLEP if you think that's more appropriate.)
