
VOTE SLEP 23: Callback API#103

Open
jeremiedbb wants to merge 1 commit intoscikit-learn:mainfrom
jeremiedbb:vote-slep023-callbacks

Conversation

@jeremiedbb
Member

This PR collects the votes for SLEP023: Callback API.

This SLEP is being implemented in the callbacks feature branch. scikit-learn/scikit-learn#33322 keeps an updated diff against main. It currently contains the framework and a callback for ProgressBars.
On top of that, there's ongoing work to:

According to our governance model, the vote will be open for a month (until April 20), and the motion is accepted if 2/3 of the cast votes are in favor.

@scikit-learn/core-devs

Member

@StefanieSenger StefanieSenger left a comment


This is awesome! 🚀

@lorentzenchr
Member

@jeremiedbb Before I vote, I have 4 remaining questions:

  1. Progress bars
    The SLEP gives the example

    callback = ProgressBar()
    clf = LogisticRegression()
    clf.set_callbacks(callback)
    clf.fit(X, y)

    What will the progress bar show? Linear models do not know the number of iterations in advance. They stop iterating if a stopping criterion reaches a small tolerance or if max_iter is reached; in the latter case they did not converge (= bad).
    It is clear that for other estimators like random forests, gradient boosted trees or grid search, the number of iterations is fixed and progress bars are most natural to add.

  2. Early stopping
    What happens if there is nothing to stop, no iteration? For example

    LinearRegression().set_callbacks(EarlyStopping())
    FixedThresholdClassifier().set_callbacks(EarlyStopping())
  3. Performance
    The SLEP mentions a possible performance regression. If no callbacks are registered, the performance regression should be negligible. Is this assumption correct?
    The SLEP also mentions compiled code/Cython. Are callbacks expected to be implemented in the compiled part of the code base or do they stay within Python land?

  4. Downstream packages
    What is the impact on downstream packages? What does "scikit-learn compatible estimator" mean? Does it include callbacks?
    One example is LightGBM, they already have callbacks, see https://lightgbm.readthedocs.io/en/stable/Python-Intro.html#early-stopping.

@jeremiedbb
Member Author

jeremiedbb commented Mar 22, 2026

@lorentzenchr

  1. Progress bar

What will the progress bar show? Linear models do not know the number of iterations in advance.

The maximum number of iterations is known in advance, so what you see is progress toward that maximum. When the estimator stops early after reaching its convergence criterion, the progress bar is directly marked as 100% finished (jumping from its current completion to 100%).

There may be estimators where the maximum number of iterations itself is not known in advance. In that case the progress bar is displayed as an indeterminate progress bar.
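This behavior can be pictured with a toy plain-Python sketch (the `fit_with_progress` function and `report` callable are illustrative stand-ins, not the SLEP's actual hooks): progress is reported against the known `max_iter`, and an early stop jumps the bar straight to 100%.

```python
def fit_with_progress(max_iter, converges_at, report):
    """Toy iterative fit: report progress toward max_iter; if convergence
    is reached early, the bar jumps straight to 100%."""
    for i in range(1, max_iter + 1):
        report(i / max_iter)           # progress toward the known maximum
        if i >= converges_at:          # convergence criterion reached early
            report(1.0)                # mark as 100% finished
            return i
    return max_iter                    # hit max_iter without converging

fractions = []
fit_with_progress(max_iter=1000, converges_at=150, report=fractions.append)
print(fractions[-1])  # 1.0: the bar jumped from 15% to 100% on convergence
```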

  2. EarlyStopping

What happens if there is nothing to stop, no iteration?

Then the early stopping callback has no effect when set on such estimators. The implementation of callback support in estimators is such that some iteration loops are interruptible at the end of loop steps. When there are no such loops, there's nothing to interrupt.
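One way to picture this (all names here are illustrative, not the SLEP's API): an iterative estimator consults the callback at the end of each loop step, while a closed-form estimator has no loop and therefore never consults it at all.

```python
class StopAfter:
    """Toy early-stopping callback: requests a stop after n iterations."""
    def __init__(self, n):
        self.n = n
        self.calls = 0

    def should_stop(self, iteration):
        self.calls += 1
        return iteration >= self.n

def iterative_fit(max_iter, callback=None):
    # The loop is interruptible: the callback is consulted at each step's end.
    for i in range(1, max_iter + 1):
        if callback is not None and callback.should_stop(i):
            return i
    return max_iter

def closed_form_fit(callback=None):
    # No iteration loop, so the callback is never consulted.
    return "solved directly"

cb = StopAfter(5)
stopped_at = iterative_fit(100, cb)   # loop interrupted at iteration 5
unused = StopAfter(5)
closed_form_fit(unused)               # unused.calls stays 0: nothing to stop
```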

  3. Performance

If no callbacks are registered, the performance regression should be negligible. Is this assumption correct?

Callback support in estimators consists of two things: creating callback contexts and calling callback hooks:

  • Calling callback hooks when there's no callback is just a function call, so much less than 1 µs. This is clearly negligible.
  • The callback context is not a big object: < 10 attributes that are ints, strs, or references to objects. Creating a context takes around 1-2 µs. For the example of a logistic regression with 1000 iterations, that represents ~2 ms.

Of course that overhead represents a larger proportion of total fit time on very small datasets, but I believe that 2 µs per iteration is very small, even for small datasets.
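The arithmetic behind those numbers, with the 2 µs-per-iteration figure treated as an assumption, plus a quick `timeit` check that a no-op hook call is far below 1 µs:

```python
import timeit

# Assumed per-iteration cost of creating a callback context (taking the
# upper end of the 1-2 µs estimate), applied to 1000 iterations.
per_iter_overhead_us = 2
n_iter = 1000
total_ms = per_iter_overhead_us * n_iter / 1000
print(total_ms)  # 2.0 (milliseconds of total context-creation overhead)

# A hook call when no callback is registered reduces to a plain function call.
def hook():
    pass

per_call_us = timeit.timeit(hook, number=100_000) / 100_000 * 1e6
# typically a few hundredths of a µs on current hardware
```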

The SLEP also mentions compiled code/Cython. Are callbacks expected to be implemented in the compiled part of the code base or do they stay within Python land?

I expect that we'll implement callback support in some Cython parts, in particular when the full iterative loop is Cython (e.g. plain_sgd). In that case, to not degrade performance at all, we'll make sure that we don't acquire the GIL and don't create these contexts if no callback is registered on the estimator. (That's why propagated callbacks like progress bars have a parameter specifying how deep to propagate them, for instance.)

  4. Downstream packages

What is the impact on downstream packages?

They can add support for callbacks in their estimators or not. If they don't, using their estimators in our meta-estimators won't break. It's just that progress bars for instance will only display progress on the meta-estimator. On the other hand using our estimator in their meta-estimators won't break either. In that case callbacks will work but won't be able to provide their full capabilities.

What does "scikit-learn compatible estimator" mean? Does it include callbacks?

I don't think we're making callbacks mandatory to be called "compatible". We're making sure that composing elements that support callbacks with elements that don't doesn't crash. At the same time, we are going to include a developer test suite in the callback module for third-party devs to test their custom estimators and/or custom callbacks, to make sure that they are compatible with the scikit-learn callback API.

One example is LightGBM, they already have callbacks, see https://lightgbm.readthedocs.io/en/stable/Python-Intro.html#early-stopping.

Estimators that support the scikit-learn callback API won't automatically support LightGBM (or lightning or tensorflow, ...) callbacks. Whether they'll add support for scikit-learn callbacks in their estimators, I can't tell. But if they show interest, we can help them implement it.

Hope that I answered your questions. Don't hesitate to ask again if I wasn't clear enough in some of my answers :)

Member

@jjerphan jjerphan left a comment


Thank you for this SLEP and associated work, @jeremiedbb et al.!

@jeremiedbb jeremiedbb moved this to In progress in Labs Mar 23, 2026
@jeremiedbb jeremiedbb added this to Labs Mar 23, 2026
@ogrisel
Member

ogrisel commented Mar 24, 2026

For the record, I think the current design looks good, and I am confident that it strikes a good balance between simplicity and expressive power, but I want to wait a bit to follow and review some of the linked PRs before casting my vote in a week or two.

@lorentzenchr
Member

@jeremiedbb Thanks for your detailed and yet concise answers. I don't think there is a show stopper. Still, I'm interested, so I'll ask further, mostly about implementation details.

  1. Performance

If no callbacks are registered, the performance regression should be negligible. Is this assumption correct?

Callback support in estimators consists of two things: creating callback contexts and calling callback hooks:

  • Calling callback hooks when there's no callback is just a function call, so much less than 1 µs. This is clearly negligible.

  • The callback context is not a big object: < 10 attributes that are ints, strs, or references to objects. Creating a context takes around 1-2 µs. For the example of a logistic regression with 1000 iterations, that represents ~2 ms.

Of course that overhead represents a larger proportion of total fit time on very small datasets, but I believe that 2 µs per iteration is very small, even for small datasets.

The SLEP also mentions compiled code/Cython. Are callbacks expected to be implemented in the compiled part of the code base or do they stay within Python land?

I expect that we'll implement callback support in some Cython parts, in particular when the full iterative loop is Cython (e.g. plain_sgd). In that case, to not degrade performance at all, we'll make sure that we don't acquire the GIL and don't create these contexts if no callback is registered on the estimator. (That's why propagated callbacks like progress bars have a parameter specifying how deep to propagate them, for instance.)

Would it make sense to apply the logic "don't create a callback context if no callback is registered" to all cases, not just Cython code?
You already mentioned plain_sgd; the loop reads

with nogil:
    for epoch in range(max_iter):
        for i in range(n_samples):

I guess the callback would be registered inside the epoch/max_iter loop but outside the i/n_samples loop. How do you avoid acquiring the GIL again (which is done for the validation_score_cb)?

@jeremiedbb
Member Author

jeremiedbb commented Mar 26, 2026

@lorentzenchr

Would it make sense to apply the logic "don't create a callback context if no callback is registered" to all cases not just Cython code?

It depends:

  • for regular estimators (i.e. not meta-estimators), yes.
  • for meta-estimators, no, because their sub-estimators can themselves have callbacks registered on them, so they need to be aware of the whole context tree to provide their full capabilities.

I was initially going to create it for every kind of estimator anyway, to avoid making callback support more verbose by adding branches everywhere, but I think that with a little bit of magic we can skip it seamlessly without adding verbosity. I will open a dedicated issue to figure out which option to choose. This is an implementation detail though and can be decided later.
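A toy sketch of the context-tree idea (the class and names below are illustrative, not the SLEP's actual implementation): the meta-estimator creates a context even without callbacks of its own, so contexts of sub-estimators can attach to it and see where they sit in the tree.

```python
class ToyCallbackContext:
    """Illustrative context-tree node, not the SLEP's actual class."""
    def __init__(self, name, callbacks=(), parent=None):
        self.name = name
        self.callbacks = list(callbacks)
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        # Walk up to the root to show this context's position in the tree.
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

# The meta-estimator creates a root context even with no callbacks of its
# own, so that callbacks registered on sub-estimators can use the full tree.
root = ToyCallbackContext("GridSearchCV")
sub = ToyCallbackContext("LogisticRegression",
                         callbacks=["progress_bar"], parent=root)
print(sub.path())  # GridSearchCV/LogisticRegression
```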

I guess the callback would be registered inside the epoch/max_iter loop but outside the i/n_samples loop. How do avoid to acquire the gil again (which is is done for the validation_score_cb)?

plain_sgd will have a new callback_context argument. I would first add a check at the very beginning, before the with nogil block, and then acquire the GIL conditionally:

cdef int has_callbacks = len(callback_context._callbacks) > 0
with nogil:
    for epoch in range(max_iter):
        if has_callbacks:
            with gil:
                subcontext = callback_context.subcontext(...)
                subcontext.call_on_fit_task_begin(...)

        for i in range(n_samples):
            # no callback calls here: we don't go deeper than iterations involving the full dataset

        if has_callbacks:
            with gil:
                subcontext.call_on_fit_task_end(...)

(I edited the snippet to make it more realistic and accurate)

Member

@lorentzenchr lorentzenchr left a comment


+1

@betatim
Member

betatim commented Apr 1, 2026

I've posted a few questions in scikit-learn/scikit-learn#33322 - it felt like the right place.

@betatim
Member

betatim commented Apr 1, 2026

One additional question is about the callback context manager. We have something like the following:

class MyEstimator:
  ...
  @with_callbacks
  def fit(self, X, y):
    callback_ctx = self._init_callback_context(max_subtasks=self.max_iter)

For built-in estimators the with_callbacks decorator is hidden away inside an existing decorator, but it is there as well.

What I am wondering is why we have this duplication. Naively it seems something like the following should also work:

class MyEstimator:
  ...
  def fit(self, X, y):
    with CallbackContext(max_subtasks=self.max_iter) as callback_ctx:
      # do stuff with callback_ctx

What am I missing?

@jeremiedbb
Member Author

What am I missing?

One level of indentation. In scikit-learn this means in particular that PRs implementing callback support would have a huge unrelated diff, because the whole body of the fit method would be re-indented.

The long-term preferred solution is the public vs private fit, where the public fit does the boilerplate and calls the private fit that is only about the optimization problem. That's orthogonal to the callbacks though, and in the meantime the decorator is the least disruptive option.

Note that we can still offer the context manager option if we get the request.
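The equivalence between the two styles can be sketched in plain Python (the decorator and context class here are toy stand-ins, not the actual implementation): the decorator enters the context around fit, giving the same enter/exit behavior without re-indenting the method body.

```python
import functools

class ToyCallbackContext:
    """Records enter/exit events so the two styles can be compared."""
    events = []

    def __enter__(self):
        ToyCallbackContext.events.append("enter")
        return self

    def __exit__(self, *exc):
        ToyCallbackContext.events.append("exit")
        return False

def with_callbacks(fit):
    # Decorator form: same semantics as a with-statement inside fit, but
    # the body of fit keeps its original indentation.
    @functools.wraps(fit)
    def wrapper(self, X, y):
        with ToyCallbackContext():
            return fit(self, X, y)
    return wrapper

class MyEstimator:
    @with_callbacks
    def fit(self, X, y):
        ToyCallbackContext.events.append("fit body")
        return self

MyEstimator().fit(None, None)
print(ToyCallbackContext.events)  # ['enter', 'fit body', 'exit']
```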

@jeremiedbb
Member Author

Hi everyone,

I opened a PR (#104) to change a small detail in the SLEP, because we found a performance concern that we wanted to tackle right away. The goal is to avoid unnecessary pickling of estimators, which requires adding one extra argument to the callback hooks.

I know that it's not best practice to change the SLEP during the voting period, but I believe that it's a very minor change that should not impact the decision. Please tell me if you think otherwise.

(Note that PR #104 can be merged after the vote as a small amendment to the SLEP if you think that's more appropriate.)
