Fix parentless spawning. #7237

Open
hjoliver wants to merge 8 commits into cylc:master from hjoliver:fix-parentless-spawning

Conversation

@hjoliver

@hjoliver hjoliver commented Mar 17, 2026

Member

Close #7235
Close #5730

Premature shutdown or stall can result if a parented instance of a sometimes parentless task ends up at the runahead limit.

See #7235 (comment)

The correct way to do this is: for any task that is parentless in one or more recurrences, always spawn the next parentless instance - which may lie beyond multiple parented instances and/or beyond the runahead limit.

In effect, a parentless instance should always spawn the next parentless instance, at runahead release time.
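The rule above can be sketched in a few lines. This is a minimal illustration only, assuming integer cycling and a caller-supplied predicate; `next_parentless_point`, `is_parentless` and `final_point` are hypothetical names, not part of the Cylc code base.

```python
from typing import Callable, Optional


def next_parentless_point(
    point: int,
    is_parentless: Callable[[int], bool],
    final_point: int,
) -> Optional[int]:
    """Return the first cycle point after `point` where the task is parentless.

    Hypothetical helper: the search deliberately ignores the runahead limit
    and skips over any parented instances in between.
    """
    for candidate in range(point + 1, final_point + 1):
        if is_parentless(candidate):
            return candidate
    return None  # no further parentless instance before the final point


# Toy task: parentless in cycles 1, 4, 7, ...; parented in all other cycles.
def parentless_every_third(point: int) -> bool:
    return point % 3 == 1


runahead_limit = 2
next_point = next_parentless_point(1, parentless_every_third, final_point=10)
# The next parentless instance lies past the parented instances at cycles
# 2 and 3, and past the runahead limit, but it is still the one to spawn.
print(next_point, next_point is not None and next_point > runahead_limit)
```

In this toy setup the instance at cycle 1 spawns the instance at cycle 4 when it is released, so a parented instance sitting at the runahead limit can no longer leave the pool without a future parentless instance.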


NOTE my first attempt at this bug fix ran into trouble because the task pool spawning logic has gradually become too complicated to follow easily, so I bit the bullet and tried to rethink it.

As a result, this PR significantly simplifies the scheduler core. For example, calls to compute_runahead() in the code drop from ~10 to ~1, and git diff master cylc/flow reports 114 insertions(+), 218 deletions(-).

On master: whenever anything spawns a task, the task pool recomputes the runahead limit, then spawns, releases, and queues any parentless instances of that task all the way out to the limit (recursively, within a single main loop iteration).

On this branch: the task pool spawns only the one instance, not its downstream consequences. The main loop then computes the limit, releases instances below the limit, and (on release) spawns the single next parentless instance, if there is one. However, I still do a single spawn-to-rh-limit at startup (not strictly necessary, but many current integration tests expect it).
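As a rough model of the branch behaviour described above, here is a toy pool with integer cycle points. It is a sketch under stated assumptions, not the actual task pool: `ToyPool` and its attributes are hypothetical, and the runahead limit is passed in as a plain argument rather than computed by compute_runahead().

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


@dataclass
class ToyPool:
    """Hypothetical stand-in for the task pool, tracking one task's instances."""

    is_parentless: Callable[[int], bool]
    final_point: int
    held_back: Set[int] = field(default_factory=set)  # spawned, not yet released
    released: Set[int] = field(default_factory=set)

    def spawn(self, point: int) -> None:
        # Spawn exactly one instance; no downstream consequences here.
        if point not in self.released:
            self.held_back.add(point)

    def next_parentless(self, point: int) -> Optional[int]:
        for candidate in range(point + 1, self.final_point + 1):
            if self.is_parentless(candidate):
                return candidate
        return None

    def release_runahead_tasks(self, limit: int) -> None:
        # One pass per main loop iteration: release what is within the limit;
        # each released parentless instance spawns only the next parentless one.
        for point in sorted(p for p in self.held_back if p <= limit):
            self.held_back.remove(point)
            self.released.add(point)
            if self.is_parentless(point):
                nxt = self.next_parentless(point)
                if nxt is not None:
                    self.spawn(nxt)


# Task is parentless at cycles 1, 4, 7, ... and parented elsewhere.
pool = ToyPool(is_parentless=lambda p: p % 3 == 1, final_point=10)
pool.spawn(1)
pool.release_runahead_tasks(limit=2)  # releases 1, which spawns 4 (beyond limit)
pool.spawn(2)                         # parented instance, spawned by its parent
pool.release_runahead_tasks(limit=2)  # releases 2; 4 is still waiting
pool.release_runahead_tasks(limit=5)  # limit moved on: releases 4, spawns 7
```

Because instances spawned during a pass are not released until the next pass, this model also advances by one parentless instance per iteration when the limit jumps ahead.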

Note that no functional tests broke, despite the many changes to the scheduler guts here.

The only consequences to be aware of are:

  • If the runahead limit suddenly jumps multiple cycles ahead, the workflow will spawn out to it at one cycle per main loop iteration instead of immediately. (This really doesn't matter, but I could put a single spawn-to-rh-limit in the main loop if desired.)
  • Integration tests need to await schd._main_loop() and/or schd.pool.spawn_to_runahead_limit() before checking downstream consequences (i.e. anything beyond the immediate spawn) of operations such as trigger and set. (This actually didn't break very many existing integration tests, and they were easily fixed.)

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@hjoliver hjoliver marked this pull request as draft March 18, 2026 01:11
@hjoliver hjoliver self-assigned this Mar 18, 2026
@hjoliver hjoliver added this to the 8.6.x milestone Mar 18, 2026
@hjoliver hjoliver added the bug Something is wrong :( label Mar 18, 2026
@hjoliver hjoliver force-pushed the fix-parentless-spawning branch 4 times, most recently from a485c53 to b39e758 Compare April 12, 2026 09:14
@hjoliver hjoliver force-pushed the fix-parentless-spawning branch from b39e758 to 0dcfc96 Compare April 12, 2026 20:50
@hjoliver hjoliver force-pushed the fix-parentless-spawning branch from 5b3b179 to fa6accf Compare April 12, 2026 22:52
@hjoliver hjoliver marked this pull request as ready for review April 12, 2026 23:40
@oliver-sanders

oliver-sanders commented Apr 14, 2026

Member

Branch base is master, but milestone is 8.6.x.

This changes some quite fundamental stuff, so it might make sense to leave this on master?

@hjoliver

hjoliver commented Apr 15, 2026

Member Author

Yeah, that's why I put it on master initially, but I'm not entirely sure - it still fixes an important bug and does not add any new features.

(And stating the obvious, but the fundamental stuff needed changing because over time it had become a right mess).

@oliver-sanders

Member

Ok, will continue reviewing against master for now...

Comment thread cylc/flow/task_pool.py Outdated
Comment thread cylc/flow/scheduler.py

self.pool.compute_runahead()
self.pool.release_runahead_tasks()
await self.workflow_shutdown()

Member

Perhaps take this opportunity to rename workflow_shutdown to e.g. set_stop_mode

Member Author

Can we punt that as off-topic? From a quick look it doesn't just set the stop mode, it might also shut the scheduler down.

Member

It's a little confusing that we would attempt shutdown so soon after the start of the main loop... Could you explain why this has been moved here?

At the least, I think a comment would be useful:

Suggested change
await self.workflow_shutdown()
# If applicable, set stop mode or shutdown on task failure:
await self.workflow_shutdown()

Comment thread cylc/flow/scheduler.py Outdated
Comment thread cylc/flow/xtrigger_mgr.py
Comment thread cylc/flow/taskdef.py Outdated
Comment thread cylc/flow/task_pool.py Outdated
Comment thread changes.d/7237.fix.md Outdated
Comment thread cylc/flow/taskdef.py Outdated
Comment thread cylc/flow/task_pool.py Outdated
Comment thread cylc/flow/taskdef.py Outdated
@MetRonnie MetRonnie modified the milestones: 8.6.x, 8.7.0 May 20, 2026
@MetRonnie MetRonnie self-requested a review May 20, 2026 13:36
Comment thread cylc/flow/scheduler.py Outdated
@MetRonnie MetRonnie self-requested a review May 20, 2026 13:53
hjoliver and others added 2 commits June 8, 2026 16:15
Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>
Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>
hjoliver and others added 2 commits June 8, 2026 16:24
Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>
@hjoliver

hjoliver commented Jun 8, 2026

Member Author

All review comments addressed @MetRonnie

Labels

bug Something is wrong :(

Projects

None yet

Development

Successfully merging this pull request may close these issues.

mixed parentless/non-parentless task cause premature shutdown
Stall when task has parents in some cycles but is parentless in others

3 participants