Skip to content

data_store: optimise rtconfig serialisation#7313

Open
oliver-sanders wants to merge 1 commit into
cylc:8.6.xfrom
oliver-sanders:rtconfig-speedup
Open

data_store: optimise rtconfig serialisation#7313
oliver-sanders wants to merge 1 commit into
cylc:8.6.xfrom
oliver-sanders:rtconfig-speedup

Conversation

@oliver-sanders

@oliver-sanders oliver-sanders commented May 19, 2026

Copy link
Copy Markdown
Member

Partially addresses the example which demonstrated the severe performance issues outlined in #7267

Improve the efficiency of some parsec methods to improve the performance of datastore runtime configuration serialisation.

(deliberating over 8.6.x vs master for this one)

Example

[scheduler]
    [[events]]
        startup handlers = cylc set "%(workflow)s//1/a" && cylc stop "%(workflow)s"; echo

[task parameters]
    foo = 1..1000

[scheduling]
    [[graph]]
        R1 = a = > <foo> => b

[runtime]
    [[a, b]]
    [[<foo>]]
        execution retry delays = 1000*PT1S

This example is derived from the same workflow which prompted #7267 which includes complex execution/submission retry delay configurations.

This example stresses different factors to the one outlined there and does not use any absolute dependencies.

Problem

When the data store creates each <foo> task proxy, it must serialise the runtime configuration. This results in the PT1S expression being evaluated 10 million times!

  • 1000x: Once for each of the 1000 tasks.
  • 1000x: Once for each of the 1000 PT1S expressions.
  • 5x: Due to an inefficiency.
  • 2x: Due to the way parsec / configs work.

Fixes

There are three optimisations here:

  1. Fix the 5x inefficiency in listjoin.
  2. Implement a more efficient version of listjoin for use in the data store.
  3. Prevent duplicate DurationFloat objects being created for the same value and cache the value of __str__.

The third fix is the one which delivers the biggest boost.

Results.

branch time [1] saving
8.6.x 378.5 0%
#7313 59.31 84%
#7313 + patches 1-4 58.50 85%

[1] Cumulative time of the _main_loop method identified by cProfile.

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@oliver-sanders oliver-sanders added this to the 8.6.5 milestone May 19, 2026
@oliver-sanders oliver-sanders self-assigned this May 19, 2026
@oliver-sanders oliver-sanders added the efficiency For notable efficiency improvements label May 19, 2026
Comment thread cylc/flow/parsec/util.py
for item in lst:
if item is None:
items.append(none_str)
elif any(char in str(item) for char in ',#"\''):

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue of the same type that @dwsutherland spotted in #7306 (comment).

One str(item) call was being made for each character in the list, resulting in 5x the number of str calls.

Comment thread cylc/flow/parsec/util.py
Comment on lines +125 to +138
def fast_listjoin(lst, none_str=''):
"""Faster variant of listjoin suitable in select cases.

Compared to listjoin, this does *not*:
* Rationalise and pretty-format integer lists.
* Intelligently quote arguments which need quoting.

Suitable for use in situations where you know the data type of the field
being joined and the above is of no concern.
"""
return ', '.join((
none_str if item is None else str(item)
for item in lst or [None]
))

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slimmed down version of listjoin above. The savings are small, however, this method gets hammered by the data store because it's called for:

  • execution polling intervals
  • execution retry delays
  • submission polling intervals
  • submission retry delays

Once for every task in the n-window.

According to cProfile, this reduces the totaltime (i.e, the time taken within the method itself, as opposed to methods invoked from the method) from 2.821s to 0.01014s (i.e, a 2.8s saving for the example).

@dwsutherland dwsutherland May 22, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was curious, not much difference constructing the [None] conditional list outside:

#!/usr/bin/env python


import time

N = 100
M = 100

Ns = [_n for _n in range(N)]
none_str = 'null'
NNones = Ns or [None]

def one():  # as before
    return ', '.join((
        none_str if item is None else str(item)
        for item in Ns or [None]
    ))


def two():  # None list constructed outside
    return ', '.join((
        none_str if item is None else str(item)
        for item in NNones
    ))


for method in (one, two):
    start = time.time()
    for _try in range(M):
        method()
    end = time.time()
    print(f'{method.__name__:10} {end - start}')
(flow) sutherlander@cortex-hyper:bin$ p7313.py 
one        0.0006687641143798828
two        0.0006625652313232422

N=0 also:

(flow) sutherlander@cortex-hyper:bin$ p7313.py 
one        2.9325485229492188e-05
two        2.7418136596679688e-05

Strike that (silly me), there would only be one invocation of lst or [None] anyway...

Comment on lines +663 to +669
# NOTE: This object is immutable so we cache the value of __str__.
__str__ = lru_cache(1)(_str)


# NOTE: Prevent duplicate DurationFloat objects being created for the same
# duration value. This allows __str__ caching to be effective.
DurationFloat = lru_cache(None)(_DurationFloat.__call__)

@oliver-sanders oliver-sanders May 19, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reduces the remaining 2 million calls down to one.

This will cache the objects for the lifetime of the Python process in which they are created. This sort of caching is appropriate in circumstances where we wouldn't expect this memory to be released during runtime.

If we did want the cache to release memory, we would use weakref.

@oliver-sanders oliver-sanders marked this pull request as ready for review May 22, 2026 12:16
@oliver-sanders oliver-sanders requested a review from hjoliver May 22, 2026 12:16

@dwsutherland dwsutherland left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of impressive:

Image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

efficiency For notable efficiency improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants