data_store: optimise rtconfig serialisation #7313
    for item in lst:
        if item is None:
            items.append(none_str)
        elif any(char in str(item) for char in ',#"\''):
This is an issue of the same type that @dwsutherland spotted in #7306 (comment).
One str(item) call was being made for each quote/delimiter character checked (,#"'), rather than once per item, resulting in 5x the number of str calls.
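A minimal sketch of the problem and the fix. Counted is a hypothetical helper that counts how often str() is invoked, and needs_quoting_slow / needs_quoting_fast are illustrative names, not parsec functions; they mirror only the any(...) check from the diff above.

```python
QUOTE_CHARS = ',#"\''  # characters that trigger quoting in listjoin


class Counted:
    """Hypothetical helper that counts calls to __str__."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __str__(self):
        Counted.calls += 1
        return str(self.value)


def needs_quoting_slow(item):
    # str(item) is re-evaluated for every character tested.
    return any(char in str(item) for char in QUOTE_CHARS)


def needs_quoting_fast(item):
    # str(item) is evaluated once and the result is reused.
    string = str(item)
    return any(char in string for char in QUOTE_CHARS)


Counted.calls = 0
needs_quoting_slow(Counted(60))
slow_calls = Counted.calls  # 4: one per character, none of them match

Counted.calls = 0
needs_quoting_fast(Counted(60))
fast_calls = Counted.calls  # 1
```

Hoisting str(item) out of the generator keeps the short-circuiting behaviour of any() while paying for the string conversion only once per item.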
def fast_listjoin(lst, none_str=''):
    """Faster variant of listjoin suitable in select cases.

    Compared to listjoin, this does *not*:
    * Rationalise and pretty-format integer lists.
    * Intelligently quote arguments which need quoting.

    Suitable for use in situations where you know the data type of the field
    being joined and the above is of no concern.
    """
    return ', '.join((
        none_str if item is None else str(item)
        for item in lst or [None]
    ))
This is a slimmed-down version of listjoin above. The savings per call are small; however, this method gets hammered by the data store because it's called for:
- execution polling intervals
- execution retry delays
- submission polling intervals
- submission retry delays
Each of these is computed once for every task in the n-window.
According to cProfile, this reduces the tottime (i.e., the time spent within the method itself, excluding the functions it calls) from 2.821s to 0.01014s, a saving of roughly 2.8s for the example.
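The sketch below compares fast_listjoin (copied from the diff) against listjoin_sketch, a hypothetical, simplified stand-in for listjoin that reproduces only the quoting branch visible in the diff (it does not rationalise integer lists). It then shows one way to read the tottime column using the standard cProfile and pstats modules. The workload is made up and timings will vary by machine.

```python
import cProfile
import io
import pstats


def listjoin_sketch(lst, none_str=''):
    """Hypothetical, simplified listjoin: quotes items when needed."""
    if not lst:
        return none_str
    items = []
    for item in lst:
        if item is None:
            items.append(none_str)
        else:
            string = str(item)  # evaluated once per item
            if any(char in string for char in ',#"\''):
                items.append(repr(item))  # repr() adds quotes
            else:
                items.append(string)
    return ', '.join(items)


def fast_listjoin(lst, none_str=''):
    """Faster variant of listjoin (as in the diff): no quoting."""
    return ', '.join((
        none_str if item is None else str(item)
        for item in lst or [None]
    ))


# Hypothetical workload: a retry-delay-like list joined many times.
delays = [60.0, 120.0, 300.0]

profiler = cProfile.Profile()
profiler.enable()
for _ in range(10_000):
    listjoin_sketch(delays)
    fast_listjoin(delays)
profiler.disable()

# Sort by tottime: time spent inside each function, excluding callees.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats('tottime').print_stats(5)
print(report.getvalue())
```

With these definitions, fast_listjoin(['a,b']) returns a,b unquoted whereas listjoin_sketch(['a,b']) returns 'a,b' with quotes, which is why the docstring restricts fast_listjoin to fields whose data type is known.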
Out of curiosity I checked: there's not much difference if the conditional [None] list is constructed outside the generator:
#!/usr/bin/env python
import time

N = 100
M = 100
Ns = [_n for _n in range(N)]
none_str = 'null'
NNones = Ns or [None]


def one():  # as before
    return ', '.join((
        none_str if item is None else str(item)
        for item in Ns or [None]
    ))


def two():  # None list constructed outside
    return ', '.join((
        none_str if item is None else str(item)
        for item in NNones
    ))


for method in (one, two):
    start = time.time()
    for _try in range(M):
        method()
    end = time.time()
    print(f'{method.__name__:10} {end - start}')
(flow) sutherlander@cortex-hyper:bin$ p7313.py
one 0.0006687641143798828
two 0.0006625652313232422
N=0 also:
(flow) sutherlander@cortex-hyper:bin$ p7313.py
one 2.9325485229492188e-05
two 2.7418136596679688e-05
Strike that (silly me): lst or [None] would only be evaluated once anyway...
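Python evaluates the outermost iterable of a generator expression exactly once, when the generator object is created, not on each iteration. A small sketch with a hypothetical counting function, source(), standing in for lst or [None], makes this visible:

```python
calls = 0


def source():
    """Hypothetical stand-in for `lst or [None]` that counts evaluations."""
    global calls
    calls += 1
    return [1, 2, 3]


gen = (str(item) for item in source())
after_creation = calls  # source() already ran once, before any iteration

joined = ', '.join(gen)
after_join = calls  # still 1: iterating does not re-evaluate source()
```

This is why moving the [None] fallback outside the generator makes no measurable difference in the benchmark.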
    # NOTE: This object is immutable so we cache the value of __str__.
    __str__ = lru_cache(1)(_str)

# NOTE: Prevent duplicate DurationFloat objects being created for the same
# duration value. This allows __str__ caching to be effective.
DurationFloat = lru_cache(None)(_DurationFloat.__call__)
This reduces the remaining 2 million calls down to one.
This will cache the objects for the lifetime of the Python process in which they are created. This sort of caching is appropriate in circumstances where we wouldn't expect this memory to be released during runtime.
If we did want the cache to release memory, we would use weakref.

Partially addresses the example which demonstrated the severe performance issues outlined in #7267.
Improves the efficiency of some parsec methods to speed up data store runtime configuration serialisation.
(deliberating over 8.6.x vs master for this one)
Example
This example is derived from the same workflow that prompted #7267, which includes complex execution/submission retry delay configurations.
This example stresses different factors to the one outlined there and does not use any absolute dependencies.
Problem
When the data store creates each <foo> task proxy, it must serialise the runtime configuration. This results in the PT1S expression being evaluated 10 million times!
PT1S expressions.
Fixes
There are three optimisations here:
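- Avoid redundant str() calls in listjoin.
- Add fast_listjoin, a slimmed-down variant of listjoin for use in the data store.
- Prevent duplicate DurationFloat objects being created for the same value and cache the value of __str__.
The third fix is the one which delivers the biggest boost.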
Results
[1] Cumulative time of the _main_loop method identified by cProfile.
Check List
- … CONTRIBUTING.md and added my name as a Code Contributor.
- … setup.cfg (and conda-environment.yml if present).
- … ?.?.x branch.