Hi PyGenStability team,
Firstly - thank you for the amazing package! It's very user-friendly and also produces very exciting results.
I'm raising this issue after informal discussions with Dominik. For my dataset, running PyGenStability across a reasonable scale range with many repeated runs at each scale takes a very long time. I'm running this on a computing cluster with walltime limits that I keep hitting, and increasing the number of workers only helps so much. The way the code is currently written makes it quite hard to split a job into subjobs that can be run independently.
I was wondering if there is a way we could refactor some of the code to let us "chunk" a run based on the scales. For example, if I wanted to run 200 scales across the scale range -2 to 2, I could instead split the job into 4 subjobs, each with 50 scales, covering -2 to -1, -1 to 0, 0 to 1 and 1 to 2. The remaining steps of calculating NVI(t, t') and finding optimal scales could then be run separately, once a method for combining results files is implemented.
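To make the idea concrete, here is a rough sketch of the chunk-and-merge logic using only NumPy. It is not based on the package internals: `chunk_scales` and `merge_chunks` are hypothetical helper names, and the per-chunk results format (a dict with a `"scales"` array plus other per-scale lists) is an assumption on my part. Splitting a single 200-point grid means the chunk boundaries do not fall exactly on -1, 0 and 1, but the merged scales are identical to what one full run would have used.

```python
import numpy as np


def chunk_scales(min_scale, max_scale, n_scale, n_chunks):
    """Split one grid of (log) scales into contiguous chunks, one per subjob."""
    scales = np.linspace(min_scale, max_scale, n_scale)
    return np.array_split(scales, n_chunks)


def merge_chunks(chunk_results):
    """Combine per-chunk results into one dict, ordered by scale.

    Assumes (hypothetically) that each result is a dict holding a "scales"
    array and other entries with one value per scale.
    """
    scales = np.concatenate([r["scales"] for r in chunk_results])
    order = np.argsort(scales)
    merged = {"scales": scales[order]}
    for key in chunk_results[0]:
        if key == "scales":
            continue
        # Flatten the per-scale values in chunk order, then reorder by scale.
        values = [v for r in chunk_results for v in r[key]]
        merged[key] = [values[i] for i in order]
    return merged


# 200 scales over [-2, 2], split into 4 subjobs of 50 scales each.
chunks = chunk_scales(-2, 2, 200, 4)
```

Each chunk could then be submitted as its own cluster job, and the same merge step would cover the case of adding extra scales after an initial run. The NVI(t, t') calculation and optimal scale selection would run once on the merged results.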
This additional functionality would also help in the case that a user wanted to test more scales after an initial run, without rerunning the entire method over the full set of scales.
I'm not a great Python coder, so not sure I would be the best person to write the final fix. But would be happy to contribute.
Look forward to discussing this more with you.
All the best
Sam