Improve Tulipa performance when running in a loop#1635
Conversation
🤖 CompareMPS report✅ MPS files match |
Benchmark Results
Benchmark PlotsA plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR. |
|
📝 Check the documentation preview: https://tulipaenergy.github.io/TulipaEnergyModel.jl/previews/PR1635 |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1635 +/- ##
=======================================
Coverage 98.61% 98.61%
=======================================
Files 56 56
Lines 1874 1877 +3
=======================================
+ Hits 1848 1851 +3
Misses 26 26 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Extra BenchmarkTools check, minimum of 3 samples, fresh connection per sample, baseline from detached
|
|
@abelsiqueira, this PR implements improvements to speed up Tulipa's performance, especially when it's used repeatedly in a loop. I did the diagnosis and implementation with the help of CODEX. I decided to implement everything in one PR (but introduced by commit), since they all aim to achieve the same goal. I recognise that some of them are simple and might not have a major benefit. I think the most significant commits are the last two. For me, the refactor of id-1 is relevant, since using the LAG function in SQL makes sense to be faster and less time-consuming (I like that approach), and in the last commit you will see that the refactor also pre-allocates the workspace for the profiles and a refactor due to an old TO-DO (Creating inter_period profiles of inflows using the inflows profiles). All that helps to improve, as the benchmarks show. I ran the benchmark locally with the other cases to assess the benefits, as noted in the previous comment. Anyway, before merging, it would be nice if you could have a look to see if it makes sense or if you have another suggestion. Thanks! Diego |
abelsiqueira
left a comment
There was a problem hiding this comment.
Thanks @datejada, these look great, especially the profile changes. I've made non-blocking comments, so it's approved.
One thing, though, these extra benchmarks look really suspicious. Maybe they include precompilation time? Otherwise, it's over 10 times faster?!
If they are indeed correct, them an "easy" win would be to add Tiny, Storage, and Norse to benchmarks, and let the our normal benchmark script capture that. Should be more or less straightforward for the agent to create a new "level", like higher_level/Tiny/create_model. Either way, not a requirement for this PR because the improvements are clear
| LAST_VALUE(cons.id) OVER ( | ||
| PARTITION BY $id_partition_columns | ||
| ORDER BY $id_order_column | ||
| ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
There was a problem hiding this comment.
I think it's useful to explain what this does, but the full explanation is not really short.
From Claude, what I got is that this expands the window frame to all the rows, otherwise it would consider up to the current row. There is an alternative using descending ordering, but I don't think it's clearer. Also, if we assume that id is monotonically increase, we could use MAX, which would be easier.
(I feel like we do assume that, but I don't remember if we enforce it, so I'll continue as if we didn't).
Maybe something like
| ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING | |
| ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -- frame expanding window to all rows, to avoid assuming ordering of ids |
| cte_storage_profiles AS ( | ||
| SELECT DISTINCT | ||
| assets_profiles.profile_name, | ||
| assets_profiles.commission_year AS milestone_year |
There was a problem hiding this comment.
I don't remember our assumptions anymore, but this is one of the places (the only?) where commission_year = milestone_year. I don't remember why, but it's useful to mark this as a explicit choice for the future.
| assets_profiles.commission_year AS milestone_year | |
| assets_profiles.commission_year AS milestone_year -- here commission_year = milestone_year |
| inter_period[(row.profile_name, row.milestone_year, row.scenario)] = | ||
| convert(Vector{Float64}, row.values) |
There was a problem hiding this comment.
It's not clear to me why this doesn't need the if length(values) > 0 as it did before, but I also don't remember why we needed it, so I guess it's fine
This PR introduces several changes to improve performance in Tulipa, especially when running in a loop. Here is a summary:
run_scenarioandrun_rolling_horizon.previous_id/cycle_idinstead of per-row DuckDB lookups androw.id - 1.only(collect(...))inget_model_parameters.Changes done with the help of CODEX:
Co-authored-by: Codex <noreply@openai.com>Related issues
Closes #1634
Checklist