Do Not Merge: Integration Branch for GT4Py#3
Closed
philip-paul-mueller wants to merge 97 commits into main from
Conversation
This requires spcl/dace-webclient#179 to be merged before being ready.
Quick fix follow-up from spcl#1706 which left a broken notebook. Co-authored-by: Roman Cattaneo <>
GitHub Actions workflows use outdated versions of
- actions/checkout
- actions/setup-python

These actions are built for specific Node versions, which are now end of life. While the workflows continue to run, GitHub issues a warning (visible in the online interface) and runs them with newer versions of Node. Since both checkout and setup-python are basic actions, none of the features the DaCe workflows use have changed. We might see a slight speedup from the out-of-the-box caching added to setup-python in recent versions. Parent issue: GEOS-ESM/SMT-Nebulae#89 Co-authored-by: Roman Cattaneo <>
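As a sketch, an updated workflow step might look like the following (the `v4`/`v5` version tags are assumptions current at the time of writing, not values taken from the DaCe repository; check the actions' release pages before pinning):

```yaml
# Hypothetical excerpt from a GitHub Actions workflow file.
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with:
      python-version: '3.12'
      cache: 'pip'  # out-of-the-box pip caching in recent setup-python versions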
Removes websockets dependency and makes jinja2 dependency optional
- [x] Fix cutouts w.r.t. the use of UIDs, allowing them to be preserved or re-generated depending on an input parameter - [x] Fix singlestate cutout extraction when memlets access struct members.
* Various minor code generation and runtime fixes * Minor API improvements to CompiledSDFG, sdfg.view(), and subset offsetting * Minor memlet propagation fix * Various simplify pass fixes that pertain to use of views, references, and tasklets with side effects * Symbolic support for shift and ternary expressions (fixes spcl#1315) * Pass permissiveness into transformations
Example: https://github.com/spcl/dace/actions/runs/11669534375/job/32491697698 runs only a small number of tests from `tests/fortran` (24 to be precise, while after this change, it goes up to 75).
Add back clang-format. I tested code generation with a style file, without a style file, and without clang-format (uninstalled clang-format for this). SDFGs generate in all cases. I can add tests.
Co-authored-by: Alexandros Nikolaos Ziogas <alexandros.ziogas@inf.ethz.ch> Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com> Co-authored-by: alexnick83 <31545860+alexnick83@users.noreply.github.com>
Small quality-of-life PR that improves two error messages. Especially if the state was called `state` (as per default), the old messages were hard to parse at first sight. For some reason, `yapf` was also changing the formatting of other code parts. I thus made two commits: 1. Format the two files that I was about to edit. 2. Improve the error messages in the freshly formatted files. This separates real changes from purely stylistic ones. Would it be an idea to enforce linting with a GitHub Actions workflow? I'm happy to write an issue about this (and contribute early next year). --------- Co-authored-by: Roman Cattaneo <>
…fg` branch. (spcl#1728) These commits: 1. are chronologically early, 2. are fairly independent and touch a small number of files, 3. thematically fit together. This is a follow-up on the discussion about incrementally moving the stable features from the `multi_sdfg` branch into `main`. All of the commits are cherry-picked to preserve the history and to allow easy fast-forwarding of the `multi_sdfg` branch. Some commits had to be slightly modified to resolve merge conflicts during cherry-picking, but otherwise they are left as-is. --------- Co-authored-by: Marcin Copik <mcopik@gmail.com>
…loatlit2string()` convert the FORTRAN real literal strings into Python floats. (spcl#1733) Replace some of the invalid Fortran test programs with valid ones (that pass under `gfortran -Wall`). This breaks some of the tests, so fixes for them are added too. `test_fortran_frontend_loop1()` still has an invalid test program, but the nearest valid one breaks for a different reason and will be fixed later.
According to @phschaad, the pyFV3 workflow occasionally fails with connection-refused errors when attempting to download the test data set for regression testing. This PR configures `wget` to retry even in case of fatal connection errors (like 404 (not found) or 503 (service unavailable)). Let's see if this helps in the short term. In the long run, we should think about leveraging caches for that data, because it rarely changes and it's kind of a waste to download it from scratch every time. Follow-up PR from spcl#1731 (comment). /cc @FlorianDeconinck FYI Co-authored-by: Roman Cattaneo <>
It was producing wrong indices for the initialization kernel, which would not work for some simple valid SDFGs (see the demo in the test).
…3.7 and 3.12 (spcl#1700) Addresses this problem: https://stackoverflow.com/a/67193166 This would be useful if we want to use this library to compare graphs in the unit tests.
…t tries to increase the number of dimensions after the reshape. (spcl#1692) Fix the issue with cpp codegen, where it currently cannot handle inputs such as:

```python
cpp.reshape_strides(Range([(0, 4, 1), (0, 5, 1)]), None, None, [2, 3, 5])
```

and crashes with an index error. Also fixes spcl#1690, where `RedundantArray` was producing a (valid) graph that triggered this.
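For context, reshaping a contiguous buffer to a shape with more dimensions amounts to recomputing row-major strides for the new shape. A minimal sketch of that stride computation (an illustration only, not DaCe's actual `reshape_strides` implementation; note DaCe `Range` bounds are end-inclusive, so the range above covers 5 x 6 = 30 elements, matching 2 * 3 * 5):

```python
def contiguous_strides(shape):
    """Row-major (C-order) strides, in elements, for a contiguous buffer."""
    strides = [1] * len(shape)
    for i in reversed(range(len(shape) - 1)):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

# A real implementation must also validate that the total element
# count is preserved before emitting the new strides.
print(contiguous_strides([2, 3, 5]))  # [15, 5, 1]
```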
This PR adds additional API calls and fields as requested by DaCe users. This includes: * `SDFG.auto_optimize` * `SDFG.regenerate_code` * `SDFG.as_schedule_tree`
…test program. (spcl#1736) A follow-up of spcl#1733 that took a bit longer to figure out. This change breaks the test on the `multi_sdfg` branch, but there the front-end is doing something additionally wrong (it somehow promotes `nclv` into an input parameter for the whole program); it will take even longer to figure out why.
* Brings back 36 tests that were skipped due to prior regressions that are now fixed. * Fixes codegen generating non-atomic WCR w.r.t. neighboring edges * Contains minor modifications that were used to unskip certain tests (e.g., use of the no-longer-existent `symbol.get()` method) * Gives valid reasons for all other skipped tests
Removed some asserts that were too strict from `SDFGState._read_and_write_sets()`. If a subset is `None`, the function assumes that the corresponding array is fully accessed. Before, an `assert` ensured that this assumption is true. However, the implementation also assumed that this could be verified, either because the same symbols were used or because the size was known. But this is not necessarily the case if the two involved arrays use different symbols that nevertheless always have the same value. The other reason is that the size was previously estimated using `total_size`, which has a well-defined meaning for an `Array`, but whose meaning for a `View` is not fully clear. For these reasons the asserts were removed. --------- Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
…an values in boolean comparisons (spcl#1756) Strings like `not ((N > 20) != 0)` (i.e. `Not(Ne(Gt(N, 20), 0))`) were incorrectly interpreted by `sympy.sympify` as the constant `False`. This is a limitation of sympy, which does not treat the integer 0 as falsy and enforces exact equivalence (or difference) checks with `Ne`. To get around this limitation, the DaCe-internal AST preprocessor now replaces constants with boolean values if they are arguments to comparison operations where the other operand is also a comparison operation, and thus returns a boolean. This fixes an issue with `DeadStateElimination`, closing issue spcl#1129.
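The preprocessing idea can be illustrated with a small, self-contained AST pass (a sketch of the approach only, not DaCe's actual preprocessor; class and variable names are hypothetical):

```python
import ast

class BoolifyComparedConstants(ast.NodeTransformer):
    """Replace integer constants 0/1 with False/True when the other
    operand of a comparison is itself a comparison (hence boolean-valued)."""
    def visit_Compare(self, node):
        self.generic_visit(node)
        operands = [node.left] + node.comparators
        if any(isinstance(o, ast.Compare) for o in operands):
            for i, o in enumerate(operands):
                if isinstance(o, ast.Constant) and type(o.value) is int \
                        and o.value in (0, 1):
                    operands[i] = ast.Constant(value=bool(o.value))
            node.left, node.comparators = operands[0], operands[1:]
        return node

tree = ast.parse("not ((N > 20) != 0)", mode="eval")
tree = ast.fix_missing_locations(BoolifyComparedConstants().visit(tree))
print(ast.unparse(tree))  # the 0 compared against (N > 20) is now False
```

In plain Python both forms are equivalent; the rewrite only matters because sympy distinguishes the integer 0 from the boolean `False` in `Ne`.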
Both $\pi$ and `NaN` are implemented as classes in DaCe. However, these classes were not marked as device code, so they were only available on the host. This PR:
- Fixes this for `NaN` and $\pi$.
- Extends the implementation.
- Adds tests for them.
Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
…rmations (spcl#1636) I made some minor additions that make implementing some transformations easier for me. I will explain the changes and why I needed them. 1. Add `gpu_force_syncthreads` to force a call to `__syncthreads` in a map, in dace/codegen/targets/cuda.py and dace/sdfg/nodes.py. - I prefer to tile work maps (e.g., the K reduction for a sum-of-inner-products matrix multiplication) of kernels such that all new tiled maps are in the scope of the thread-block map. When this is combined with shared memory, a `__syncthreads` call is necessary within the thread-block map, but such a call is not emitted for sequential maps inside a thread-block-scheduled map. I would like to be able to force this behavior. 2. Add the skew option to the map tiling transformation. - Having every map start from 0 makes writing my transformations simpler. Therefore, I wanted the map tiling transformation to start the inner map at 0; I could only achieve this behavior by copying over the skew parameter from the strip-mine transformation. I would still prefer to use the map tiling transformation instead of strip mine while having the skew parameter.
…al execution (spcl#1930) This PR addresses spcl#1928 by fixing some code in `DeadStateElimination` (DSE) pass. This code was supposed to handle exactly the case reported in the issue, but the check for single branch with unconditional execution was not entirely correct. Test case added. Includes a change in fpga tests to address a CI error not related to this change: https://github.com/spcl/dace/actions/runs/13242670532/job/36961472662?pr=1930
This fixes a bug in cutout for me.
Just fixes a typo found in a recent debugging session. @phschaad would you mind hitting the merge button? Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
This PR introduces a new and improved version of `MapFusion`. A summary of the changes can also be found [here](https://github.com/user-attachments/files/18516299/new_map_fusion_summary_of_changes.pdf); it compares the SDFGs generated by the old and new transformation for some unit tests. #### Fixed Bugs and Removed Limitations - The subsets (the concept, not the `.subset` member of the Memlet) of the new intermediate data descriptor were not computed correctly in some cases, especially in the presence of offsets. See the `test_offset_correction_range_read()`, `test_offset_correction_scalar_read()` and `test_offset_correction_empty()` tests. - The propagation of subsets, required because of the changed intermediate, was not handled properly. Essentially, the transformation only updated `.subset` and ignored `.other_subset`, which is correct in most cases but not always. See `test_fusion_intrinsic_memlet_direction()` for more. - During the check whether two maps can be fused, the `.dynamic` property of the Memlets was fully ignored, leading to wrong code. - The read-write conflict checks were refined: before, a fusion was rejected when one map accessed `A[i, j]` and the other map accessed `B[i + 1, j]`. Now this is possible, as long as every access is pointwise. See the `test_fusion_different_global_accesses()` test for an example. - The shape of the reduced intermediate is cleaned, i.e. unnecessary dimensions of size 1 are removed, unless they were present in the original shape. For example, if the intermediate array `T` had shape `(10, 1, 20)` and was accessed inside the map as `T[__i, 0, __j]`, the old transformation would have created a reduced intermediate of shape `(1, 1, 1)`; now its shape is `(1,)`. Note that if the intermediate instead has shape `(10, 20)` and is accessed as `T[__i, __j]`, a `Scalar` is created.
See also the `strict_dataflow` flag below. #### New Flags - `only_toplevel_maps`: If `True`, the transformation will only fuse maps located at the top level, i.e. maps inside maps will not be merged. - `only_inner_maps`: If `True`, the transformation will only fuse maps that are inside other maps. - `assume_always_shared`: If `True`, the transformation will assume that every intermediate is shared, i.e. that the referenced data is used somewhere else in the SDFG and has to become an output of the fused map. This creates dead dataflow, but avoids a scan of the full SDFG. - `strict_dataflow`: Enabled by default. It has two effects: first, it disables the cleaning of reduced intermediate storage. The second effect is more important: it preserves a much stricter dataflow. Most importantly, if the intermediate array is used downstream (not limited to the case where the array is an output of the second map), the maps will not be fused. This mostly works around bugs in other DaCe transformations that fail to pick up the dependency; note that the fused map itself would be correct. #### `FullMapFusion` This PR also introduces the `FullMapFusion` pass, which makes use of the `FindSingleUseData` pass introduced in [PR#1906](spcl#1906). `FullMapFusion` applies MapFusion as long as possible, i.e. fuses all maps that can be fused. But instead of scanning the SDFG every time an intermediate node has to be classified (can it be deleted or not?), the scan is done once and then reused, which speeds up the fusion process by removing the need to traverse the full SDFG many times. This new pass also replaces the direct application of MapFusion in `auto_optimizer`.
#### References Collection of known issues in other transformations: - [RedundantArrayRemoval (auto optimize)](spcl#1644) - [Bug in `RefineNestedAccess` and `SDFGState._read_and_write_sets()`](spcl#1643) - [Error in Composite Fusion (auto optimize)](spcl#1642) --------- Co-authored-by: Philipp Schaad <schaad.phil@gmail.com>
The tutorial `codegen.ipynb` has a small bug: `ControlFlowRegion` is not in the namespace used in the tutorial. This PR fixes the issue by importing `ControlFlowRegion` into the `dace` namespace. Additionally, I added a test for the notebook tutorials to make sure they are not broken. At the moment it covers only the "getting started" and "codegen" notebooks. These tests require the new dependencies `ipykernel` and `nbconvert`. The above may or may not be the preferred solution! --------- Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
Brings a V1 maintenance fix PR'ed here: spcl#1954 Original commit messages: - Check for all potential matches in `_check_paths` to not miss a potential `memlets_intersect` failure. - The unit test checks the internal `_check_paths` function, since the bug triggers on a non-deterministic networkx search.
## Description As discussed in the weekly DaCe meeting today, we'd like to go ahead with enforcing consistent formatting as part of CI. As a reminder, this started with a discussion[^1] in December, and there was no pushback on the discussion page nor in the meeting today. We decided to proceed in the following way. 1. @tbennun will run `pre-commit` / `yapf` locally on the codebase and push that, see spcl#1966 2. This PR will then be merged afterwards and just contain the changes needed to add the new CI job. <!-- Tal, you might wanna add a couple of manual guards in places like https://github.com/spcl/dace/blob/6a485cf310940d3299ca48bcc4357d207ae67d04/tests/codegen/codegen_used_symbols_test.py#L17-L30 and others outlined below to keep beautiful manual formatting that makes sense in some (very limited) cases. --> [^1]: spcl#1804 --------- Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com> Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
Cherry picking fix to the data flow elimination path from V1/maintenance. Original PR: spcl#1955 --------- Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com> Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
This commit partially addresses [issue#1967](spcl#1967), in the sense that it turns a GPU kernel into a no-op (i.e. does not launch it at all) if a dimension of the iteration space has zero or negative size. The code generator already contained such a check, called the "empty grid" check, so in a sense this commit only clarifies the notion of what an empty grid is. Before, an empty grid was only detected if the size depended on runtime values, but not if the size was known at compilation time. As explained in the issue, this led to strange behaviour: for example, for the map range `{"__i": "a:b", "__j": "10:0"}`, the code generator would add checks for `__i` but not for `__j`. This was inconsistent with the CPU backend.
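The intended behaviour can be sketched with a hypothetical launch wrapper (an illustration of the semantics only; the actual check lives in DaCe's CUDA code generator, and the names below are made up):

```python
def launch_kernel(kernel, ranges, *args):
    """Launch `kernel` over the iteration space given by `ranges`, a list
    of (begin, end) pairs with `end` exclusive. If any dimension has zero
    or negative extent, the grid is empty and the kernel becomes a no-op,
    regardless of whether the extents were known at compile time or only
    at runtime."""
    extents = [end - begin for begin, end in ranges]
    if any(e <= 0 for e in extents):
        return None  # empty grid: do not launch at all
    return kernel(extents, *args)

calls = []
launch_kernel(lambda grid, *a: calls.append(grid), [(0, 4), (10, 0)])  # "10:0": skipped
launch_kernel(lambda grid, *a: calls.append(grid), [(0, 4), (0, 10)])  # launched
print(calls)  # [[4, 10]]
```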
…cl#1979) This PR properly implements the `Pass.set_opts()` mechanism. The format is such that each key must be prefixed with the name of the transformation for which the option should be set. According to Philipp Schaad this was the intended behaviour, but it was not implemented properly. Now the function is implemented once in `Pass`, and subclasses do not have to provide their own implementation.
The error was that a state's name was only added to the list of known names when it collided with an existing name. Assume the top SDFG has a state named `name1`, and the nested SDFG has states `name1` and `name1_0` (the `_0` suffix is the pattern added by the `find_new_name()` function). The error occurs if `name1_0` is processed first: it collides with nothing, so it was not added to the known names. When `name1` is processed next, it collides with the parent's state of the same name and is passed to `find_new_name()`. That function then sees `name1_0` as free and picks it. This leads to the error: two states named `name1_0` are now present in the SDFG, the original one from the nested SDFG and the one (also from the nested SDFG) that was previously named `name1`.
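The fix can be illustrated with a small sketch of the renaming logic (hypothetical helper names, not the actual DaCe code):

```python
def find_new_name(name, known):
    """Return `name` if unused, else the first free `name_<i>`."""
    if name not in known:
        return name
    i = 0
    while f"{name}_{i}" in known:
        i += 1
    return f"{name}_{i}"

def rename_states(nested_names, known):
    """Rename nested-SDFG states so they do not clash with `known`.
    The fix: record *every* processed name in `known`, not only the
    names that actually had to be renamed."""
    result = []
    for name in nested_names:
        fresh = find_new_name(name, known)
        known.add(fresh)  # previously only done when a rename happened
        result.append(fresh)
    return result

# Parent SDFG already has a state `name1`; nested SDFG has `name1_0`, `name1`.
print(rename_states(["name1_0", "name1"], {"name1"}))  # ['name1_0', 'name1_1']
```

Without the unconditional `known.add(fresh)`, processing `name1` after `name1_0` would pick `name1_0` again, reproducing the duplicate-name bug described above.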
This PR addresses [issue#1959](spcl#1959). The issue was that the inlining transformation destroys the SSA invariant we maintain inside GT4Py. This invariant states that in every code path there is exactly one AccessNode that is used to write to a data container. The PR solves this issue by modifying how the nested SDFG is isolated: before, this was done in two steps; it is now performed in a single step. The main idea is that the number of writes is preserved.
spcl#1984) Here is the log of the issue when trying to compile a `DaCe` `gt4py` program with latest `Apple Clang`: ``` > raise cgx.CompilationError('Compiler failure:\n' + ex.output) E dace.codegen.exceptions.CompilationError: Compiler failure: E [ 25%] Building CXX object CMakeFiles/temporary_fields_for_turbulence_diagnostics.dir/var/folders/5x/5gbpjy215c1_x31rmwc75t7m0000gp/T/gt4py_session_j2sksukj/temporary_fields_for_turbulence_diagnostics_b33111c9ad98f7f80e665d9dd613c7639b719e0e6e7b933278775078ee836dd0/src/cpu/temporary_fields_for_turbulence_diagnostics.cpp.o E In file included from /var/folders/5x/5gbpjy215c1_x31rmwc75t7m0000gp/T/gt4py_session_j2sksukj/temporary_fields_for_turbulence_diagnostics_b33111c9ad98f7f80e665d9dd613c7639b719e0e6e7b933278775078ee836dd0/src/cpu/temporary_fields_for_turbulence_diagnostics.cpp:2: E In file included from /Users/ioannmag/cscs_repos/cycle28/icon4py/.venv/lib/python3.10/site-packages/dace/codegen/../runtime/include/dace/dace.h:20: E /Users/ioannmag/cscs_repos/cycle28/icon4py/.venv/lib/python3.10/site-packages/dace/codegen/../runtime/include/dace/reduction.h:535:44: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] E 535 | return wcr_custom<T>::template reduce_atomic( E | ^ E 1 error generated. E make[2]: *** [CMakeFiles/temporary_fields_for_turbulence_diagnostics.dir/var/folders/5x/5gbpjy215c1_x31rmwc75t7m0000gp/T/gt4py_session_j2sksukj/temporary_fields_for_turbulence_diagnostics_b33111c9ad98f7f80e665d9dd613c7639b719e0e6e7b933278775078ee836dd0/src/cpu/temporary_fields_for_turbulence_diagnostics.cpp.o] Error 1 E make[1]: *** [CMakeFiles/temporary_fields_for_turbulence_diagnostics.dir/all] Error 2 E make: *** [all] Error 2 ```
This code change addresses the following error:
```
callsite_stream.write(
> '''__state->report.save("{path}/perf", __HASH_{name});'''.format(path=sdfg.build_folder.replace(
'\\', '/'),
name=sdfg.name), sdfg)
E TypeError: Path.replace() takes 2 positional arguments but 3 were given
```
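The root cause is that `sdfg.build_folder` is a `pathlib.Path`, and `Path.replace(target)` renames a file on disk rather than substituting characters. A minimal sketch of the fix (the variable name is hypothetical):

```python
from pathlib import Path

build_folder = Path(r"C:\builds\sdfg")  # a Path, not a str

# Path.replace(target) takes one argument and renames a file on disk,
# so build_folder.replace('\\', '/') raises the TypeError shown above.
# Converting to str first gives the intended character substitution:
path = str(build_folder).replace("\\", "/")
print(path)  # C:/builds/sdfg
```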
commit 635da6c
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Fri Apr 25 15:40:25 2025 +0200
Add NVTX range in CUDA GPU kernel call of program
commit aef9945
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Tue Mar 25 06:52:35 2025 +0100
As an experiment, removed some code I think is useless; let's see what the tests say.

commit e5bf87f
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Mon Mar 24 07:05:18 2025 +0100
Added a comment to address the possible issues with views.

commit ba97874 Merge: b0b9945 4245396
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Mar 21 16:07:12 2025 +0100
Merge remote-tracking branch 'spcl/main' into improved-2d-copy

commit b0b9945
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Mar 21 16:06:09 2025 +0100
Added Alexnick's suggestions.

commit 065e0d7
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Mon Mar 17 08:56:39 2025 +0100
Added tests to ensure that the new verification works as expected.

commit 51182e5
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Mon Mar 17 08:14:14 2025 +0100
Moved the test for negative-sized subsets from the Memlet to the `validate_state()` function. The reason is that in some cases this is valid, for example if an edge connects an AccessNode and a MapEntry, because in that case the map might not be executed. Since the Memlet does not have access to its source and destination node, it cannot check that, so the test was moved to a location that can do this check. However, it only does the check for AN-to-AN connections, which is a bit restrictive, but this is something for later.

commit 2801967
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Mon Mar 17 07:41:51 2025 +0100
I am not sure why the printout of the edge is not correct, but it is not where I thought I found it.

commit 3166302
Author: Philip Mueller <philip.paul.mueller@bluemain.ch>
Date: Sat Mar 15 08:23:21 2025 +0100
Fixed some issues and made it more logical.
commit 02d87b5 Merge: 801adb1 d130792
Author: Philip Mueller <philip.paul.mueller@bluemain.ch>
Date: Sat Mar 15 08:18:08 2025 +0100
Merge remote-tracking branch 'spcl/main' into improved-2d-copy

commit 801adb1
Author: Philip Mueller <philip.paul.mueller@bluemain.ch>
Date: Sat Mar 15 08:15:45 2025 +0100
Added more verification.

commit 66b43f8
Author: Philip Mueller <philip.paul.mueller@bluemain.ch>
Date: Sat Mar 15 07:57:03 2025 +0100
Simplified some checks.

commit 76a1a58
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Mar 14 15:20:49 2025 +0100
Added a new test for the pseudo-1D case, i.e. when we reduce a 2D copy to a 1D copy because it happens to be contiguously allocated.

commit 0b15a74
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Mar 14 14:59:03 2025 +0100
Added a note about wrong usage of eid in validation.

commit 322ecda
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Mar 14 14:52:28 2025 +0100
Improved memlet checking.

commit 61ea7a6
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Mar 14 14:43:27 2025 +0100
Added a new test to the SDFG.

commit a67ad2a
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Mar 14 14:36:02 2025 +0100
Also added a test for strided 1D copies.

commit c931b91
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Mar 14 14:07:16 2025 +0100
Now 2D copies work; more tests needed.

commit d0a396f
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Mar 14 13:42:18 2025 +0100
Updated the memlet copying; I think I now have all the cases and will now write the tests.

commit 9b49c9e
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Mar 14 11:16:12 2025 +0100
Made a first version of the new copy implementation, but I have to run the unit tests.

commit feea97f
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Mar 14 10:18:08 2025 +0100
Started with the implementation of a better copy, but I have to fix it more.
commit 21dbe91
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Wed Apr 30 07:55:25 2025 +0200
Updated the validation regarding `other_subset`. If the validation finds an out-of-bounds access in the subset of a Memlet that is marked as `dynamic`, it only outputs a warning. If, however, it finds such out-of-bounds behaviour in `other_subset`, it generates a hard error. This commit changes the behaviour such that only a warning is generated for `other_subset` as well, making the two cases more uniform.
Old integration branch; it was a bit strange.