[AutoDiff] Autodiff 5: Fix adjoint-alloca placement for GlobalLoads outside the current range-for#496
Conversation
4c048e8 to
d8e05eb
Compare
There was a problem hiding this comment.
The fix is logical and well-explained, but this touches correctness-critical autodiff alloca placement logic with subtle SSA dominator reasoning — worth a human look before landing.
Extended reasoning...
Overview
The PR modifies a single function (MakeAdjoint::adjoint) in auto_diff.cpp, changing how adjoint AllocaStmt nodes are placed when a GlobalLoadStmt lives outside the currently-processed range-for iteration body.
The Change
The old code checked whether the GlobalLoadStmt's direct parent block was owned by a RangeForStmt before deciding placement. This missed the case where the GlobalLoad was inside an if-body nested within the for-loop — in that case the old code fell through and placed the alloca inside the if-branch, which visit(IfStmt) cannot see from the sibling reverse IfStmt it emits, causing DCE to strip it and producing silently-zero gradients.
The new code walks up the block hierarchy from the GlobalLoad's parent until it reaches a block owned by a loop/offloaded/kernel statement (any unconditional dominator), skipping if/while bodies. This is the correct approach.
Security Risks
None. This is a compiler IR transformation pass with no surface area for injection, auth bypass, or data exposure.
Level of Scrutiny
High. Autodiff correctness is notoriously subtle — a placement bug like this produces wrong numerical results rather than crashes, making it hard to detect in testing. The fix is small but the reasoning touches SSA dominance, which is easy to get subtly wrong. The test plan checkbox is unchecked, so it is unclear whether test_ad_if_nested was confirmed to pass.
Other Factors
The pre-existing null-pointer dereference flagged in visit(GlobalStoreStmt) is unrelated to this PR. The new fallback if (target == nullptr) { target = alloca_block; } is a safe catch-all. The omission of WhileStmt from the break condition is benign since WhileStmt is rejected earlier in the pipeline.
d8e05eb to
b9d107a
Compare
0f7ff3b to
56c457e
Compare
b9d107a to
0c487e2
Compare
56c457e to
8fd18b0
Compare
0c487e2 to
7f9c702
Compare
8fd18b0 to
d813066
Compare
7f9c702 to
a5f664f
Compare
|
Opus summary: SummaryFixes the placement of adjoint alloca statements for Changes
Strengths
Weaknesses / things to consider
|
|
Please address these opus points:
I'm fine with not handling mesh for. I wonder if we should remove mesh for? (I'm not susggesting removing mesh for in this pr, nor am I suggesting that you create a PR to remove mesh for; but I am pondering whether we should remove it) |
|
|
||
| @test_utils.test() | ||
| def test_ad_nested_if_inside_for_loop(): | ||
| # Regression test for the adjoint-alloca placement fix. The forward kernel reads `w[i]` |
There was a problem hiding this comment.
Lets have a more plain-english explanation too pelase, ie that is understandable using python concepts.
There was a problem hiding this comment.
Plain english version. Place before or after the detailed version please:
# Regression test for adjoint-alloca placement when a field read (`w[i]`) appears
# inside nested `if` blocks within a for-loop being differentiated. Before the fix,
# the gradient accumulator for `w[i]` was placed inside the forward `if` body, but
# the reverse pass generates its backward code in a separate, parallel `if` block
# that can't see variables defined in the forward one. The accumulator was silently
# eliminated as dead code, and `w.grad[i]` came out as zero instead of the correct
# `2 * w[i]`.
d813066 to
ce8e91b
Compare
a5f664f to
d9d2d09
Compare
|
@claude review |
ce8e91b to
7d5ce3c
Compare
d9d2d09 to
da7fb7a
Compare
There was a problem hiding this comment.
LGTM — both prior review comments addressed.
Extended reasoning...
Overview
The PR makes a targeted fix to MakeAdjoint::adjoint() in quadrants/transforms/auto_diff.cpp: the old single-level parent-is-RangeForStmt check is replaced by a walk-up loop that stops at any dominating block (RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt, or kernel root). A new regression test and an xfail while-loop variant are added to tests/python/test_ad_if.py.
Security risks
None. This is a compiler IR transformation with no user-facing data handling, auth, or I/O surface.
Level of scrutiny
The core logic change is small and well-contained. The walk-up terminates unconditionally (nullptr fallback), the stop-set covers all dominating statement kinds, and the new test exercises both the passing (0 < v < 10) and failing conditions with exact-comparison assertions. Previous review cycles caught and resolved a wrong test-name reference and a missing adstack guard; both are now fixed.
Other factors
No bugs found by the automated bug-hunting pass. All inline comments from prior review rounds are marked resolved. Human reviewer (hughperkins) requested improvements that have been incorporated.
7d5ce3c to
94e5650
Compare
da7fb7a to
1bb6c33
Compare
There was a problem hiding this comment.
LGTM — both previously raised issues are resolved, no new bugs found.
Extended reasoning...
Overview
The PR fixes a correctness bug in in : when a sits inside a nested (or ) body within the range-for being differentiated, the old single-level parent check left the adjoint alloca inside the forward if-body, which is SSA-invalid from the reverse sibling IfStmt and gets DCE'd to silently-zero gradients. The fix replaces the check with an upward walk that stops at a dominating loop/offload/kernel body. Two new tests are added to : a passing regression test for the range-for case and an xfail placeholder for the while-loop variant (reverse-mode AD does not yet implement ).
Security risks
None. This is a pure compiler-IR transformation pass with no I/O, authentication, or data-exposure surface.
Level of scrutiny
Moderate. The change touches a critical autodiff correctness path, but the logic is well-contained and the fix is a straightforward upward-walk replacement of a single-level parent check. The regression test directly reproduces the bug and verifies both the fixed case and the boundary (untaken-branch gradients are zero). The previously raised concerns — wrong test reference in the comment and missing guard — have both been addressed in the latest revision.
Other factors
No bugs were found by the bug hunting system in the current revision. All prior inline comments are resolved. The test uses exact comparisons (no floating-point tolerance needed for this integer-valued computation), includes boundary inputs that deliberately fail the conditions, and is guarded with matching every other reverse-mode AD test in the file.
94e5650 to
421d30e
Compare
1bb6c33 to
2bfdfc5
Compare
There was a problem hiding this comment.
LGTM — both prior review items addressed; the walk-up alloca placement fix is correct and well-tested.
Extended reasoning...
Overview
The PR modifies two files: the core auto_diff.cpp alloca-placement logic in MakeAdjoint::adjoint(), and tests/python/test_ad_if.py with two new tests. The change replaces a single-level parent check with a walk-up loop that stops at SSA-dominating scopes (loop/offload bodies), fixing silently-zero gradients when a GlobalLoadStmt lives inside nested if/while blocks within a differentiated for-loop.
Security risks
None. This is purely an internal IR transformation pass with no external-facing surface.
Level of scrutiny
Moderate. The change is surgical — only the alloca insertion target changes — but it touches a correctness-critical path in the reverse-mode autodiff pipeline. The logic is sound: walking up until reaching a node that unconditionally dominates both forward and reverse code is the correct fix for the SSA-dominance problem described. The stop-set (RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt, nullptr) covers all unconditionally-dominating loop/kernel constructs.
Other factors
My previous review flagged two issues: (1) wrong test reference in the comment (test_ad_if_nested → test_ad_nested_if_inside_for_loop), and (2) missing require=qd.extension.adstack on the new test. Both are fixed in the current revision. The test now covers boundary inputs that fail one or both conditions (expecting gradient 0), which validates no spurious accumulation from untaken branches. The xfail while-loop companion test is a nice forward-looking placeholder.
|
Opus descrition update: SummaryFixes a silent reverse-mode AD bug where the gradient accumulator alloca for a What's in the PRNet diff vs
|
|
Please address:
|
There was a problem hiding this comment.
LGTM — all prior feedback addressed, fix is correct and well-tested.
Extended reasoning...
Overview
The PR modifies two files: quadrants/transforms/auto_diff.cpp (the core alloca-placement fix in MakeAdjoint::adjoint()) and tests/python/test_ad_if.py (two new regression tests). The change replaces a single-level parent-check with an upward walk that stops at a block whose owning statement unconditionally dominates both forward and reverse code (RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt, or kernel root), skipping if/while bodies that do not dominate the reverse-pass sibling IfStmt emitted by visit(IfStmt).
Security risks
None. This is an autodiff IR transform correctness fix with no security-sensitive surface (no auth, crypto, permissions, or external data boundaries).
Level of scrutiny
Moderate. The placement logic touches the core reverse-mode AD pass, but the change is surgical and well-reasoned: the walk terminates because it either hits a dominating statement or reaches nullptr, and the fallback to alloca_block when the walk exhausts the tree is safe. The regression test directly exercises the previously-broken case and includes elements that must produce zero gradient (values -2.0 and 20.0 outside the condition range), ruling out spurious backward accumulation.
Other factors
All feedback from my prior review was addressed: the wrong test reference (test_ad_if_nested) was corrected to test_ad_nested_if_inside_for_loop, and the missing require=qd.extension.adstack guard was added to match every other reverse-mode AD test in the file. The reviewer's requests (while-loop xfail test, boundary inputs, exact equality checks, plain-English comment) were all satisfied. No new bugs were flagged by the bug-hunting system.
c27ceb3 to
121da95
Compare
1c90048 to
6e48d59
Compare
121da95 to
783f8a3
Compare
6e48d59 to
4991012
Compare
783f8a3 to
19251b8
Compare
4991012 to
0900c72
Compare
19251b8 to
6b46265
Compare
0900c72 to
82ad225
Compare
6b46265 to
0df2b1d
Compare
82ad225 to
77f3a07
Compare
0df2b1d to
ece6b34
Compare
77f3a07 to
46d1da3
Compare
There was a problem hiding this comment.
LGTM — all prior review comments addressed, walk-up fix and test coverage look correct.
Extended reasoning...
Overview
Two-file diff: a focused fix to MakeAdjoint::adjoint() in auto_diff.cpp and five new tests in tests/python/test_ad_if.py. The C++ change replaces a single-level parent->parent_stmt()->is<RangeForStmt>() check with a walk-up loop that stops at any unconditionally-dominating block owner (loop/offloaded/mesh-for/kernel root), ensuring the adjoint alloca is visible from both the forward and reverse SSA scopes regardless of nesting depth.
Security risks
None. This is a compiler IR transform for automatic differentiation with no security-sensitive surface area.
Level of scrutiny
Moderate — this touches the core reverse-mode AD transform which is non-trivial, but the change is surgical and well-bounded. The SSA-dominance invariant is clearly explained in the new comment, the stop-set is enumerated, and the QD_ASSERT(target \!= nullptr) guard surfaces malformed IR loudly rather than silently degrading. Four new tests cover f32/f64 variants, if/elif/else nesting, and nested-for shapes; an xfail test documents the known while-loop gap.
Other factors
All three rounds of review comments (comment reference fix, missing require=qd.extension.adstack, and the two April 21 inline comments about inaccurate IR descriptions in test_ad_nested_for_loops_global_load and test_ad_nested_if_elif_else_inside_for_loop) have been addressed in the current code. The bug hunting system found no bugs. The PR is part of a larger autodiff stack but the changes are self-contained.
46d1da3 to
ce478a5
Compare
There was a problem hiding this comment.
LGTM — all previous review points addressed, fix is correct.
Extended reasoning...
Overview
Two-file change: (alloca placement logic in ) and (four new regression tests plus one xfail). Fixes silently-zero gradients for field reads nested inside blocks within a differentiated for-loop.
Security risks
None. This is a compiler IR transform pass; no auth, crypto, or data exposure surface.
Level of scrutiny
Medium. The change is a focused bug fix in an autodiff pass — not trivial (IR manipulation, SSA-dominance reasoning), but the logic is well-explained and the stop-set is correct. The four new tests pin all the shapes the fix covers, and the replaces a silent-degradation fallback with a loud failure for malformed IR. All three previous review rounds had their feedback fully addressed.
Other factors
All inline comments from my two prior review runs are marked resolved and confirmed fixed in the diff: correct test reference, guard, corrected IR-structure description in the elif test comment, and accurate documentation that exercises Case 2 rather than the walk-up. No outstanding reviewer comments remain.
Fix adjoint-alloca placement for
GlobalLoads outside the current range-forTL;DR
Before:
The check fired only when the
GlobalLoad's immediate parent was aRangeForStmt. When the load lived insideif { if { w[i] } }inside a for-loop, the check failed and the alloca got placed inside the forwardif-body's block.MakeAdjoint::visit(IfStmt)emits the reverse code into a brand-new siblingIfStmt(not back into the forward body), so the alloca was SSA-invalid from the reverse branch's point of view and got DCE'd. Result:w.grad[i]came out silently zero instead of the correct2 * w[i].After:
Stop set:
RangeForStmt/StructForStmt/OffloadedStmt/MeshForStmt, or the kernel root (parent_stmt == nullptr).IfStmt/WhileStmtare explicitly not in the stop set — those emit sibling reverse blocks, which is the whole point of walking past them.Why the walk-up is the right shape
MakeAdjoint::visit(IfStmt)builds a freshIfStmtin the reverse flow and visits the forward body's statements withcurrent_blockrebound to the new reverse branch. Statements in the forward branch are not live in the reverse branch's SSA scope — only values that dominate both branches (i.e. defined before theIfStmt) are. For the adjoint alloca to be readable from both the forward pass (which writesadj(w[i]) += ...) and the reverse pass (which accumulates into it), it must live in a block that both can see. That's exactly a block whose owner unconditionally dominates the enclosed code.Loops (
RangeForStmt/StructForStmt/OffloadedStmt/MeshForStmt) qualify because their reverse emission wraps the reversed body in a sibling loop that keeps the body's control flow intact — the accumulator needs to live just outside that wrapper, not inside it, which is what the walk-up produces.The
QD_ASSERT(target != nullptr)is a defensive guard: reachingnullptrmeans the primal's enclosing-block chain is broken (an unparented block), which should never happen on well-formed IR. The previous fallback of "tryalloca_block" silently restored the pre-fix buggy placement whenever the chain was malformed; asserting surfaces it loudly.Changes
quadrants/transforms/auto_diff.cppMakeAdjoint::visit(GlobalLoadStmt)(around the adjoint alloca placement). The condition for "is this load outside the currently-processed range-for?" moves fromparent->parent_stmt()->is<RangeForStmt>()toforward_backup->locate(stmt) == -1, and the placement logic walks up as described above. Comment spells out the SSA-dominance invariant, enumerates the dominating stop set, and explicitly excludesIfStmtwith a one-line reason.tests/python/test_ad_if.pyFour new tests, all gated on
qd.extension.adstack:test_ad_nested_if_inside_for_loop(f32) — the regression pin. Nestedif / ifinside afor i in w:with inputs that deliberately straddle both conditions: some elements fail the outer guard, some fail the inner guard, some pass both.w.grad[i]must be2*w[i]only for the passes-both elements and exactly0otherwise (catches spurious accumulation from untaken branches too).test_ad_nested_if_inside_for_loop_f64— same shape with f64 field anddefault_fp=qd.f64.test_ad_nested_if_elif_else_inside_for_loop—if / elif / elif / elseinside afor i in w:, four branches with different weights. Pins the walk-up for IR shapes where the nested construct is an if-chain rather than two stackedifs.test_ad_nested_for_loops_global_load— thefor i in x: a = x[i]; for _ in range(n_inner): y += ashape. TheGlobalLoadStmtlives in the outerfor's body, and the alloca must be hoisted above the innerfor(not placed inside it). Without the walk-up the inner reversed body runs the accumulatorn_innertimes per outer iteration, producinggrad = n_inner²instead of the correctn_inner.A fifth test (
test_ad_nested_if_inside_while_loop) is included asxfail(strict=True, raises=RuntimeError)— reverse-mode AD does not yet have avisit(WhileStmt)implementation, but the IR shape (while wrapping nested ifs wrapping a field read) is the one the walk-up fix needs to hold on whenvisit(WhileStmt)ever lands.Side-effect audit
if-in-for gradientsQD_ASSERT(target != nullptr)surfaces it loudly instead of silently degrading.QD_NOT_IMPLEMENTEDinvisit(WhileStmt)(covered by the xfail test).Stack
Autodiff 5 of 13. Based on #503 (rsqrt). Followed by #491 (regression tests).