[AutoDiff] Autodiff 5: Fix adjoint-alloca placement for GlobalLoads outside the current range-for by duburcqa · Pull Request #496 · Genesis-Embodied-AI/quadrants

duburcqa · 2026-04-17T11:37:47Z

Fix adjoint-alloca placement for `GlobalLoad`s outside the current range-for

Fixes silently-zero gradients in any reverse-mode kernel that reads a field (w[i]) inside a nested control-flow construct (another if, another for, …) inside a for-loop. The fix walks up the enclosing-block chain until it reaches a block whose owning statement unconditionally dominates both forward and reverse code.

TL;DR

Before:

if (stmt->is<GlobalLoadStmt>() && stmt->parent->parent_stmt() != nullptr &&
    stmt->parent->parent_stmt()->is<RangeForStmt>()) {
  // ...single-level check: only covered a load whose IMMEDIATE parent block
  // was the range-for's body.
}

The check fired only when the GlobalLoad's immediate parent was a RangeForStmt. When the load lived inside if { if { w[i] } } inside a for-loop, the check failed and the alloca got placed inside the forward if-body's block. MakeAdjoint::visit(IfStmt) emits the reverse code into a brand-new sibling IfStmt (not back into the forward body), so the alloca was SSA-invalid from the reverse branch's point of view and got DCE'd. Result: w.grad[i] came out silently zero instead of the correct 2 * w[i].

After:

if (stmt->is<GlobalLoadStmt>() && forward_backup->locate(stmt->as<GlobalLoadStmt>()) == -1) {
  // Walk up the enclosing-block chain until we find a block whose owning statement
  // unconditionally dominates both forward and reverse code.
  Block *target = stmt->as<GlobalLoadStmt>()->parent;
  while (target != nullptr) {
    Stmt *parent_stmt = target->parent_stmt();
    if (parent_stmt == nullptr || parent_stmt->is<RangeForStmt>() ||
        parent_stmt->is<StructForStmt>() || parent_stmt->is<OffloadedStmt>() ||
        parent_stmt->is<MeshForStmt>()) {
      break;
    }
    target = parent_stmt->parent;
  }
  QD_ASSERT(target != nullptr);
  target->insert(std::move(alloca), 0);
}

Stop set: RangeForStmt / StructForStmt / OffloadedStmt / MeshForStmt, or the kernel root (parent_stmt == nullptr). IfStmt / WhileStmt are explicitly not in the stop set — those emit sibling reverse blocks, which is the whole point of walking past them.
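
To make the walk-up easier to follow outside the C++ pass, here is a minimal Python sketch of the same loop on a toy IR. The `Block` and `Stmt` classes, the `STOP_KINDS` set, and `find_alloca_block` are illustrative stand-ins invented for this example, not the real quadrants types; only the stop set and the loop shape are taken from the snippet above.

```python
# Toy model of the walk-up; class and function names are illustrative only.
STOP_KINDS = {"RangeForStmt", "StructForStmt", "OffloadedStmt", "MeshForStmt"}


class Block:
    def __init__(self, owner=None):
        self.owner = owner  # statement that owns this block; None at the kernel root
        self.stmts = []


class Stmt:
    def __init__(self, kind, parent):
        self.kind = kind      # statement type name, e.g. "IfStmt"
        self.parent = parent  # block that contains this statement
        parent.stmts.append(self)


def find_alloca_block(load):
    """Walk up from the load's block until the owner is in the stop set or absent."""
    target = load.parent
    while target is not None:
        owner = target.owner
        if owner is None or owner.kind in STOP_KINDS:
            break
        target = owner.parent  # step past an IfStmt (or other non-stop) body
    assert target is not None, "broken enclosing-block chain"
    return target


# Shape of the failing kernel: for i in ...: if ...: if ...: load w[i]
root = Block()
loop_body = Block(Stmt("RangeForStmt", root))
outer_if_body = Block(Stmt("IfStmt", loop_body))
inner_if_body = Block(Stmt("IfStmt", outer_if_body))
load = Stmt("GlobalLoadStmt", inner_if_body)

print(find_alloca_block(load) is loop_body)  # True
```

In this model the load sits two `IfStmt` bodies deep, and the walk returns the range-for body, which is where the alloca is inserted at position 0 in the real pass.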

Why the walk-up is the right shape

MakeAdjoint::visit(IfStmt) builds a fresh IfStmt in the reverse flow and visits the forward body's statements with current_block rebound to the new reverse branch. Statements in the forward branch are not live in the reverse branch's SSA scope — only values that dominate both branches (i.e. defined before the IfStmt) are. For the adjoint alloca to be readable from both the forward pass (which writes adj(w[i]) += ...) and the reverse pass (which accumulates into it), it must live in a block that both can see. That's exactly a block whose owner unconditionally dominates the enclosed code.

Loops (RangeForStmt / StructForStmt / OffloadedStmt / MeshForStmt) qualify because their reverse emission wraps the reversed body in a sibling loop that keeps the body's control flow intact — the accumulator needs to live just outside that wrapper, not inside it, which is what the walk-up produces.
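
The visibility argument can be checked on a toy scope tree. The sketch below is a simplification under stated assumptions: it treats a value as visible from a block only if its defining block encloses the using block, ignores statement order, and places the reverse `if` blocks as siblings of the forward ones under the same loop body. `Block` and `encloses` are hypothetical names for this illustration.

```python
class Block:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # enclosing block; None at the root


def encloses(def_block, use_block):
    """True if def_block is use_block itself or one of its enclosing blocks."""
    block = use_block
    while block is not None:
        if block is def_block:
            return True
        block = block.parent
    return False


loop_body = Block("for-body")
forward_outer = Block("forward outer if-body", loop_body)
forward_inner = Block("forward inner if-body", forward_outer)
reverse_outer = Block("reverse outer if-body", loop_body)   # sibling of forward_outer
reverse_inner = Block("reverse inner if-body", reverse_outer)

# Pre-fix placement: alloca in the forward inner if-body.
old_visible_in_reverse = encloses(forward_inner, reverse_inner)

# Post-fix placement: alloca in the loop body.
new_visible_in_forward = encloses(loop_body, forward_inner)
new_visible_in_reverse = encloses(loop_body, reverse_inner)

print(old_visible_in_reverse, new_visible_in_forward, new_visible_in_reverse)
# False True True
```

The first result is the bug in miniature: the reverse branch cannot reach a definition that lives in the forward branch, so the only placements that work are blocks enclosing both.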

The QD_ASSERT(target != nullptr) is a defensive guard: reaching nullptr means the primal's enclosing-block chain is broken (an unparented block), which should never happen on well-formed IR. The previous fallback of "try alloca_block" silently restored the pre-fix buggy placement whenever the chain was malformed; asserting surfaces it loudly.
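
The difference between the two failure policies can be sketched as follows. This is a toy Python model, not the C++ code: `walk_up`, `place_with_fallback`, and `place_strict` are hypothetical names, and an unparented statement is modelled as a `Stmt` whose `parent` is `None`.

```python
STOP_KINDS = {"RangeForStmt", "StructForStmt", "OffloadedStmt", "MeshForStmt"}


class Block:
    def __init__(self, owner=None):
        self.owner = owner  # owning statement; None at the kernel root


class Stmt:
    def __init__(self, kind, parent=None):
        self.kind = kind
        self.parent = parent  # None models an unparented statement (malformed IR)


def walk_up(block):
    """Return the dominating block, or None if the chain is broken."""
    target = block
    while target is not None:
        owner = target.owner
        if owner is None or owner.kind in STOP_KINDS:
            break
        target = owner.parent
    return target


def place_with_fallback(block, alloca_block):
    # Previous policy: quietly fall back to alloca_block on a broken chain.
    target = walk_up(block)
    return alloca_block if target is None else target


def place_strict(block):
    # Current policy: fail loudly, in the spirit of QD_ASSERT(target != nullptr).
    target = walk_up(block)
    if target is None:
        raise AssertionError("enclosing-block chain is broken")
    return target


orphan_if = Stmt("IfStmt", parent=None)  # IfStmt that was never attached to a block
if_body = Block(orphan_if)
alloca_block = Block()                   # stands in for the pre-fix placement

print(place_with_fallback(if_body, alloca_block) is alloca_block)  # True
```

With the fallback, malformed IR silently gets the old placement; `place_strict(if_body)` raises instead, which is the behaviour the assert gives.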

Changes

`quadrants/transforms/auto_diff.cpp`

MakeAdjoint::visit(GlobalLoadStmt) (around the adjoint alloca placement). The condition for "is this load outside the currently-processed range-for?" moves from parent->parent_stmt()->is<RangeForStmt>() to forward_backup->locate(stmt) == -1, and the placement logic walks up as described above. Comment spells out the SSA-dominance invariant, enumerates the dominating stop set, and explicitly excludes IfStmt with a one-line reason.

`tests/python/test_ad_if.py`

Four new tests, all gated on qd.extension.adstack:

test_ad_nested_if_inside_for_loop (f32) — the regression pin. Nested if / if inside a for i in w: with inputs that deliberately straddle both conditions: some elements fail the outer guard, some fail the inner guard, some pass both. w.grad[i] must be 2*w[i] only for the passes-both elements and exactly 0 otherwise (catches spurious accumulation from untaken branches too).
test_ad_nested_if_inside_for_loop_f64 — same shape with f64 field and default_fp=qd.f64.
test_ad_nested_if_elif_else_inside_for_loop — if / elif / elif / else inside a for i in w:, four branches with different weights. Pins the walk-up for IR shapes where the nested construct is an if-chain rather than two stacked ifs.
test_ad_nested_for_loops_global_load — the for i in x: a = x[i]; for _ in range(n_inner): y += a shape. The GlobalLoadStmt lives in the outer for's body, and the alloca must be hoisted above the inner for (not placed inside it). Without the walk-up the inner reversed body runs the accumulator n_inner times per outer iteration, producing grad = n_inner² instead of the correct n_inner.
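
For reference, the expected gradients in the nested-if and nested-for tests can be reproduced in plain Python without any AD machinery. This is a sketch under assumptions: the input values below are made up for illustration (the real tests use their own inputs), and the helper names are hypothetical. It checks the analytic gradients against a central finite difference.

```python
def nested_if_loss(w):
    # Same shape as the regression kernel: for i: if w[i] > 0: if w[i] < 10: ...
    loss = 0.0
    for wi in w:
        if wi > 0:
            if wi < 10:
                loss += wi * wi
    return loss


def nested_if_grad(w):
    # 2 * w[i] where both conditions hold, exactly 0 elsewhere.
    return [2.0 * wi if 0 < wi < 10 else 0.0 for wi in w]


def nested_for_loss(x, n_inner):
    # Shape of the nested-for test: the load happens in the outer loop body.
    y = 0.0
    for xi in x:
        a = xi
        for _ in range(n_inner):
            y += a
    return y


def central_difference(f, values, i, eps=1e-3):
    up = list(values)
    down = list(values)
    up[i] += eps
    down[i] -= eps
    return (f(up) - f(down)) / (2 * eps)


w = [-2.0, 1.0, 2.0, 3.0, 20.0]  # illustrative inputs straddling both conditions
analytic = nested_if_grad(w)
numeric = [central_difference(nested_if_loss, w, i) for i in range(len(w))]

n_inner = 3
x = [0.5, 1.5]
numeric_for = [
    central_difference(lambda v: nested_for_loss(v, n_inner), x, i)
    for i in range(len(x))
]

print(analytic)  # [0.0, 2.0, 4.0, 6.0, 0.0]
```

The finite-difference values agree with `analytic` to within rounding, and each entry of `numeric_for` is close to `n_inner`, matching the "correct n_inner" gradient quoted above.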

A fifth test (test_ad_nested_if_inside_while_loop) is included as xfail(strict=True, raises=RuntimeError). Reverse-mode AD does not yet have a visit(WhileStmt) implementation, but the IR shape (a while wrapping nested ifs wrapping a field read) is the one the walk-up fix must keep handling once visit(WhileStmt) lands.

Side-effect audit

| Concern | Verdict |
| --- | --- |
| Existing single-level `if`-in-for gradients | Unchanged; the old path remains correct as a special case of the new walk-up. |
| Struct-for / mesh-for / offloaded | Covered by the stop set. |
| Malformed IR | `QD_ASSERT(target != nullptr)` surfaces it loudly instead of silently degrading. |
| While loops | Hard-errors via existing `QD_NOT_IMPLEMENTED` in `visit(WhileStmt)` (covered by the xfail test). |

Stack

Autodiff 5 of 13. Based on #503 (rsqrt). Followed by #491 (regression tests).

claude

The fix is logical and well-explained, but this touches correctness-critical autodiff alloca placement logic with subtle SSA dominator reasoning — worth a human look before landing.

Extended reasoning...

Overview

The PR modifies a single function (MakeAdjoint::adjoint) in auto_diff.cpp, changing how adjoint AllocaStmt nodes are placed when a GlobalLoadStmt lives outside the currently-processed range-for iteration body.

The Change

The old code checked whether the GlobalLoadStmt's direct parent block was owned by a RangeForStmt before deciding placement. This missed the case where the GlobalLoad was inside an if-body nested within the for-loop — in that case the old code fell through and placed the alloca inside the if-branch, which visit(IfStmt) cannot see from the sibling reverse IfStmt it emits, causing DCE to strip it and producing silently-zero gradients.

The new code walks up the block hierarchy from the GlobalLoad's parent until it reaches a block owned by a loop/offloaded/kernel statement (any unconditional dominator), skipping if/while bodies. This is the correct approach.

Security Risks

None. This is a compiler IR transformation pass with no surface area for injection, auth bypass, or data exposure.

Level of Scrutiny

High. Autodiff correctness is notoriously subtle — a placement bug like this produces wrong numerical results rather than crashes, making it hard to detect in testing. The fix is small but the reasoning touches SSA dominance, which is easy to get subtly wrong. The test plan checkbox is unchecked, so it is unclear whether test_ad_if_nested was confirmed to pass.

Other Factors

The pre-existing null-pointer dereference flagged in visit(GlobalStoreStmt) is unrelated to this PR. The new fallback if (target == nullptr) { target = alloca_block; } is a safe catch-all. The omission of WhileStmt from the break condition is benign since WhileStmt is rejected earlier in the pipeline.

hughperkins · 2026-04-17T13:17:41Z

Opus summary:

Summary

Fixes the placement of adjoint alloca statements for GlobalLoadStmts that live inside
if (or while) bodies within a differentiated for-loop. Previously, the alloca was
placed in the forward if-body's block, but the reverse pass emits backward code into a
brand-new sibling IfStmt — so the alloca was not visible from the reverse branch's
SSA scope, got DCE'd, and the gradient silently came out as zero.
The fix replaces the old single-level parent check with a walk up the block tree,
stopping at a block whose owning statement unconditionally dominates both forward and
reverse code (a RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt, or
the kernel root). This ensures the alloca is visible to both the forward and reverse
branches regardless of nesting depth.

Changes

MakeAdjoint alloca placement: Replaces the previous check
(parent->parent_stmt()->is<RangeForStmt>()) with an upward walk that skips if/while
blocks and stops at a dominating loop or kernel body.
Comment update: Clarifies "not inside a range-for" → "not inside the
currently-processed range-for" and adds a detailed explanation of the SSA dominance
issue.
New test (test_ad_nested_if_inside_for_loop): A kernel that reads w[i] inside
nested if bodies within a for-loop and checks that w.grad[i] is correctly 2 * w[i]
rather than silently zero.

Strengths

Fixes a silent correctness bug. Gradients came out as zero with no error or
warning. This is the worst class of bug — the test that catches it is valuable.
Generalises the fix. The upward walk handles arbitrary nesting of if/while
inside loops, not just the single-level case. This is more robust than patching only
the immediate failure pattern.
Good dominator set. Stopping at RangeForStmt, StructForStmt, OffloadedStmt,
and MeshForStmt covers all the loop/offload constructs that unconditionally dominate
their body. The nullptr parent fallback ensures the walk always terminates.
Clear, targeted test. The nested-if kernel directly exercises the bug and the
expected gradient (2 * w[i]) is trivially verifiable by hand.
Well-commented. The new inline comment explains the SSA dominance issue and
references the test by name.

Weaknesses / things to consider

No while test. The comment mentions that visit(IfStmt) emits into a sibling,
and the walk skips non-dominating statements generally, but the test only covers
nested if. A while body inside a for-loop would exercise the same placement logic
on a different control-flow construct.
Only tests the w[i] > 0 / w[i] < 10 path. All 8 elements satisfy both
conditions (values 1–8), so the else branch is never taken. A test where some elements
fail the condition would verify that gradients are correctly zero for those elements
(no spurious accumulation from the untaken branch).
MeshForStmt in the stop set is untested. If mesh-for constructs have different
reverse-pass emission semantics, stopping there could be wrong. Presumably it's
analogous to the other for-loops, but there's no test coverage for it.
Tolerance is rel=1e-4 for an exact integer computation (2 * (i+1) from
w[i] * w[i] with small integer inputs). Could be an exact == comparison or at
least rel=1e-6 to be tighter.
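
The tolerance point can be illustrated without the framework. The sketch below emulates f32 rounding with the standard `struct` module and shows that `2 * w` is exactly representable for the values 1 to 8, so an exact comparison is safe, while a relative tolerance of 1e-4 would accept a slightly wrong gradient. `to_f32` is a helper written for this example, and `math.isclose` is used as a stand-in for the test's tolerance check, whose exact semantics may differ.

```python
import math
import struct


def to_f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


values = [float(i + 1) for i in range(8)]          # 1.0 .. 8.0
grads = [to_f32(2.0 * to_f32(v)) for v in values]  # gradient of w*w, rounded to f32

exact_ok = all(g == 2.0 * v for g, v in zip(grads, values))

# A gradient that is off by 0.005% still passes a 1e-4 relative tolerance.
wrong = 6.0 * (1 + 5e-5)
loose_accepts_wrong = math.isclose(wrong, 6.0, rel_tol=1e-4)
exact_accepts_wrong = wrong == 6.0

print(exact_ok, loose_accepts_wrong, exact_accepts_wrong)  # True True False
```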

hughperkins · 2026-04-17T13:21:38Z

Please address these opus points:

No while test. The comment mentions that visit(IfStmt) emits into a sibling,
and the walk skips non-dominating statements generally, but the test only covers
nested if. A while body inside a for-loop would exercise the same placement logic
on a different control-flow construct.
Only tests the w[i] > 0 / w[i] < 10 path. All 8 elements satisfy both
conditions (values 1–8), so the else branch is never taken. A test where some elements
fail the condition would verify that gradients are correctly zero for those elements
(no spurious accumulation from the untaken branch).
Tolerance is rel=1e-4 for an exact integer computation (2 * (i+1) from
w[i] * w[i] with small integer inputs). Could be an exact == comparison or at
least rel=1e-6 to be tighter.

I'm fine with not handling mesh for. I wonder if we should remove mesh for? (I'm not suggesting removing mesh for in this PR, nor am I suggesting that you create a PR to remove mesh for; but I am pondering whether we should remove it.)

hughperkins · 2026-04-17T13:22:12Z

+
+@test_utils.test()
+def test_ad_nested_if_inside_for_loop():
+    # Regression test for the adjoint-alloca placement fix. The forward kernel reads `w[i]`


Let's have a more plain-English explanation too, please, i.e. one that is understandable using Python concepts.

Plain-English version. Place before or after the detailed version, please:

# Regression test for adjoint-alloca placement when a field read (`w[i]`) appears
# inside nested `if` blocks within a for-loop being differentiated. Before the fix,
# the gradient accumulator for `w[i]` was placed inside the forward `if` body, but
# the reverse pass generates its backward code in a separate, parallel `if` block
# that can't see variables defined in the forward one. The accumulator was silently
# eliminated as dead code, and `w.grad[i]` came out as zero instead of the correct
# `2 * w[i]`.

duburcqa · 2026-04-17T16:52:23Z

@claude review

claude

LGTM — both prior review comments addressed.

Extended reasoning...

Overview

The PR makes a targeted fix to MakeAdjoint::adjoint() in quadrants/transforms/auto_diff.cpp: the old single-level parent-is-RangeForStmt check is replaced by a walk-up loop that stops at any dominating block (RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt, or kernel root). A new regression test and an xfail while-loop variant are added to tests/python/test_ad_if.py.

Security risks

None. This is a compiler IR transformation with no user-facing data handling, auth, or I/O surface.

Level of scrutiny

The core logic change is small and well-contained. The walk-up terminates unconditionally (nullptr fallback), the stop-set covers all dominating statement kinds, and the new test exercises both the passing (0 < v < 10) and failing conditions with exact-comparison assertions. Previous review cycles caught and resolved a wrong test-name reference and a missing adstack guard; both are now fixed.

Other factors

No bugs found by the automated bug-hunting pass. All inline comments from prior review rounds are marked resolved. Human reviewer (hughperkins) requested improvements that have been incorporated.

claude

LGTM — both previously raised issues are resolved, no new bugs found.

Extended reasoning...

Overview

The PR fixes a correctness bug in MakeAdjoint::adjoint() in quadrants/transforms/auto_diff.cpp: when a GlobalLoadStmt sits inside a nested if (or while) body within the range-for being differentiated, the old single-level parent check left the adjoint alloca inside the forward if-body, which is SSA-invalid from the reverse sibling IfStmt and gets DCE'd, yielding silently-zero gradients. The fix replaces the check with an upward walk that stops at a dominating loop/offload/kernel body. Two new tests are added to tests/python/test_ad_if.py: a passing regression test for the range-for case and an xfail placeholder for the while-loop variant (reverse-mode AD does not yet implement visit(WhileStmt)).

Security risks

None. This is a pure compiler-IR transformation pass with no I/O, authentication, or data-exposure surface.

Level of scrutiny

Moderate. The change touches a critical autodiff correctness path, but the logic is well-contained and the fix is a straightforward upward-walk replacement of a single-level parent check. The regression test directly reproduces the bug and verifies both the fixed case and the boundary (untaken-branch gradients are zero). The previously raised concerns (wrong test reference in the comment and missing require=qd.extension.adstack guard) have both been addressed in the latest revision.

Other factors

No bugs were found by the bug hunting system in the current revision. All prior inline comments are resolved. The test uses exact comparisons (no floating-point tolerance needed for this integer-valued computation), includes boundary inputs that deliberately fail the conditions, and is guarded with require=qd.extension.adstack, matching every other reverse-mode AD test in the file.

claude

LGTM — both prior review items addressed; the walk-up alloca placement fix is correct and well-tested.

Extended reasoning...

Overview

The PR modifies two files: the core auto_diff.cpp alloca-placement logic in MakeAdjoint::adjoint(), and tests/python/test_ad_if.py with two new tests. The change replaces a single-level parent check with a walk-up loop that stops at SSA-dominating scopes (loop/offload bodies), fixing silently-zero gradients when a GlobalLoadStmt lives inside nested if/while blocks within a differentiated for-loop.

Security risks

None. This is purely an internal IR transformation pass with no external-facing surface.

Level of scrutiny

Moderate. The change is surgical — only the alloca insertion target changes — but it touches a correctness-critical path in the reverse-mode autodiff pipeline. The logic is sound: walking up until reaching a node that unconditionally dominates both forward and reverse code is the correct fix for the SSA-dominance problem described. The stop-set (RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt, nullptr) covers all unconditionally-dominating loop/kernel constructs.

Other factors

My previous review flagged two issues: (1) wrong test reference in the comment (test_ad_if_nested → test_ad_nested_if_inside_for_loop), and (2) missing require=qd.extension.adstack on the new test. Both are fixed in the current revision. The test now covers boundary inputs that fail one or both conditions (expecting gradient 0), which validates no spurious accumulation from untaken branches. The xfail while-loop companion test is a nice forward-looking placeholder.

hughperkins · 2026-04-20T12:54:33Z

Opus description update:

Summary

Fixes a silent reverse-mode AD bug where the gradient accumulator alloca for a GlobalLoadStmt was placed inside a forward if-body block whose SSA does not dominate the
reverse pass's use site. DCE then stripped the alloca and grad came out as zero for any field read nested inside ifs inside a differentiated for-loop.
The fix walks up from the load's enclosing block until it finds an unconditionally-dominating block (loop / offloaded / mesh-for / kernel root), and places the alloca there.
This is a different bug class from the rest of the chain — not a unary_collections membership issue, but an SSA-dominance issue in MakeAdjoint::AdjointPlacement.

What's in the PR

Net diff vs origin/duburcqa/split_autodiff_mark_rsqrt_nonlinear: 2 files, +106 / -12.

`quadrants/transforms/auto_diff.cpp`

MakeAdjoint's alloca-placement logic for GlobalLoadStmt:

Old: only handled the case where the load's immediate parent was a RangeForStmt. If the load lived inside if { if { w[i] } } inside a for-loop, it fell through to
alloca_block (the inner reversed loop body).
New: when the GlobalLoad lives outside the currently-processed range-for (forward_backup->locate(...) == -1), walk up the parent-block chain from the load's
enclosing block until hitting a block whose owning statement is one of RangeForStmt / StructForStmt / OffloadedStmt / MeshForStmt, or the kernel root. Insert the
alloca there.
Comment explains the invariant: IfStmt is not a valid stopping point because visit(IfStmt) emits the reverse code into a brand-new sibling IfStmt, so an alloca
placed in the forward branch is SSA-invalid from the reverse branch's point of view and gets DCE'd → silently-zero gradients.

`tests/python/test_ad_if.py`

Two new tests, gated on qd.extension.adstack:

test_ad_nested_if_inside_for_loop — regression pin. Builds the failing IR shape: for i: if w[i] > 0: if w[i] < 10: loss += w[i]*w[i]. Inputs deliberately straddle
both conditions (negative, in-range, and out-of-range positive) so the test asserts w.grad[i] == 2*w[i] where both conditions hold and == 0 elsewhere — catching both the
original silent-zero failure and any spurious accumulation from untaken branches.
test_ad_nested_if_inside_while_loop — same IR-shape regression but with a while wrapping the nested ifs. Marked xfail(strict=True, raises=RuntimeError) because
reverse-mode AD has no visit(WhileStmt) implementation today. Documents the gap and pins it as a known follow-up; if visit(WhileStmt) ever lands, the strict xfail flips
to XPASS and forces a reviewer decision.

Good points

Fixes a silent-correctness bug. Prior to this PR, gradients for field reads under nested ifs in a differentiated for-loop came back as zero with no error or warning.
Silent-wrong-gradient bugs are the worst kind — the test pins it concretely.
Identifies the correct invariant. The comment names the actual mechanism: "alloca must live in a block whose owning statement unconditionally dominates both the forward
and the reverse code." Lists the four block-owner types that satisfy it, and explicitly excludes IfStmt with a one-line explanation of why (visit(IfStmt) builds a sibling,
not a re-entry into the forward block). A future reader can extend the list confidently.
Walk-up-to-dominating-block is a proper structural fix, not a special-case for one nesting depth. Works for arbitrary if/if/.../for nesting.
Test asserts both directions of correctness. Not just "the right elements get the right gradient" but also "the wrong elements get exactly zero." This catches a
different class of regression — false accumulation from untaken branches — that would have been invisible to a pin that only checked the in-range elements.
Mixed-input parametrization (-2.0, 1.0, 2.0, …, 20.0) exercises the outer condition > 0, the inner condition < 10, and the both-true case in a single kernel run.
while-loop xfail is the right shape. strict=True, raises=RuntimeError documents both the gap and the current failure mode. If reverse-mode while support lands,
the test flips loudly; if the failure mode changes (e.g. compile error → silently wrong gradient), the test also fails, surfacing the regression.
Self-contained. Two-file diff, focused on a single bug class, with a regression test that didn't exist before. Easy to review.

Bad points / concerns

Walk-up loop has no termination guard for malformed IR. If the parent chain is broken (e.g. an unparented block) the loop falls out via target == nullptr and falls
back to alloca_block — which is the original buggy placement for this case. A defensive QD_ASSERT(target != nullptr) (or a clear log) might be safer than silently
degrading to the prior behavior.
Stopping-block whitelist is hand-curated. The set {RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt} could drift if a new control-flow stmt is added. Worth a
comment in the relevant statement classes pointing back here, or — better — a virtual is_unconditional_block_owner() on Stmt that the walk-up consults.
OffloadedStmt always terminates the walk. That's correct for the cases the test exercises (kernel-root placement), but worth confirming there isn't a configuration
where an alloca placed at the offloaded boundary is too coarse (e.g. shared between threads in a way that breaks parallel reductions). Not something the existing tests would
catch.
Test only covers if-inside-for nesting, not if-inside-if-inside-for with mixed branch shapes (else arms, if/elif/else). The fix is structural and should
handle them, but the regression pin doesn't prove it.
while xfail is also if-inside-while with a single-iteration while. The IR shape is the point, but a reader might miss that the trip count is irrelevant. The
docstring does say so explicitly, which mitigates this.
No coverage of nested-for (rather than nested-if). The original bug description mentions "GlobalLoadStmt outside the currently-processed range-for"; the fix's
walk-up logic should also handle a GlobalLoad declared in an outer for whose alloca needs to live above an inner for. Worth a third test case to pin that path
explicitly.
Single dtype (f32). Inherits the same f32-only limitation as the rest of the chain.
Merge commit, not rebase. Branch was brought up to date via git merge rather than rebased onto split_autodiff_mark_rsqrt_nonlinear. History contains a merge commit;
depending on repo conventions a clean rebase + force-push may be preferred.
Fifth PR in a stack on the same surface area. Reviewers need to merge the chain in order; any force-push to an upstream PR will require re-syncing this one. Not a code
issue, just a process note.
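
The "hand-curated whitelist" concern above suggests a virtual is_unconditional_block_owner() on Stmt. A minimal Python sketch of that idea follows, on a toy class hierarchy rather than the real quadrants statements; every class here is a stand-in written for illustration, and `NewLoopStmt` is a purely hypothetical future loop kind.

```python
class Block:
    def __init__(self, owner=None):
        self.owner = owner  # owning statement; None at the kernel root


class Stmt:
    def __init__(self, parent):
        self.parent = parent  # block that contains this statement

    def is_unconditional_block_owner(self):
        # Default: bodies of this statement do not dominate the reverse code.
        return False


class IfStmt(Stmt):
    """Inherits the default: the walk-up must step past if-bodies."""


class RangeForStmt(Stmt):
    def is_unconditional_block_owner(self):
        return True


class NewLoopStmt(Stmt):
    """Hypothetical new loop kind: it opts in here, not in the walk-up."""

    def is_unconditional_block_owner(self):
        return True


def find_alloca_block(block):
    target = block
    while target is not None:
        owner = target.owner
        if owner is None or owner.is_unconditional_block_owner():
            break
        target = owner.parent
    return target


root = Block()
loop_body = Block(RangeForStmt(root))
if_body = Block(IfStmt(loop_body))
new_loop_body = Block(NewLoopStmt(root))
if_in_new_loop = Block(IfStmt(new_loop_body))

print(find_alloca_block(if_body) is loop_body)  # True
```

The trade-off is that the stop condition lives next to each statement's definition, so a new statement type cannot be added without someone deciding which answer it gives.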

hughperkins · 2026-04-20T12:57:11Z

Please address:

Walk-up loop has no termination guard for malformed IR. If the parent chain is broken (e.g. an unparented block) the loop falls out via target == nullptr and falls
back to alloca_block — which is the original buggy placement for this case. A defensive QD_ASSERT(target != nullptr) (or a clear log) might be safer than silently
degrading to the prior behavior.
Test only covers if-inside-for nesting, not if-inside-if-inside-for with mixed branch shapes (else arms, if/elif/else). The fix is structural and should
handle them, but the regression pin doesn't prove it.
No coverage of nested-for (rather than nested-if). The original bug description mentions "GlobalLoadStmt outside the currently-processed range-for"; the fix's
walk-up logic should also handle a GlobalLoad declared in an outer for whose alloca needs to live above an inner for. Worth a third test case to pin that path
explicitly.
Single dtype (f32). Inherits the same f32-only limitation as the rest of the chain.

claude

LGTM — all prior feedback addressed, fix is correct and well-tested.

Extended reasoning...

Overview

The PR modifies two files: quadrants/transforms/auto_diff.cpp (the core alloca-placement fix in MakeAdjoint::adjoint()) and tests/python/test_ad_if.py (two new regression tests). The change replaces a single-level parent-check with an upward walk that stops at a block whose owning statement unconditionally dominates both forward and reverse code (RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt, or kernel root), skipping if/while bodies that do not dominate the reverse-pass sibling IfStmt emitted by visit(IfStmt).

Security risks

None. This is an autodiff IR transform correctness fix with no security-sensitive surface (no auth, crypto, permissions, or external data boundaries).

Level of scrutiny

Moderate. The placement logic touches the core reverse-mode AD pass, but the change is surgical and well-reasoned: the walk terminates because it either hits a dominating statement or reaches nullptr, and the fallback to alloca_block when the walk exhausts the tree is safe. The regression test directly exercises the previously-broken case and includes elements that must produce zero gradient (values -2.0 and 20.0 outside the condition range), ruling out spurious backward accumulation.

Other factors

All feedback from my prior review was addressed: the wrong test reference (test_ad_if_nested) was corrected to test_ad_nested_if_inside_for_loop, and the missing require=qd.extension.adstack guard was added to match every other reverse-mode AD test in the file. The reviewer's requests (while-loop xfail test, boundary inputs, exact equality checks, plain-English comment) were all satisfied. No new bugs were flagged by the bug-hunting system.

claude

LGTM — all prior review comments addressed, walk-up fix and test coverage look correct.

Extended reasoning...

Overview

Two-file diff: a focused fix to MakeAdjoint::adjoint() in auto_diff.cpp and five new tests in tests/python/test_ad_if.py. The C++ change replaces a single-level parent->parent_stmt()->is<RangeForStmt>() check with a walk-up loop that stops at any unconditionally-dominating block owner (loop/offloaded/mesh-for/kernel root), ensuring the adjoint alloca is visible from both the forward and reverse SSA scopes regardless of nesting depth.

Security risks

None. This is a compiler IR transform for automatic differentiation with no security-sensitive surface area.

Level of scrutiny

Moderate: this touches the core reverse-mode AD transform, which is non-trivial, but the change is surgical and well-bounded. The SSA-dominance invariant is clearly explained in the new comment, the stop-set is enumerated, and the QD_ASSERT(target != nullptr) guard surfaces malformed IR loudly rather than silently degrading. Four new tests cover f32/f64 variants, if/elif/else nesting, and nested-for shapes; an xfail test documents the known while-loop gap.

Other factors

All three rounds of review comments (comment reference fix, missing require=qd.extension.adstack, and the two April 21 inline comments about inaccurate IR descriptions in test_ad_nested_for_loops_global_load and test_ad_nested_if_elif_else_inside_for_loop) have been addressed in the current code. The bug hunting system found no bugs. The PR is part of a larger autodiff stack but the changes are self-contained.

claude

LGTM — all previous review points addressed, fix is correct.

Extended reasoning...

Overview

Two-file change: quadrants/transforms/auto_diff.cpp (alloca placement logic in MakeAdjoint::adjoint()) and tests/python/test_ad_if.py (four new regression tests plus one xfail). Fixes silently-zero gradients for field reads nested inside if blocks within a differentiated for-loop.

Security risks

None. This is a compiler IR transform pass; no auth, crypto, or data exposure surface.

Level of scrutiny

Medium. The change is a focused bug fix in an autodiff pass: not trivial (IR manipulation, SSA-dominance reasoning), but the logic is well-explained and the stop-set is correct. The four new tests pin all the shapes the fix covers, and the QD_ASSERT(target != nullptr) replaces a silent-degradation fallback with a loud failure for malformed IR. All three previous review rounds had their feedback fully addressed.

Other factors

All inline comments from my two prior review runs are marked resolved and confirmed fixed in the diff: correct test reference, require=qd.extension.adstack guard, corrected IR-structure description in the elif test comment, and accurate documentation that test_ad_nested_for_loops_global_load exercises Case 2 rather than the walk-up. No outstanding reviewer comments remain.

duburcqa mentioned this pull request Apr 17, 2026

[AutoDiff] Autodiff 6: Adstack regression tests #491

Open

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 4c048e8 to d8e05eb Compare April 17, 2026 11:43

claude bot reviewed Apr 17, 2026

Comment thread quadrants/transforms/auto_diff.cpp

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from d8e05eb to b9d107a Compare April 17, 2026 12:12

duburcqa mentioned this pull request Apr 17, 2026

[AutoDiff] Autodiff 1: Add baseline adstack regression test for unary_collections #500

Open

duburcqa changed the base branch from main to duburcqa/split_autodiff_mark_rsqrt_nonlinear April 17, 2026 12:14

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 0f7ff3b to 56c457e Compare April 17, 2026 12:29

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from b9d107a to 0c487e2 Compare April 17, 2026 12:29

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 56c457e to 8fd18b0 Compare April 17, 2026 12:31

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 0c487e2 to 7f9c702 Compare April 17, 2026 12:31

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 8fd18b0 to d813066 Compare April 17, 2026 12:42

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 7f9c702 to a5f664f Compare April 17, 2026 12:42

duburcqa mentioned this pull request Apr 17, 2026

[Bug]: loss.backward() hangs indefinitely for articulated robots with freejoint + child joints Genesis-Embodied-AI/Genesis#2537

Open

hughperkins changed the title ~~[AutoDiff] Fix adjoint-alloca placement for GlobalLoads outside the current range-for~~ [AutoDiff] Autodiff 5: Fix adjoint-alloca placement for GlobalLoads outside the current range-for Apr 17, 2026

hughperkins reviewed Apr 17, 2026

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from d813066 to ce8e91b Compare April 17, 2026 15:57

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from a5f664f to d9d2d09 Compare April 17, 2026 15:57

claude bot reviewed Apr 17, 2026

Comment thread quadrants/transforms/auto_diff.cpp Outdated

Comment thread tests/python/test_ad_if.py Outdated

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from ce8e91b to 7d5ce3c Compare April 17, 2026 20:30

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from d9d2d09 to da7fb7a Compare April 17, 2026 20:31

claude bot reviewed Apr 17, 2026

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 7d5ce3c to 94e5650 Compare April 17, 2026 21:36

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from da7fb7a to 1bb6c33 Compare April 17, 2026 21:36

claude bot reviewed Apr 17, 2026

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 94e5650 to 421d30e Compare April 17, 2026 22:33

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 1bb6c33 to 2bfdfc5 Compare April 17, 2026 22:34

claude bot reviewed Apr 17, 2026

claude bot reviewed Apr 20, 2026

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from c27ceb3 to 121da95 Compare April 21, 2026 06:40

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 1c90048 to 6e48d59 Compare April 21, 2026 06:40

duburcqa mentioned this pull request Apr 21, 2026

[AutoDiff] Autodiff 4: Mark rsqrt as non-linear for adstack promotion #503

Open

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 121da95 to 783f8a3 Compare April 21, 2026 06:59

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 6e48d59 to 4991012 Compare April 21, 2026 06:59

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 783f8a3 to 19251b8 Compare April 21, 2026 07:19

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 4991012 to 0900c72 Compare April 21, 2026 07:19

claude bot reviewed Apr 21, 2026

Comment thread tests/python/test_ad_if.py

Comment thread tests/python/test_ad_if.py

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 19251b8 to 6b46265 Compare April 21, 2026 08:18

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 0900c72 to 82ad225 Compare April 21, 2026 08:18

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 6b46265 to 0df2b1d Compare April 21, 2026 09:50

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 82ad225 to 77f3a07 Compare April 21, 2026 09:50

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 0df2b1d to ece6b34 Compare April 21, 2026 12:02

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 77f3a07 to 46d1da3 Compare April 21, 2026 12:02

claude bot reviewed Apr 21, 2026

[AutoDiff] Fix adjoint-alloca placement for GlobalLoads outside the current range-for (ce478a5)

duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 46d1da3 to ce478a5 Compare April 21, 2026 14:42

claude bot reviewed Apr 21, 2026

duburcqa commented Apr 17, 2026 • edited

Fix adjoint-alloca placement for GlobalLoads outside the current range-for

TL;DR

Why the walk-up is the right shape

Changes

quadrants/transforms/auto_diff.cpp

tests/python/test_ad_if.py

Side-effect audit

Stack

claude bot left a comment

Overview

The Change

Security Risks

Level of Scrutiny

Other Factors

hughperkins commented Apr 17, 2026

Summary

Changes

Strengths

Weaknesses / things to consider

hughperkins commented Apr 17, 2026

hughperkins Apr 17, 2026

hughperkins Apr 17, 2026 • edited

duburcqa commented Apr 17, 2026

claude bot left a comment

Overview

Security risks

Level of scrutiny

Other factors

claude bot left a comment

Overview

Security risks

Level of scrutiny

Other factors

claude bot left a comment

Overview

Security risks

Level of scrutiny

Other factors

hughperkins commented Apr 20, 2026

Summary

What's in the PR

quadrants/transforms/auto_diff.cpp

tests/python/test_ad_if.py

Good points

Bad points / concerns

hughperkins commented Apr 20, 2026

claude bot left a comment

Overview

Security risks

Level of scrutiny

Other factors
