[AutoDiff] Autodiff 5: Fix adjoint-alloca placement for GlobalLoads outside the current range-for #496

Open
duburcqa wants to merge 1 commit into duburcqa/split_autodiff_mark_rsqrt_nonlinear from duburcqa/split_adjoint_alloca_placement

Contributor

@duburcqa duburcqa commented Apr 17, 2026

Fix adjoint-alloca placement for GlobalLoads outside the current range-for

Fixes silently-zero gradients in any reverse-mode kernel that reads a field (w[i]) inside a nested control-flow construct (another if, another for, …) inside a for-loop. The fix walks up the enclosing-block chain until it reaches a block whose owning statement unconditionally dominates both forward and reverse code.

TL;DR

Before:

if (stmt->is<GlobalLoadStmt>() && stmt->parent->parent_stmt() != nullptr &&
    stmt->parent->parent_stmt()->is<RangeForStmt>()) {
  // ...single-level check: only covered a load whose IMMEDIATE parent block
  // was the range-for's body.
}

The check fired only when the GlobalLoad's immediate parent was a RangeForStmt. When the load lived inside if { if { w[i] } } inside a for-loop, the check failed and the alloca got placed inside the forward if-body's block. MakeAdjoint::visit(IfStmt) emits the reverse code into a brand-new sibling IfStmt (not back into the forward body), so the alloca was SSA-invalid from the reverse branch's point of view and got DCE'd. Result: w.grad[i] came out silently zero instead of the correct 2 * w[i].
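As a cross-check on the expected value, the nested-if kernel shape can be mirrored in plain Python and differentiated numerically. This is only a sketch: the conditions `w[i] > 0` and `w[i] < 10` are the ones quoted later in this thread, while the input values and the helper names (`loss`, `numeric_grad`, `expected_grad`) are hypothetical and not taken from the test file.

```python
def loss(w):
    """Plain-Python stand-in for the kernel: sum of w[i]^2 where 0 < w[i] < 10."""
    total = 0.0
    for x in w:
        if x > 0:
            if x < 10:
                total += x * x
    return total


def numeric_grad(f, w, h=1e-3):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += h
        down = list(w)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def expected_grad(w):
    """Analytic gradient: 2 * w[i] where both guards pass, exactly 0 elsewhere."""
    return [2 * x if 0 < x < 10 else 0.0 for x in w]


# Hypothetical inputs: fails the outer guard, passes both (twice), fails the inner guard.
w = [-2.0, 1.0, 5.0, 20.0]
print(expected_grad(w))  # [0.0, 2.0, 10.0, 0.0]
agree = all(abs(n - e) < 1e-6 for n, e in zip(numeric_grad(loss, w), expected_grad(w)))
print(agree)
```

A silently-zero result would show up here as a mismatch on the two in-range elements only, which is why inputs that straddle both guards are useful.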

After:

if (stmt->is<GlobalLoadStmt>() && forward_backup->locate(stmt->as<GlobalLoadStmt>()) == -1) {
  // Walk up the enclosing-block chain until we find a block whose owning statement
  // unconditionally dominates both forward and reverse code.
  Block *target = stmt->as<GlobalLoadStmt>()->parent;
  while (target != nullptr) {
    Stmt *parent_stmt = target->parent_stmt();
    if (parent_stmt == nullptr || parent_stmt->is<RangeForStmt>() ||
        parent_stmt->is<StructForStmt>() || parent_stmt->is<OffloadedStmt>() ||
        parent_stmt->is<MeshForStmt>()) {
      break;
    }
    target = parent_stmt->parent;
  }
  QD_ASSERT(target != nullptr);
  target->insert(std::move(alloca), 0);
}

Stop set: RangeForStmt / StructForStmt / OffloadedStmt / MeshForStmt, or the kernel root (parent_stmt == nullptr). IfStmt / WhileStmt are explicitly not in the stop set — those emit sibling reverse blocks, which is the whole point of walking past them.
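The walk-up can be sketched on a toy IR in plain Python. The `Block` and `Stmt` classes, the `nest` helper and the string-valued `kind` field below are hypothetical stand-ins for illustration, not the quadrants IR classes; only the loop structure and the stop set follow the C++ shown above.

```python
from dataclasses import dataclass
from typing import Optional

# Statement kinds whose body block may host the adjoint alloca (the stop set).
STOP_KINDS = {"RangeForStmt", "StructForStmt", "OffloadedStmt", "MeshForStmt"}


@dataclass(eq=False)
class Block:
    """Toy block; `owner` is the statement whose body this is (None = kernel root)."""
    name: str
    owner: Optional["Stmt"] = None


@dataclass(eq=False)
class Stmt:
    """Toy statement; `parent` is the block that contains it."""
    kind: str
    parent: Optional[Block] = None


def old_check(load: Stmt) -> bool:
    """Pre-fix test: is the load's immediate parent block a range-for body?"""
    owner = load.parent.owner
    return owner is not None and owner.kind == "RangeForStmt"


def find_alloca_block(load: Stmt) -> Block:
    """Walk up until a block owned by a stop-set statement (or the root) is found."""
    target = load.parent
    while target is not None:
        owner = target.owner
        if owner is None or owner.kind in STOP_KINDS:
            break
        target = owner.parent  # step past an IfStmt / WhileStmt body
    assert target is not None, "broken enclosing-block chain"
    return target


def nest(kind: str, parent: Block, body_name: str) -> Block:
    """Create a statement of `kind` inside `parent` and return its body block."""
    return Block(body_name, owner=Stmt(kind, parent))


# for { if { if { w[i] } } }
root = Block("kernel_root")
for_body = nest("RangeForStmt", root, "for_body")
outer_if = nest("IfStmt", for_body, "outer_if_body")
inner_if = nest("IfStmt", outer_if, "inner_if_body")
load = Stmt("GlobalLoadStmt", inner_if)

print(old_check(load))               # False
print(find_alloca_block(load).name)  # for_body
```

In this model the old single-level check rejects the nested load, while the walk-up passes both if-bodies and stops at the range-for's body block. A block with an unparented owner trips the assertion instead of falling back to another placement.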

Why the walk-up is the right shape

MakeAdjoint::visit(IfStmt) builds a fresh IfStmt in the reverse flow and visits the forward body's statements with current_block rebound to the new reverse branch. Statements in the forward branch are not live in the reverse branch's SSA scope — only values that dominate both branches (i.e. defined before the IfStmt) are. For the adjoint alloca to be readable from both the forward pass (which writes adj(w[i]) += ...) and the reverse pass (which accumulates into it), it must live in a block that both can see. That's exactly a block whose owner unconditionally dominates the enclosed code.

Loops (RangeForStmt / StructForStmt / OffloadedStmt / MeshForStmt) qualify because their reverse emission wraps the reversed body in a sibling loop that keeps the body's control flow intact — the accumulator needs to live just outside that wrapper, not inside it, which is what the walk-up produces.

The QD_ASSERT(target != nullptr) is a defensive guard: reaching nullptr means the primal's enclosing-block chain is broken (an unparented block), which should never happen on well-formed IR. The previous fallback of "try alloca_block" silently restored the pre-fix buggy placement whenever the chain was malformed; asserting surfaces it loudly.

Changes

quadrants/transforms/auto_diff.cpp

MakeAdjoint::visit(GlobalLoadStmt) (around the adjoint alloca placement). The condition for "is this load outside the currently-processed range-for?" moves from parent->parent_stmt()->is<RangeForStmt>() to forward_backup->locate(stmt) == -1, and the placement logic walks up as described above. Comment spells out the SSA-dominance invariant, enumerates the dominating stop set, and explicitly excludes IfStmt with a one-line reason.

tests/python/test_ad_if.py

Four new tests, all gated on qd.extension.adstack:

  1. test_ad_nested_if_inside_for_loop (f32) — the regression pin. Nested if / if inside a for i in w: with inputs that deliberately straddle both conditions: some elements fail the outer guard, some fail the inner guard, some pass both. w.grad[i] must be 2*w[i] only for the passes-both elements and exactly 0 otherwise (catches spurious accumulation from untaken branches too).
  2. test_ad_nested_if_inside_for_loop_f64 — same shape with f64 field and default_fp=qd.f64.
  3. test_ad_nested_if_elif_else_inside_for_loop: if / elif / elif / else inside a for i in w:, four branches with different weights. Pins the walk-up for IR shapes where the nested construct is an if-chain rather than two stacked ifs.
  4. test_ad_nested_for_loops_global_load — the for i in x: a = x[i]; for _ in range(n_inner): y += a shape. The GlobalLoadStmt lives in the outer for's body, and the alloca must be hoisted above the inner for (not placed inside it). Without the walk-up the inner reversed body runs the accumulator n_inner times per outer iteration, producing grad = n_inner² instead of the correct n_inner.
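The expected value for the nested-for shape in item 4 can be reproduced with a hand-written reverse pass in plain Python. This is a sketch of the correct placement only: one accumulator per outer iteration, living outside the inner loop. It does not attempt to reproduce the compiler's failing placement, and the function names and input values are hypothetical.

```python
def forward(x, n_inner):
    """for i in x: a = x[i]; for _ in range(n_inner): y += a"""
    y = 0.0
    for i in range(len(x)):
        a = x[i]
        for _ in range(n_inner):
            y += a
    return y


def reverse(x, n_inner):
    """Hand-written reverse pass with one accumulator per outer iteration."""
    x_grad = [0.0] * len(x)
    adj_y = 1.0  # seed: d(y)/d(y)
    for i in reversed(range(len(x))):
        adj_a = 0.0  # accumulator lives in the outer body, outside the inner loop
        for _ in range(n_inner):
            adj_a += adj_y  # reverse of `y += a`
        x_grad[i] += adj_a  # reverse of `a = x[i]`, applied once per outer iteration
    return x_grad


print(reverse([1.0, 2.0, 3.0], 4))  # [4.0, 4.0, 4.0]
```

Each element's gradient equals n_inner because `a` contributes to `y` once per inner iteration and is read from `x[i]` once per outer iteration.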

A fifth test (test_ad_nested_if_inside_while_loop) is included as xfail(strict=True, raises=RuntimeError) — reverse-mode AD does not yet have a visit(WhileStmt) implementation, but the IR shape (while wrapping nested ifs wrapping a field read) is the one the walk-up fix needs to hold on when visit(WhileStmt) ever lands.

Side-effect audit

| Concern | Verdict |
| --- | --- |
| Existing single-level if-in-for gradients | Unchanged; the old path remains correct as a special case of the new walk-up. |
| Struct-for / mesh-for / offloaded | Covered by the stop set. |
| Malformed IR | QD_ASSERT(target != nullptr) surfaces it loudly instead of silently degrading. |
| While loops | Hard-errors via existing QD_NOT_IMPLEMENTED in visit(WhileStmt) (covered by the xfail test). |

Stack

Autodiff 5 of 13. Based on #503 (rsqrt). Followed by #491 (regression tests).

@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 4c048e8 to d8e05eb on April 17, 2026 11:43

@claude claude bot left a comment

The fix is logical and well-explained, but this touches correctness-critical autodiff alloca placement logic with subtle SSA dominator reasoning — worth a human look before landing.

Extended reasoning...

Overview

The PR modifies a single function (MakeAdjoint::adjoint) in auto_diff.cpp, changing how adjoint AllocaStmt nodes are placed when a GlobalLoadStmt lives outside the currently-processed range-for iteration body.

The Change

The old code checked whether the GlobalLoadStmt's direct parent block was owned by a RangeForStmt before deciding placement. This missed the case where the GlobalLoad was inside an if-body nested within the for-loop — in that case the old code fell through and placed the alloca inside the if-branch, which visit(IfStmt) cannot see from the sibling reverse IfStmt it emits, causing DCE to strip it and producing silently-zero gradients.

The new code walks up the block hierarchy from the GlobalLoad's parent until it reaches a block owned by a loop/offloaded/kernel statement (any unconditional dominator), skipping if/while bodies. This is the correct approach.

Security Risks

None. This is a compiler IR transformation pass with no surface area for injection, auth bypass, or data exposure.

Level of Scrutiny

High. Autodiff correctness is notoriously subtle — a placement bug like this produces wrong numerical results rather than crashes, making it hard to detect in testing. The fix is small but the reasoning touches SSA dominance, which is easy to get subtly wrong. The test plan checkbox is unchecked, so it is unclear whether test_ad_if_nested was confirmed to pass.

Other Factors

The pre-existing null-pointer dereference flagged in visit(GlobalStoreStmt) is unrelated to this PR. The new fallback if (target == nullptr) { target = alloca_block; } is a safe catch-all. The omission of WhileStmt from the break condition is benign since WhileStmt is rejected earlier in the pipeline.

Comment thread quadrants/transforms/auto_diff.cpp
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from d8e05eb to b9d107a on April 17, 2026 12:12
@duburcqa duburcqa changed the base branch from main to duburcqa/split_autodiff_mark_rsqrt_nonlinear on April 17, 2026 12:14
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 0f7ff3b to 56c457e on April 17, 2026 12:29
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from b9d107a to 0c487e2 on April 17, 2026 12:29
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 56c457e to 8fd18b0 on April 17, 2026 12:31
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 0c487e2 to 7f9c702 on April 17, 2026 12:31
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 8fd18b0 to d813066 on April 17, 2026 12:42
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 7f9c702 to a5f664f on April 17, 2026 12:42
@hughperkins hughperkins changed the title from "[AutoDiff] Fix adjoint-alloca placement for GlobalLoads outside the current range-for" to "[AutoDiff] Autodiff 5: Fix adjoint-alloca placement for GlobalLoads outside the current range-for" on Apr 17, 2026
@hughperkins
Collaborator

Opus summary:

Summary

Fixes the placement of adjoint alloca statements for GlobalLoadStmts that live inside
if (or while) bodies within a differentiated for-loop. Previously, the alloca was
placed in the forward if-body's block, but the reverse pass emits backward code into a
brand-new sibling IfStmt — so the alloca was not visible from the reverse branch's
SSA scope, got DCE'd, and the gradient silently came out as zero.
The fix replaces the old single-level parent check with a walk up the block tree,
stopping at a block whose owning statement unconditionally dominates both forward and
reverse code (a RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt, or
the kernel root). This ensures the alloca is visible to both the forward and reverse
branches regardless of nesting depth.

Changes

  • MakeAdjoint alloca placement: Replaces the previous check
    (parent->parent_stmt()->is<RangeForStmt>()) with an upward walk that skips if/while
    blocks and stops at a dominating loop or kernel body.
  • Comment update: Clarifies "not inside a range-for" → "not inside the
    currently-processed range-for" and adds a detailed explanation of the SSA dominance
    issue.
  • New test (test_ad_nested_if_inside_for_loop): A kernel that reads w[i] inside
    nested if bodies within a for-loop and checks that w.grad[i] is correctly 2 * w[i]
    rather than silently zero.

Strengths

  • Fixes a silent correctness bug. Gradients came out as zero with no error or
    warning. This is the worst class of bug — the test that catches it is valuable.
  • Generalises the fix. The upward walk handles arbitrary nesting of if/while
    inside loops, not just the single-level case. This is more robust than patching only
    the immediate failure pattern.
  • Good dominator set. Stopping at RangeForStmt, StructForStmt, OffloadedStmt,
    and MeshForStmt covers all the loop/offload constructs that unconditionally dominate
    their body. The nullptr parent fallback ensures the walk always terminates.
  • Clear, targeted test. The nested-if kernel directly exercises the bug and the
    expected gradient (2 * w[i]) is trivially verifiable by hand.
  • Well-commented. The new inline comment explains the SSA dominance issue and
    references the test by name.

Weaknesses / things to consider

  • No while test. The comment mentions that visit(IfStmt) emits into a sibling,
    and the walk skips non-dominating statements generally, but the test only covers
    nested if. A while body inside a for-loop would exercise the same placement logic
    on a different control-flow construct.
  • Only tests the w[i] > 0 / w[i] < 10 path. All 8 elements satisfy both
    conditions (values 1–8), so the else branch is never taken. A test where some elements
    fail the condition would verify that gradients are correctly zero for those elements
    (no spurious accumulation from the untaken branch).
  • MeshForStmt in the stop set is untested. If mesh-for constructs have different
    reverse-pass emission semantics, stopping there could be wrong. Presumably it's
    analogous to the other for-loops, but there's no test coverage for it.
  • Tolerance is rel=1e-4 for an exact integer computation (2 * (i+1) from
    w[i] * w[i] with small integer inputs). Could be an exact == comparison or at
    least rel=1e-6 to be tighter.
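The exactness claim in the last point can be checked without the framework: integers of magnitude up to 2^24 are exactly representable in f32, so for inputs 1 through 8 both `w[i] * w[i]` and `2 * w[i]` round-trip through f32 unchanged. The sketch below uses only the standard `struct` module; `to_f32` is a hypothetical helper name, and the loop mirrors the "values 1–8" inputs described above rather than the actual test code.

```python
import struct


def to_f32(x):
    """Round a Python float (f64) to the nearest f32 and widen it back."""
    return struct.unpack("f", struct.pack("f", x))[0]


all_exact = True
for i in range(8):
    w = to_f32(float(i + 1))
    square = to_f32(w * w)   # forward term w[i] * w[i]
    grad = to_f32(2.0 * w)   # analytic gradient 2 * w[i]
    if square != (i + 1) ** 2 or grad != 2 * (i + 1):
        all_exact = False

print(all_exact)  # True
```

Under that assumption an exact `==` comparison is safe for these inputs; a tolerance would only matter for values that f32 cannot represent exactly.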

@hughperkins
Collaborator

Please address these opus points:

  • No while test. The comment mentions that visit(IfStmt) emits into a sibling,
    and the walk skips non-dominating statements generally, but the test only covers
    nested if. A while body inside a for-loop would exercise the same placement logic
    on a different control-flow construct.
  • Only tests the w[i] > 0 / w[i] < 10 path. All 8 elements satisfy both
    conditions (values 1–8), so the else branch is never taken. A test where some elements
    fail the condition would verify that gradients are correctly zero for those elements
    (no spurious accumulation from the untaken branch).
  • Tolerance is rel=1e-4 for an exact integer computation (2 * (i+1) from
    w[i] * w[i] with small integer inputs). Could be an exact == comparison or at
    least rel=1e-6 to be tighter.

I'm fine with not handling mesh for. I wonder if we should remove mesh for? (I'm not suggesting removing mesh for in this PR, nor am I suggesting that you create a PR to remove mesh for; but I am pondering whether we should remove it.)

Comment thread tests/python/test_ad_if.py Outdated

@test_utils.test()
def test_ad_nested_if_inside_for_loop():
    # Regression test for the adjoint-alloca placement fix. The forward kernel reads `w[i]`

Collaborator

Let's have a more plain-English explanation too please, i.e. one that is understandable using Python concepts.

Collaborator

@hughperkins hughperkins Apr 17, 2026

Plain-English version. Please place it before or after the detailed version:

      # Regression test for adjoint-alloca placement when a field read (`w[i]`) appears
      # inside nested `if` blocks within a for-loop being differentiated. Before the fix,
      # the gradient accumulator for `w[i]` was placed inside the forward `if` body, but
      # the reverse pass generates its backward code in a separate, parallel `if` block
      # that can't see variables defined in the forward one. The accumulator was silently
      # eliminated as dead code, and `w.grad[i]` came out as zero instead of the correct
      # `2 * w[i]`.

@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from d813066 to ce8e91b on April 17, 2026 15:57
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from a5f664f to d9d2d09 on April 17, 2026 15:57
@duburcqa
Contributor Author

@claude review

Comment thread quadrants/transforms/auto_diff.cpp Outdated
Comment thread tests/python/test_ad_if.py Outdated
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from ce8e91b to 7d5ce3c on April 17, 2026 20:30
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from d9d2d09 to da7fb7a on April 17, 2026 20:31

@claude claude bot left a comment

LGTM — both prior review comments addressed.

Extended reasoning...

Overview

The PR makes a targeted fix to MakeAdjoint::adjoint() in quadrants/transforms/auto_diff.cpp: the old single-level parent-is-RangeForStmt check is replaced by a walk-up loop that stops at any dominating block (RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt, or kernel root). A new regression test and an xfail while-loop variant are added to tests/python/test_ad_if.py.

Security risks

None. This is a compiler IR transformation with no user-facing data handling, auth, or I/O surface.

Level of scrutiny

The core logic change is small and well-contained. The walk-up terminates unconditionally (nullptr fallback), the stop-set covers all dominating statement kinds, and the new test exercises both the passing (0 < v < 10) and failing conditions with exact-comparison assertions. Previous review cycles caught and resolved a wrong test-name reference and a missing adstack guard; both are now fixed.

Other factors

No bugs found by the automated bug-hunting pass. All inline comments from prior review rounds are marked resolved. Human reviewer (hughperkins) requested improvements that have been incorporated.

@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 7d5ce3c to 94e5650 on April 17, 2026 21:36
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from da7fb7a to 1bb6c33 on April 17, 2026 21:36

@claude claude bot left a comment

LGTM — both previously raised issues are resolved, no new bugs found.

Extended reasoning...

Overview

The PR fixes a correctness bug in MakeAdjoint::adjoint() in quadrants/transforms/auto_diff.cpp: when a GlobalLoadStmt sits inside a nested if (or while) body within the range-for being differentiated, the old single-level parent check left the adjoint alloca inside the forward if-body, which is SSA-invalid from the reverse sibling IfStmt and gets DCE'd to silently-zero gradients. The fix replaces the check with an upward walk that stops at a dominating loop/offload/kernel body. Two new tests are added to tests/python/test_ad_if.py: a passing regression test for the range-for case and an xfail placeholder for the while-loop variant (reverse-mode AD does not yet implement visit(WhileStmt)).

Security risks

None. This is a pure compiler-IR transformation pass with no I/O, authentication, or data-exposure surface.

Level of scrutiny

Moderate. The change touches a critical autodiff correctness path, but the logic is well-contained and the fix is a straightforward upward-walk replacement of a single-level parent check. The regression test directly reproduces the bug and verifies both the fixed case and the boundary (untaken-branch gradients are zero). The previously raised concerns (the wrong test reference in the comment and the missing require=qd.extension.adstack guard) have both been addressed in the latest revision.

Other factors

No bugs were found by the bug hunting system in the current revision. All prior inline comments are resolved. The test uses exact comparisons (no floating-point tolerance needed for this integer-valued computation), includes boundary inputs that deliberately fail the conditions, and is guarded with require=qd.extension.adstack, matching every other reverse-mode AD test in the file.

@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 94e5650 to 421d30e on April 17, 2026 22:33
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 1bb6c33 to 2bfdfc5 on April 17, 2026 22:34

@claude claude bot left a comment

LGTM — both prior review items addressed; the walk-up alloca placement fix is correct and well-tested.

Extended reasoning...

Overview

The PR modifies two files: the core auto_diff.cpp alloca-placement logic in MakeAdjoint::adjoint(), and tests/python/test_ad_if.py with two new tests. The change replaces a single-level parent check with a walk-up loop that stops at SSA-dominating scopes (loop/offload bodies), fixing silently-zero gradients when a GlobalLoadStmt lives inside nested if/while blocks within a differentiated for-loop.

Security risks

None. This is purely an internal IR transformation pass with no external-facing surface.

Level of scrutiny

Moderate. The change is surgical — only the alloca insertion target changes — but it touches a correctness-critical path in the reverse-mode autodiff pipeline. The logic is sound: walking up until reaching a node that unconditionally dominates both forward and reverse code is the correct fix for the SSA-dominance problem described. The stop-set (RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt, nullptr) covers all unconditionally-dominating loop/kernel constructs.

Other factors

My previous review flagged two issues: (1) wrong test reference in the comment (test_ad_if_nested → test_ad_nested_if_inside_for_loop), and (2) missing require=qd.extension.adstack on the new test. Both are fixed in the current revision. The test now covers boundary inputs that fail one or both conditions (expecting gradient 0), which validates no spurious accumulation from untaken branches. The xfail while-loop companion test is a nice forward-looking placeholder.

@hughperkins
Collaborator

Opus description update:

Summary

Fixes a silent reverse-mode AD bug where the gradient accumulator alloca for a GlobalLoadStmt was placed inside a forward if-body block whose SSA does not dominate the
reverse pass's use site. DCE then stripped the alloca and grad came out as zero for any field read nested inside ifs inside a differentiated for-loop.
The fix walks up from the load's enclosing block until it finds an unconditionally-dominating block (loop / offloaded / mesh-for / kernel root), and places the alloca there.
This is a different bug class from the rest of the chain — not a unary_collections membership issue, but an SSA-dominance issue in MakeAdjoint::AdjointPlacement.

What's in the PR

Net diff vs origin/duburcqa/split_autodiff_mark_rsqrt_nonlinear: 2 files, +106 / -12.

quadrants/transforms/auto_diff.cpp

MakeAdjoint's alloca-placement logic for GlobalLoadStmt:

  • Old: only handled the case where the load's immediate parent was a RangeForStmt. If the load lived inside if { if { w[i] } } inside a for-loop, it fell through to
    alloca_block (the inner reversed loop body).
  • New: when the GlobalLoad lives outside the currently-processed range-for (forward_backup->locate(...) == -1), walk up the parent-block chain from the load's
    enclosing block until hitting a block whose owning statement is one of RangeForStmt / StructForStmt / OffloadedStmt / MeshForStmt, or the kernel root. Insert the
    alloca there.
  • Comment explains the invariant: IfStmt is not a valid stopping point because visit(IfStmt) emits the reverse code into a brand-new sibling IfStmt, so an alloca
    placed in the forward branch is SSA-invalid from the reverse branch's point of view and gets DCE'd → silently-zero gradients.

tests/python/test_ad_if.py

Two new tests, gated on qd.extension.adstack:

  • test_ad_nested_if_inside_for_loop — regression pin. Builds the failing IR shape: for i: if w[i] > 0: if w[i] < 10: loss += w[i]*w[i]. Inputs deliberately straddle
    both conditions (negative, in-range, and out-of-range positive) so the test asserts w.grad[i] == 2*w[i] where both conditions hold and == 0 elsewhere — catching both the
    original silent-zero failure and any spurious accumulation from untaken branches.
  • test_ad_nested_if_inside_while_loop — same IR-shape regression but with a while wrapping the nested ifs. Marked xfail(strict=True, raises=RuntimeError) because
    reverse-mode AD has no visit(WhileStmt) implementation today. Documents the gap and pins it as a known follow-up; if visit(WhileStmt) ever lands, the strict xfail flips
    to XPASS and forces a reviewer decision.

Good points

  • Fixes a silent-correctness bug. Prior to this PR, gradients for field reads under nested ifs in a differentiated for-loop came back as zero with no error or warning.
    Silent-wrong-gradient bugs are the worst kind — the test pins it concretely.
  • Identifies the correct invariant. The comment names the actual mechanism: "alloca must live in a block whose owning statement unconditionally dominates both the forward
    and the reverse code." Lists the four block-owner types that satisfy it, and explicitly excludes IfStmt with a one-line explanation of why (visit(IfStmt) builds a sibling,
    not a re-entry into the forward block). A future reader can extend the list confidently.
  • Walk-up-to-dominating-block is a proper structural fix, not a special-case for one nesting depth. Works for arbitrary if/if/.../for nesting.
  • Test asserts both directions of correctness. Not just "the right elements get the right gradient" but also "the wrong elements get exactly zero." This catches a
    different class of regression — false accumulation from untaken branches — that would have been invisible to a pin that only checked the in-range elements.
  • Mixed-input parametrization (-2.0, 1.0, 2.0, …, 20.0) exercises the outer condition > 0, the inner condition < 10, and the both-true case in a single kernel run.
  • while-loop xfail is the right shape. strict=True, raises=RuntimeError documents both the gap and the current failure mode. If reverse-mode while support lands,
    the test flips loudly; if the failure mode changes (e.g. compile error → silently wrong gradient), the test also fails, surfacing the regression.
  • Self-contained. Two-file diff, focused on a single bug class, with a regression test that didn't exist before. Easy to review.

Bad points / concerns

  • Walk-up loop has no termination guard for malformed IR. If the parent chain is broken (e.g. an unparented block) the loop falls out via target == nullptr and falls
    back to alloca_block — which is the original buggy placement for this case. A defensive QD_ASSERT(target != nullptr) (or a clear log) might be safer than silently
    degrading to the prior behavior.
  • Stopping-block whitelist is hand-curated. The set {RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt} could drift if a new control-flow stmt is added. Worth a
    comment in the relevant statement classes pointing back here, or — better — a virtual is_unconditional_block_owner() on Stmt that the walk-up consults.
  • OffloadedStmt always terminates the walk. That's correct for the cases the test exercises (kernel-root placement), but worth confirming there isn't a configuration
    where an alloca placed at the offloaded boundary is too coarse (e.g. shared between threads in a way that breaks parallel reductions). Not something the existing tests would
    catch.
  • Test only covers if-inside-for nesting, not if-inside-if-inside-for with mixed branch shapes (else arms, if/elif/else). The fix is structural and should
    handle them, but the regression pin doesn't prove it.
  • while xfail is also if-inside-while with a single-iteration while. The IR shape is the point, but a reader might miss that the trip count is irrelevant. The
    docstring does say so explicitly, which mitigates this.
  • No coverage of nested-for (rather than nested-if). The original bug description mentions "GlobalLoadStmt outside the currently-processed range-for"; the fix's
    walk-up logic should also handle a GlobalLoad declared in an outer for whose alloca needs to live above an inner for. Worth a third test case to pin that path
    explicitly.
  • Single dtype (f32). Inherits the same f32-only limitation as the rest of the chain.
  • Merge commit, not rebase. Branch was brought up to date via git merge rather than rebased onto split_autodiff_mark_rsqrt_nonlinear. History contains a merge commit;
    depending on repo conventions a clean rebase + force-push may be preferred.
  • Fifth PR in a stack on the same surface area. Reviewers need to merge the chain in order; any force-push to an upstream PR will require re-syncing this one. Not a code
    issue, just a process note.

@hughperkins
Collaborator

Please address:

  • Walk-up loop has no termination guard for malformed IR. If the parent chain is broken (e.g. an unparented block) the loop falls out via target == nullptr and falls
    back to alloca_block — which is the original buggy placement for this case. A defensive QD_ASSERT(target != nullptr) (or a clear log) might be safer than silently
    degrading to the prior behavior.
  • Test only covers if-inside-for nesting, not if-inside-if-inside-for with mixed branch shapes (else arms, if/elif/else). The fix is structural and should
    handle them, but the regression pin doesn't prove it.
  • No coverage of nested-for (rather than nested-if). The original bug description mentions "GlobalLoadStmt outside the currently-processed range-for"; the fix's
    walk-up logic should also handle a GlobalLoad declared in an outer for whose alloca needs to live above an inner for. Worth a third test case to pin that path
    explicitly.
  • Single dtype (f32). Inherits the same f32-only limitation as the rest of the chain.

@claude claude bot left a comment

LGTM — all prior feedback addressed, fix is correct and well-tested.

Extended reasoning...

Overview

The PR modifies two files: quadrants/transforms/auto_diff.cpp (the core alloca-placement fix in MakeAdjoint::adjoint()) and tests/python/test_ad_if.py (two new regression tests). The change replaces a single-level parent-check with an upward walk that stops at a block whose owning statement unconditionally dominates both forward and reverse code (RangeForStmt, StructForStmt, OffloadedStmt, MeshForStmt, or kernel root), skipping if/while bodies that do not dominate the reverse-pass sibling IfStmt emitted by visit(IfStmt).

Security risks

None. This is an autodiff IR transform correctness fix with no security-sensitive surface (no auth, crypto, permissions, or external data boundaries).

Level of scrutiny

Moderate. The placement logic touches the core reverse-mode AD pass, but the change is surgical and well-reasoned: the walk terminates because it either hits a dominating statement or reaches nullptr, and the fallback to alloca_block when the walk exhausts the tree is safe. The regression test directly exercises the previously-broken case and includes elements that must produce zero gradient (values -2.0 and 20.0 outside the condition range), ruling out spurious backward accumulation.

Other factors

All feedback from my prior review was addressed: the wrong test reference (test_ad_if_nested) was corrected to test_ad_nested_if_inside_for_loop, and the missing require=qd.extension.adstack guard was added to match every other reverse-mode AD test in the file. The reviewer's requests (while-loop xfail test, boundary inputs, exact equality checks, plain-English comment) were all satisfied. No new bugs were flagged by the bug-hunting system.

@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from c27ceb3 to 121da95 Compare April 21, 2026 06:40
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 1c90048 to 6e48d59 Compare April 21, 2026 06:40
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 121da95 to 783f8a3 Compare April 21, 2026 06:59
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 6e48d59 to 4991012 Compare April 21, 2026 06:59
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 783f8a3 to 19251b8 Compare April 21, 2026 07:19
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 4991012 to 0900c72 Compare April 21, 2026 07:19
Comment thread tests/python/test_ad_if.py
Comment thread tests/python/test_ad_if.py
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 19251b8 to 6b46265 Compare April 21, 2026 08:18
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 0900c72 to 82ad225 Compare April 21, 2026 08:18
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 6b46265 to 0df2b1d Compare April 21, 2026 09:50
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 82ad225 to 77f3a07 Compare April 21, 2026 09:50
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 0df2b1d to ece6b34 Compare April 21, 2026 12:02
@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 77f3a07 to 46d1da3 Compare April 21, 2026 12:02
@claude claude bot left a comment

LGTM — all prior review comments addressed, walk-up fix and test coverage look correct.

Extended reasoning...

Overview

Two-file diff: a focused fix to MakeAdjoint::adjoint() in auto_diff.cpp and five new tests in tests/python/test_ad_if.py. The C++ change replaces a single-level parent->parent_stmt()->is<RangeForStmt>() check with a walk-up loop that stops at any unconditionally-dominating block owner (loop/offloaded/mesh-for/kernel root), ensuring the adjoint alloca is visible from both the forward and reverse SSA scopes regardless of nesting depth.

Security risks

None. This is a compiler IR transform for automatic differentiation with no security-sensitive surface area.

Level of scrutiny

Moderate. This touches the core reverse-mode AD transform, which is non-trivial, but the change is surgical and well-bounded. The SSA-dominance invariant is clearly explained in the new comment, the stop-set is enumerated, and the QD_ASSERT(target != nullptr) guard surfaces malformed IR loudly rather than silently degrading. Four new tests cover f32/f64 variants, if/elif/else nesting, and nested-for shapes; an xfail test documents the known while-loop gap.

Other factors

All three rounds of review comments (comment reference fix, missing require=qd.extension.adstack, and the two April 21 inline comments about inaccurate IR descriptions in test_ad_nested_for_loops_global_load and test_ad_nested_if_elif_else_inside_for_loop) have been addressed in the current code. The bug hunting system found no bugs. The PR is part of a larger autodiff stack but the changes are self-contained.

@duburcqa duburcqa force-pushed the duburcqa/split_adjoint_alloca_placement branch from 46d1da3 to ce478a5 Compare April 21, 2026 14:42
@claude claude bot left a comment

LGTM — all previous review points addressed, fix is correct.

Extended reasoning...

Overview

Two-file change: the C++ autodiff transform (alloca placement logic) and the Python test file (four new regression tests plus one xfail). Fixes silently-zero gradients for field reads nested inside control-flow blocks within a differentiated for-loop.

Security risks

None. This is a compiler IR transform pass; no auth, crypto, or data exposure surface.

Level of scrutiny

Medium. The change is a focused bug fix in an autodiff pass. It is not trivial (IR manipulation, SSA-dominance reasoning), but the logic is well-explained and the stop-set is correct. The four new tests pin all the shapes the fix covers, and the new assertion replaces a silent-degradation fallback with a loud failure for malformed IR. All three previous review rounds had their feedback fully addressed.

Other factors

All inline comments from my two prior review runs are marked resolved and confirmed fixed in the diff: the corrected test reference, the added extension guard, the corrected IR-structure description in the elif test comment, and accurate documentation that the nested-for test exercises Case 2 rather than the walk-up. No outstanding reviewer comments remain.
