[AutoDiff] Autodiff 6: Adstack regression tests#491
duburcqa wants to merge 9 commits into duburcqa/split_adjoint_alloca_placement from
Conversation
Force-pushed c7d23e5 to 755a995
💡 Codex Review
https://github.com/Genesis-Embodied-AI/quadrants/blob/c7d23e5698683e4b69b4cdfefa9ce85d6c94bb42/transforms/auto_diff.cpp#L1261-L1266
Add tan to nonlinear-op stack detection
Now that UnaryOpType::tan is differentiated, operands feeding tan in loop-carried locals must be treated as nonlinear for ad-stack backup, but NonLinearOps::unary_collections (used by AdStackAllocaJudger::visit(UnaryOpStmt)) still omits tan. In reverse mode with adstack, this means some allocas used by tan are left as plain locals, so overwritten primal values are not preserved across iterations and gradients can be computed from the wrong (often last-iteration) value.
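The stale-value failure mode this finding describes can be sketched in plain Python, independent of the quadrants IR (the operand pattern `x + 0.1*j`, the 3-iteration loop, and the sample point are illustrative assumptions, not taken from the PR):

```python
import math

def forward(x, n=3):
    # Per-iteration operands a_j = x + 0.1*j feed tan; an ad-stack must
    # preserve every a_j so the reverse sweep can read them back.
    return sum(math.tan(x + 0.1 * j) for j in range(n))

def grad_with_stack(x, n=3):
    # Correct reverse sweep: each iteration differentiates through its
    # own operand, d/da tan(a) = 1 + tan(a)^2.
    return sum(1.0 + math.tan(x + 0.1 * j) ** 2 for j in range(n))

def grad_plain_alloca(x, n=3):
    # Buggy model: a plain alloca holds only the last-iteration operand,
    # so every backward step differentiates at a_{n-1}.
    a_last = x + 0.1 * (n - 1)
    return n * (1.0 + math.tan(a_last) ** 2)

x = 0.4
fd = (forward(x + 1e-6) - forward(x - 1e-6)) / 2e-6  # finite-difference oracle
assert abs(grad_with_stack(x) - fd) < 1e-4           # adstack gradient matches
assert abs(grad_plain_alloca(x) - fd) > 1e-3         # stale gradient is wrong
```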
Addressed in 5498454. The Codex review correctly identified that UnaryOpType::tan was missing from NonLinearOps::unary_collections, but fixing only the set membership does not actually fix the bug it describes: a test with …
@claude review
Force-pushed 84a4438 to e281a6e
@claude review
LGTM — all previously flagged issues (tan/rsqrt missing from unary_collections, tan/tanh/exp forward-value reuse, alloca placement for GlobalLoadStmt outside the processed range-for, and the false-confidence test pattern) have been addressed in this revision.
Extended reasoning...
Overview
The PR fixes four orthogonal reverse-mode AD correctness bugs in quadrants/transforms/auto_diff.cpp: adjoint-alloca placement for GlobalLoadStmt outside the currently-processed RangeFor, a nested-if alloca placement bug, implementing tan derivatives, and fixing loop-carried primal recomputation for tan/tanh/exp. Two new test files (test_adstack.py, additions to test_ad_basics.py) cover the fixed paths.
Previous review round-trips
Three rounds of inline comments from this reviewer:
- Round 1 — `UnaryOpType::tan` missing from `NonLinearOps::unary_collections`, and tan/tanh/exp reverse formulas reusing the forward value rather than recomputing on the adstack-backed operand. Both addressed: `tan` added to the set; tan/tanh/exp now call `tan(stmt->operand)`/`tanh(stmt->operand)`/`exp(stmt->operand)`.
- Round 2 — `UnaryOpType::rsqrt` also missing from `unary_collections`. Addressed: `rsqrt` is now present alongside `sqrt` in the set.
- Round 3 — Test comment claimed a purely-additive update was the sole promotion trigger, but a LocalLoad→LocalStore write-back cycle is independently sufficient. Addressed: `test_adstack_unary_loop_carried` now assigns `a` fresh from `x[i] + j*0.1` inside the inner loop body, so `LocalStore(a)` precedes any `LocalLoad(a)` in the traversal order and the write-back cycle check cannot fire.
Security risks
None. This is an internal IR compiler pass with no user-visible API surface, no I/O, and no authentication/permission logic.
Level of scrutiny
Medium-high: correctness-critical AD transform logic, but the diff is focused, the reasoning in code comments is sound, and all identified gaps have been closed. The torch.autograd reference comparison in test_adstack_unary_loop_carried provides an independent oracle. No design decisions remain open.
Other factors
Bug hunting system found no new issues. The existing AD regression suite is documented as remaining green. Changes are backend-agnostic.
Force-pushed e281a6e to 196977b
Force-pushed d0c43e2 to c060858
Force-pushed 196977b to c73cb3d
Force-pushed c060858 to 0f7ff3b
Force-pushed c73cb3d to 3d0ecaf
Additional findings (outside current diff — PR may have been updated during review):
- 🟣 `quadrants/transforms/auto_diff.cpp:1869-1876` — MakeDual::visit(UnaryOpStmt) has no handler for UnaryOpType::floor or UnaryOpType::ceil — the else-branch fires QD_NOT_IMPLEMENTED. This is a pre-existing gap: MakeAdjoint already has an explicit 'do nothing' branch for floor/ceil (piecewise-constant => zero gradient everywhere), but MakeDual does not. Any kernel using qd.floor() or qd.ceil() differentiated in forward mode (qd.ad.FwdMode) will crash at runtime with an unimplemented error. Since this PR directly modifies MakeDual::visit(UnaryOpStmt) to add the tan handler, this was the natural place to add the matching floor/ceil no-op.

Extended reasoning...
What the bug is and how it manifests
MakeDual::visit(UnaryOpStmt) handles: neg, abs, sin, cos, tan (newly added by this PR), tanh, asin, acos, exp, log, sqrt, rsqrt, cast_value, logic_not — but has no branch for UnaryOpType::floor or UnaryOpType::ceil. The else-branch at the bottom of that function calls QD_NOT_IMPLEMENTED. Any kernel compiled in forward-mode AD (qd.ad.FwdMode) that contains a qd.floor() or qd.ceil() call will therefore crash at runtime with an unimplemented error.
The specific code path that triggers it
In auto_diff.cpp, MakeDual::visit(UnaryOpStmt): the chain of else-if branches checks each op type in order. After logic_not (which does nothing), the final else clause fires QD_NOT_IMPLEMENTED with the op type name. UnaryOpType::floor and UnaryOpType::ceil are not listed anywhere in MakeDual::visit(UnaryOpStmt), so both fall through to this crash.
Why existing code does not prevent it
MakeAdjoint::visit(UnaryOpStmt) was already written correctly. Its very first branch is:
if (stmt->op_type == UnaryOpType::floor || stmt->op_type == UnaryOpType::ceil) { // do nothing }
This is correct because floor and ceil are piecewise-constant: their derivative is 0 everywhere (except integer points where it is undefined). No accumulation is needed. MakeDual simply never received the corresponding no-op branch; the asymmetry predates this PR.
What the impact would be
Any kernel containing qd.floor() or qd.ceil() that is differentiated via qd.ad.FwdMode will hit QD_NOT_IMPLEMENTED and crash at runtime. The crash is a hard failure, not a silent correctness issue, but it is surprising because floor/ceil are supported ops and reverse-mode handles them without issue. The bug is exposed by the test_ad_frac test in test_ad_basics.py which uses qd.floor and qd.ceil, though that test only exercises reverse mode.
How to fix it
Add the floor/ceil no-op as the first branch in MakeDual::visit(UnaryOpStmt), mirroring MakeAdjoint:
} else if (stmt->op_type == UnaryOpType::floor || stmt->op_type == UnaryOpType::ceil) {
  // do nothing: floor/ceil are piecewise-constant, dual component is zero
}
Step-by-step proof
- User writes qd.floor(x[i]) inside a kernel, differentiates with qd.ad.FwdMode.
- The forward pass runs normally since UnaryOpType::floor computes fine.
- MakeDual::run(block) is invoked on the independent block.
- MakeDual::visit(UnaryOpStmt) is called for the floor statement.
- stmt->op_type == UnaryOpType::floor matches none of the existing branches (neg, abs, sin, cos, tan, tanh, asin, acos, exp, log, sqrt, rsqrt, cast_value, logic_not).
- The final else branch fires: QD_NOT_IMPLEMENTED and the program crashes.
- The same happens for ceil, which is likewise absent from MakeDual.
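As a quick plain-Python check of the piecewise-constant claim underlying the no-op branch, a central finite difference at non-integer points (sample points arbitrary) confirms the zero derivative:

```python
import math

def central_diff(f, x, h=1e-4):
    # Symmetric finite difference; exact for piecewise-constant f as long
    # as [x - h, x + h] straddles no integer breakpoint.
    return (f(x + h) - f(x - h)) / (2 * h)

# floor and ceil are flat between integer breakpoints, so the derivative
# (and hence the forward-mode dual component) is 0 everywhere it exists.
for x in (0.3, 1.7, -2.4):
    assert central_diff(math.floor, x) == 0.0
    assert central_diff(math.ceil, x) == 0.0
```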
- 🟡 `quadrants/transforms/auto_diff.cpp:1869-1873` — The comment at auto_diff.cpp line 1872 in MakeDual::visit(tan) says 'See the matching reverse-mode case for rationale,' but the reverse-mode rationale explains why it avoids reusing the forward stmt value (BackupSSA overwrites it each iteration), while MakeDual correctly does reuse stmt — the opposite design choice. A maintainer following the cross-reference may misread it as a reason to also recompute tan(stmt->operand) in forward mode, which would be redundant but reflects a misunderstanding of the design.

Extended reasoning...
What the bug is and how it manifests
MakeDual::visit(tan) at auto_diff.cpp line 1872 uses sqr(stmt) — it reuses the forward tan result inline with the JVP. The comment says 'See the matching reverse-mode case for rationale.' But the reverse-mode comment's entire rationale explains why MakeAdjoint avoids using stmt: 'the primal is per-iteration inside dynamic loops but BackupSSA only spills forward values to a single plain alloca, so reading the forward tan would use the last-iteration value.' A reader following the cross-reference lands on an explanation for avoiding stmt, while the actual forward-mode code uses stmt — the opposite design choice with opposite reasoning.
The specific code path that triggers it
MakeDual::visit(UnaryOpStmt) for tan (auto_diff.cpp ~line 1872): accumulate(stmt, mul(add(constant(1), sqr(stmt)), dual(stmt->operand))). sqr(stmt) reuses the forward tan value, which is correct for forward mode because MakeDual generates JVPs inline during the forward pass — stmt is always the current-iteration value with no BackupSSA concern. The comment then tells the reader to look at MakeAdjoint::tan for rationale. MakeAdjoint::tan says it recomputes tan(operand) specifically because the forward stmt is unreliable due to BackupSSA. These rationales are for opposite choices.
Why existing code does not prevent it
The cross-reference is the only documentation for why MakeDual::tan uses stmt. No test validates comment accuracy. The consistent tanh case in MakeDual (sqr(stmt), no comment) shows the correct self-contained pattern.
What the impact would be
A future maintainer reads MakeDual::tan, follows the comment to MakeAdjoint, reads that stmt is unsafe due to BackupSSA, and may change MakeDual to recompute tan(stmt->operand). The change would be functionally redundant but reflects a misunderstanding. In the worst case the maintainer might also introduce a gratuitous consistency requirement between forward and reverse that leads to incorrect future changes.
How to fix it
Replace the cross-reference with a self-contained explanation: 'In forward mode the JVP is computed inline with the forward pass so stmt always holds the current-iteration tan value; BackupSSA does not apply here, unlike the reverse-mode case where the operand must be recomputed.'
Step-by-step proof of the misleading cross-reference
- MakeDual::tan code: accumulate(stmt, mul(add(constant(1), sqr(stmt)), dual(stmt->operand))). Uses stmt directly — correct because MakeDual runs inline with the forward pass.
- Comment says: 'See the matching reverse-mode case for rationale.'
- Reader navigates to MakeAdjoint::tan. Its comment says: 'Recompute tan(operand) rather than reusing the forward value: the primal is per-iteration inside dynamic loops but BackupSSA only spills forward values to a single plain alloca, so reading the forward tan would use the last-iteration value in the reversed loop.'
- Reader infers: stmt is dangerous because BackupSSA overwrites it. MakeDual should also not reuse stmt.
- Conclusion 4 is wrong for forward mode — BackupSSA runs after MakeAdjoint on the reverse pass, not on the forward pass that MakeDual generates. The cross-reference is pointing to a rationale that does not apply.
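The `1 + tan^2` JVP factor both modes rely on can be sanity-checked numerically in plain Python (sample points arbitrary):

```python
import math

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

# d/dx tan(x) = sec^2(x) = 1 + tan(x)^2 — the factor multiplied into the
# dual of the operand.
for x in (0.1, 0.5, 1.0):
    jvp_factor = 1.0 + math.tan(x) ** 2
    assert abs(central_diff(math.tan, x) - jvp_factor) < 1e-5
```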
- 🟡 `tests/python/test_adstack.py:106-128` — The abs parametrization in test_adstack_unary_loop_carried (line 103) uses x_val=0.5, giving a=0.5, 0.6, 0.7 for j=0,1,2 — all positive — so the correct gradient (sgn(0.5)+sgn(0.6)+sgn(0.7)=3) and the broken gradient (3*sgn(0.7)=3) are identical; the test cannot detect if abs is accidentally dropped from NonLinearOps::unary_collections. This is a pre-existing test quality weakness unrelated to the core fixes in this PR. Using x_val=-0.15 (giving a=-0.15, -0.05, 0.05; correct gradient=-1, broken gradient=3) would make the assertion actually sensitive to the regression the comment claims to guard against.

Extended reasoning...
What the bug is and how it manifests
The test comment at test_adstack.py lines ~108-118 asserts: "If qd_op is dropped from [unary_collections], a_alloca stays a plain alloca ... producing a wrong gradient that torch.autograd catches." For the abs parametrization this claim is false. With x_val=0.5 and j=0,1,2, the kernel evaluates a = x[i] + j*0.1 giving a=0.5, 0.6, 0.7 — every value strictly positive. abs'(a) = sgn(a) = +1 for all three, so the summed gradient is unconditionally 3.
The specific code path that triggers it
The test kernel assigns a fresh each inner iteration (a = x[i] + j*0.1), so a_alloca is NOT loop-carried. The comment is correct that AdStackAllocaJudger::visit(LocalStoreStmt) write-back cycle check cannot fire independently, and the only promotion path is via AdStackAllocaJudger::visit(UnaryOpStmt) checking NonLinearOps::unary_collections membership. If abs were removed from that set, a_alloca would remain a plain AllocaStmt, BackupSSA would spill its last-written value (a=0.7) to a single plain alloca, and the reversed inner loop would read 0.7 for every backward step.
Why existing code does not prevent it
With the stale-value bug active: gradient = 3 * sgn(0.7) = 3 * 1 = 3. With the correct adstack promotion: gradient = sgn(0.5)+sgn(0.6)+sgn(0.7) = 1+1+1 = 3. Both evaluate to 3. The assert x.grad[0] == approx(x_t.grad.item()) comparison succeeds in both cases, so the regression goes undetected.
What the impact would be
This is a test-coverage weakness, not a production correctness bug. abs was already in unary_collections before this PR, and the production code remains correct. The risk is that a future accidental deletion of abs from unary_collections would pass this test silently, defeating the regression guard the comment advertises.
How to fix it
Replace x_val=0.5 with x_val=-0.15. This gives a=-0.15, -0.05, 0.05 for j=0,1,2. The correct gradient is sgn(-0.15)+sgn(-0.05)+sgn(0.05) = -1-1+1 = -1. The broken gradient (stale a=0.05) is 3*sgn(0.05) = 3*1 = 3. The assertion -1 != 3 now actually catches the regression.
Step-by-step proof
- x_val=0.5, inner loop j=0,1,2: a_0=0.5, a_1=0.6, a_2=0.7 — all positive.
- Hypothetically drop abs from unary_collections: AdStackAllocaJudger never sets is_stack_needed_=true for a_alloca. ReplaceLocalVarWithStacks leaves it as a plain AllocaStmt.
- BackupSSA: a_alloca is spilled to a single plain alloca. Forward pass overwrites it with a_0, then a_1, then a_2=0.7. After the full forward pass the plain alloca holds 0.7.
- Reversed inner loop reads 0.7 three times. Gradient = 3*sgn(0.7) = 3.
- torch.autograd also returns 3 (sgn(0.5)+sgn(0.6)+sgn(0.7)=3). Assertion: 3 == approx(3) — passes. Regression silently missed.
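The arithmetic behind the proposed fix can be reproduced standalone (sgn modeled with `math.copysign`; the stale-gradient model replays the last-iteration operand three times, per the proof above):

```python
import math

def grads(x_val):
    # Correct reverse-mode gradient: sum of sgn over the per-iteration
    # values a_j = x + 0.1*j for j = 0, 1, 2.
    correct = sum(math.copysign(1.0, x_val + 0.1 * j) for j in range(3))
    # Stale-value gradient if a_alloca is not adstack-promoted: every
    # backward step reads the last-iteration value a_2 = x + 0.2.
    stale = 3 * math.copysign(1.0, x_val + 0.2)
    return correct, stale

assert grads(0.5) == (3.0, 3.0)     # insensitive: both paths agree
assert grads(-0.15) == (-1.0, 3.0)  # sensitive: the regression is detectable
```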
Force-pushed 3d0ecaf to 7b52cc8
Force-pushed b9d107a to 0c487e2
Force-pushed 7b52cc8 to 2a30384
Force-pushed 0c487e2 to 7f9c702
Force-pushed 2a30384 to b390eb0
LGTM — the test design is correct: the inner loop computes a fresh from x[i] + j*0.1 each iteration with no LocalStore(a) writeback, so AdStackAllocaJudger can only promote a_alloca via unary_collections membership, and all previous concerns are resolved.
Extended reasoning...
Overview
This PR adds four basic adstack regression tests to tests/python/test_adstack.py (basic_gradient, sum_fixed_coeff, sum_constant_coeffs, sum_static_loop_correct) and reduces the loop count in test_intrinsics.py from 200000 to 50000 for CI timing. No production code is touched — all code changes landed in upstream PRs #496–499.
Review of prior concerns
Three rounds of prior review raised issues that have all been resolved in the current state:
- `UnaryOpType::tan` missing from `NonLinearOps::unary_collections` — fixed in PR #497 with the two-part fix (set membership + reverse-formula recompute).
- `UnaryOpType::rsqrt` missing from `unary_collections`, with the existing test not catching it due to the multiplicative update — fixed in PR #499; `test_adstack_unary_loop_carried` now uses `a = x[i] + j*0.1`, which avoids any `LocalStore(a)` writeback in the inner body.
- The intermediate test comment claiming `a = a + 0.01` was an isolated probe — that version was replaced; the current pattern correctly makes `unary_collections` membership the sole promotion path.
Security risks
None — this is a test-only change with no auth, crypto, or permission-sensitive code paths.
Level of scrutiny
Low. Test-only PR closing the final step of a five-way split stack. The logic being tested (AdStack promotion, gradient formulas) was validated in the upstream PRs. The new tests are mechanically straightforward.
Other factors
No bugs found by the automated bug-hunting system. Both prior inline comments are resolved. The test_intrinsics.py iteration count reduction (200k→50k) is a safe CI speedup with no correctness implications.
LGTM — test-only change, all prior substantive concerns resolved in upstream split PRs.
Extended reasoning...
Overview
This PR adds four new adstack regression tests to tests/python/test_adstack.py: test_adstack_basic_gradient, test_adstack_sum_fixed_coeff, test_adstack_sum_constant_coeffs, and test_adstack_sum_static_loop_correct. The only file touched is a test file; no production code is modified.
Security risks
None — test-only change with no security surface.
Level of scrutiny
Low. The tests exercise well-defined gradient computations with hardcoded expected values or simple analytical expressions. The logic is straightforward and the expected values are easily verified by hand (e.g., 0.95^10 for the geometric recurrence, 3.0 for three equal accumulations).
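The geometric-recurrence expectation is easy to reproduce with a plain-Python model of the loop (the 10-step, 0.95-decay shape follows the description above; the finite-difference check is illustrative):

```python
def decay(x, steps=10, c=0.95):
    v = x
    for _ in range(steps):
        v = c * v  # loop-carried linear recurrence
    return v

def central_diff(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2 * h)

# Chain rule through 10 multiplications by 0.95: dv/dx = 0.95**10.
assert abs(central_diff(decay, 2.0) - 0.95 ** 10) < 1e-9
```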
Other factors
All prior substantive issues identified in previous review rounds (missing UnaryOpType::tan and UnaryOpType::rsqrt from NonLinearOps::unary_collections, false-confidence test pattern) were addressed by the author in upstream split PRs (#500–#503), and all inline comments are resolved. The sole remaining bug report is a style nit: the four new decorators include ad_stack_experimental_enabled=True explicitly, but this kwarg is a no-op since test_utils.test() already sets it unconditionally when require=qd.extension.adstack is present. No functional impact; pre-existing tests omit the kwarg, so consistency is slightly off, but it does not affect correctness or test coverage.
@test_utils.test(require=qd.extension.adstack, ad_stack_experimental_enabled=True)
def test_adstack_basic_gradient():
would be good to add a docstring to each test saying what is special about the test, what kind of issue it might detect.
Opus description:
Summary
Adds four new adstack regression tests that exercise reverse-mode AD through dynamic …
Tests added
Strengths
Weaknesses / things to consider
Please address Opus comments:
In addition, can we add a 'negative test' in each unit test please:
Force-pushed a5f664f to d9d2d09
Force-pushed 4984a2b to 1cb9ad7
Force-pushed d9d2d09 to da7fb7a
Force-pushed 1cb9ad7 to 91db488
LGTM — test code is correct; the one remaining nit (docstring overclaims value-replay necessity for the linear kernel) is captured in the inline comment.
Extended reasoning...
Overview
This PR adds adstack regression tests to …: a shared helper …, positive/negative parametrized tests for the loop-carried decay pattern, and a parametrized … covering static vs dynamic loops and constant vs varying coefficients.
Security risks
None — test-only file, no production code changes.
Level of scrutiny
Low. The PR touches only one test file and introduces no new logic beyond pytest parametrization and field setup. All test assertions are simple closed-form comparisons.
Other factors
All prior review concerns have been resolved: the upstream split PRs (#500–#503) handle the …/… additions to …, … uses the redefinition pattern to prevent false promotion, redundant … kwargs were removed, docstrings for … and … were corrected, and the … import was switched to the public …. The remaining nit — that the internal-details comment overclaims value-replay necessity for a linear kernel — is documentation-only with no functional consequence and is already captured as an inline comment for the author.
Force-pushed da7fb7a to 1bb6c33
Force-pushed 91db488 to 32d3599
Covers the adstack pipeline end-to-end: basic loop-carried gradient, additive and multiplicative sums, and static-loop unrolling. The non-linear unary-op coverage lives upstream in `test_adstack_unary_loop_carried`; these tests exercise the complementary paths (primal-gradient/affine/fixed coeffs, static-loop short-circuit) that are not directly driven by `NonLinearOps::unary_collections`.
…ulti-element, factorize linear).
…static vs dynamic range.
…rror when adstack is disabled.
…compile-time rejection semantics.
…needed but the linear formula does not read v from the stack.
…urally required (not for v-replay), matching the test function's Internal details block.
Force-pushed 1bb6c33 to 2bfdf2c5
Force-pushed 32d3599 to c6cca68
LGTM — all prior review feedback addressed.
Extended reasoning...
Overview
This PR adds two parametrized adstack regression tests to tests/python/test_adstack.py: test_adstack_basic_gradient (plus its negative counterpart) and test_adstack_sum_linear. No production code is modified.
Security risks
None. Test-only change with no effect on runtime behaviour or public API surface.
Level of scrutiny
Low. The change is confined to a single test file. The logic is straightforward Python/pytest — parametrized kernel definitions, forward+backward calls, and assertion checks against closed-form expected values.
Other factors
All inline comments from prior review rounds are resolved. Specific issues addressed: redundant ad_stack_experimental_enabled=True kwargs removed, docstrings corrected to accurately reflect what each test pins (compile-time rejection vs. gradient value), _run_basic_gradient helper comment aligned with the test docstring, and n=4 multi-element coverage added. The bug hunting system found no new issues.
…fix_ad_correctness
Opus description update:
Summary
Adds three regression tests pinning baseline reverse-mode AD behavior through dynamic for-loops with the adstack extension. Test-only PR — no production code changes.
What's in the PR
Net diff vs
Please address:
LGTM — all prior review threads resolved, docstrings accurate, tests well-structured.
Extended reasoning...
Overview
This PR adds regression tests for the adstack feature in tests/python/test_adstack.py: _run_basic_gradient helper, test_adstack_basic_gradient (positive), test_adstack_basic_gradient_negative, and test_adstack_sum_linear (parametrized over static/dynamic loops and constant/varying coefficients). No production code is touched.
Security risks
None — test-only additions with no auth, crypto, or permission-sensitive code.
Level of scrutiny
Low. The change is entirely test additions with clearly-scoped analytical expected values. The most complex part is the parametrized test_adstack_sum_linear which covers 12 combinations (2×2×3), but the logic and expected values are straightforward.
Other factors
All issues raised across multiple review rounds have been addressed: missing unary ops fixed in upstream PRs, redundant decorator kwargs cleaned up, inaccurate docstrings rewritten, internal import replaced with public API, and the contradictory helper/test comment reconciled. Bug hunting found no remaining issues. Every inline thread is marked resolved.
Summary
Generic adstack regression tests that complement the unary-op coverage baseline in #500. Every fix that was originally in this PR has moved to one of the upstream split PRs:
- `test_adstack_unary_loop_carried` (baseline, parametrized over existing `unary_collections` ops)
- `qd.tan`
- `qd.tanh`, `qd.exp`
- `qd.rsqrt`

This PR contains only the remaining adstack tests (`test_adstack_basic_gradient`, `test_adstack_sum_*`).
Base: #496.