[AutoDiff] Autodiff 6: Adstack regression tests#491
duburcqa wants to merge 9 commits into duburcqa/split_adjoint_alloca_placement from
Conversation
Force-pushed c7d23e5 to 755a995
💡 Codex Review
https://github.com/Genesis-Embodied-AI/quadrants/blob/c7d23e5698683e4b69b4cdfefa9ce85d6c94bb42/transforms/auto_diff.cpp#L1261-L1266
Add tan to nonlinear-op stack detection
Now that UnaryOpType::tan is differentiated, operands feeding tan in loop-carried locals must be treated as nonlinear for ad-stack backup, but NonLinearOps::unary_collections (used by AdStackAllocaJudger::visit(UnaryOpStmt)) still omits tan. In reverse mode with adstack, this means some allocas used by tan are left as plain locals, so overwritten primal values are not preserved across iterations and gradients can be computed from the wrong (often last-iteration) value.
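The stale-value failure mode this finding describes can be sketched in plain Python, independent of the quadrants IR (the operand pattern `x + 0.1*j`, the 3-iteration loop, and the sample point are illustrative assumptions, not taken from the PR):

```python
import math

def forward(x, n=3):
    # Per-iteration operands a_j = x + 0.1*j feed tan; an ad-stack must
    # preserve every a_j so the reverse sweep can read them back.
    return sum(math.tan(x + 0.1 * j) for j in range(n))

def grad_with_stack(x, n=3):
    # Correct reverse sweep: each iteration differentiates through its
    # own operand, d/da tan(a) = 1 + tan(a)^2.
    return sum(1.0 + math.tan(x + 0.1 * j) ** 2 for j in range(n))

def grad_plain_alloca(x, n=3):
    # Buggy model: a plain alloca holds only the last-iteration operand,
    # so every backward step differentiates at a_{n-1}.
    a_last = x + 0.1 * (n - 1)
    return n * (1.0 + math.tan(a_last) ** 2)

x = 0.4
fd = (forward(x + 1e-6) - forward(x - 1e-6)) / 2e-6  # finite-difference oracle
assert abs(grad_with_stack(x) - fd) < 1e-4           # adstack gradient matches
assert abs(grad_plain_alloca(x) - fd) > 1e-3         # stale gradient is wrong
```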
Addressed in 5498454. The Codex review correctly identified that UnaryOpType::tan was missing from NonLinearOps::unary_collections, but fixing only the set membership does not actually fix the bug it describes: a test with …
@claude review
Force-pushed 84a4438 to e281a6e
@claude review
LGTM — all previously flagged issues (tan/rsqrt missing from unary_collections, tan/tanh/exp forward-value reuse, alloca placement for GlobalLoadStmt outside the processed range-for, and the false-confidence test pattern) have been addressed in this revision.
Extended reasoning...
Overview
The PR fixes four orthogonal reverse-mode AD correctness bugs in quadrants/transforms/auto_diff.cpp: adjoint-alloca placement for GlobalLoadStmt outside the currently-processed RangeFor, a nested-if alloca placement bug, implementing tan derivatives, and fixing loop-carried primal recomputation for tan/tanh/exp. Two new test files (test_adstack.py, additions to test_ad_basics.py) cover the fixed paths.
Previous review round-trips
Three rounds of inline comments from this reviewer:
- Round 1 — `UnaryOpType::tan` missing from `NonLinearOps::unary_collections`, and tan/tanh/exp reverse formulas reusing the forward value rather than recomputing on the adstack-backed operand. Both addressed: `tan` added to the set; tan/tanh/exp now call `tan(stmt->operand)`/`tanh(stmt->operand)`/`exp(stmt->operand)`.
- Round 2 — `UnaryOpType::rsqrt` also missing from `unary_collections`. Addressed: `rsqrt` is now present alongside `sqrt` in the set.
- Round 3 — Test comment claimed a purely-additive update was the sole promotion trigger, but a LocalLoad→LocalStore write-back cycle is independently sufficient. Addressed: `test_adstack_unary_loop_carried` now assigns `a` fresh from `x[i] + j*0.1` inside the inner loop body, so `LocalStore(a)` precedes any `LocalLoad(a)` in the traversal order and the write-back cycle check cannot fire.
Security risks
None. This is an internal IR compiler pass with no user-visible API surface, no I/O, and no authentication/permission logic.
Level of scrutiny
Medium-high: correctness-critical AD transform logic, but the diff is focused, the reasoning in code comments is sound, and all identified gaps have been closed. The torch.autograd reference comparison in test_adstack_unary_loop_carried provides an independent oracle. No design decisions remain open.
Other factors
Bug hunting system found no new issues. The existing AD regression suite is documented as remaining green. Changes are backend-agnostic.
Force-pushed e281a6e to 196977b
Force-pushed d0c43e2 to c060858
Force-pushed 196977b to c73cb3d
Force-pushed c060858 to 0f7ff3b
Force-pushed c73cb3d to 3d0ecaf
Additional findings (outside current diff — PR may have been updated during review):
- 🟣 `quadrants/transforms/auto_diff.cpp:1869-1876` — MakeDual::visit(UnaryOpStmt) has no handler for UnaryOpType::floor or UnaryOpType::ceil — the else-branch fires QD_NOT_IMPLEMENTED. This is a pre-existing gap: MakeAdjoint already has an explicit 'do nothing' branch for floor/ceil (piecewise-constant => zero gradient everywhere), but MakeDual does not. Any kernel using qd.floor() or qd.ceil() differentiated in forward mode (qd.ad.FwdMode) will crash at runtime with an unimplemented error. Since this PR directly modifies MakeDual::visit(UnaryOpStmt) to add the tan handler, this was the natural place to add the matching floor/ceil no-op.

Extended reasoning...
What the bug is and how it manifests
MakeDual::visit(UnaryOpStmt) handles: neg, abs, sin, cos, tan (newly added by this PR), tanh, asin, acos, exp, log, sqrt, rsqrt, cast_value, logic_not — but has no branch for UnaryOpType::floor or UnaryOpType::ceil. The else-branch at the bottom of that function calls QD_NOT_IMPLEMENTED. Any kernel compiled in forward-mode AD (qd.ad.FwdMode) that contains a qd.floor() or qd.ceil() call will therefore crash at runtime with an unimplemented error.
The specific code path that triggers it
In auto_diff.cpp, MakeDual::visit(UnaryOpStmt): the chain of else-if branches checks each op type in order. After logic_not (which does nothing), the final else clause fires QD_NOT_IMPLEMENTED with the op type name. UnaryOpType::floor and UnaryOpType::ceil are not listed anywhere in MakeDual::visit(UnaryOpStmt), so both fall through to this crash.
Why existing code does not prevent it
MakeAdjoint::visit(UnaryOpStmt) was already written correctly. Its very first branch is:
if (stmt->op_type == UnaryOpType::floor || stmt->op_type == UnaryOpType::ceil) { // do nothing }
This is correct because floor and ceil are piecewise-constant: their derivative is 0 everywhere (except integer points where it is undefined). No accumulation is needed. MakeDual simply never received the corresponding no-op branch; the asymmetry predates this PR.
What the impact would be
Any kernel containing qd.floor() or qd.ceil() that is differentiated via qd.ad.FwdMode will hit QD_NOT_IMPLEMENTED and crash at runtime. The crash is a hard failure, not a silent correctness issue, but it is surprising because floor/ceil are supported ops and reverse-mode handles them without issue. The bug is exposed by the test_ad_frac test in test_ad_basics.py which uses qd.floor and qd.ceil, though that test only exercises reverse mode.
How to fix it
Add the floor/ceil no-op as the first branch in MakeDual::visit(UnaryOpStmt), mirroring MakeAdjoint:
} else if (stmt->op_type == UnaryOpType::floor || stmt->op_type == UnaryOpType::ceil) {
  // do nothing: floor/ceil are piecewise-constant, dual component is zero
}
Step-by-step proof
- User writes qd.floor(x[i]) inside a kernel, differentiates with qd.ad.FwdMode.
- The forward pass runs normally since UnaryOpType::floor computes fine.
- MakeDual::run(block) is invoked on the independent block.
- MakeDual::visit(UnaryOpStmt) is called for the floor statement.
- stmt->op_type == UnaryOpType::floor matches none of the existing branches (neg, abs, sin, cos, tan, tanh, asin, acos, exp, log, sqrt, rsqrt, cast_value, logic_not).
- The final else branch fires: QD_NOT_IMPLEMENTED and the program crashes.
- The same happens for ceil, which is likewise absent from MakeDual.
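As a quick plain-Python check of the piecewise-constant claim underlying the no-op branch, a central finite difference at non-integer points (sample points arbitrary) confirms the zero derivative:

```python
import math

def central_diff(f, x, h=1e-4):
    # Symmetric finite difference; exact for piecewise-constant f as long
    # as [x - h, x + h] straddles no integer breakpoint.
    return (f(x + h) - f(x - h)) / (2 * h)

# floor and ceil are flat between integer breakpoints, so the derivative
# (and hence the forward-mode dual component) is 0 everywhere it exists.
for x in (0.3, 1.7, -2.4):
    assert central_diff(math.floor, x) == 0.0
    assert central_diff(math.ceil, x) == 0.0
```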
- 🟡 `quadrants/transforms/auto_diff.cpp:1869-1873` — The comment at auto_diff.cpp line 1872 in MakeDual::visit(tan) says 'See the matching reverse-mode case for rationale,' but the reverse-mode rationale explains why it avoids reusing the forward stmt value (BackupSSA overwrites it each iteration), while MakeDual correctly does reuse stmt — the opposite design choice. A maintainer following the cross-reference may misread it as a reason to also recompute tan(stmt->operand) in forward mode, which would be redundant but reflects a misunderstanding of the design.

Extended reasoning...
What the bug is and how it manifests
MakeDual::visit(tan) at auto_diff.cpp line 1872 uses sqr(stmt) — it reuses the forward tan result inline with the JVP. The comment says 'See the matching reverse-mode case for rationale.' But the reverse-mode comment's entire rationale explains why MakeAdjoint avoids using stmt: 'the primal is per-iteration inside dynamic loops but BackupSSA only spills forward values to a single plain alloca, so reading the forward tan would use the last-iteration value.' A reader following the cross-reference lands on an explanation for avoiding stmt, while the actual forward-mode code uses stmt — the opposite design choice with opposite reasoning.
The specific code path that triggers it
MakeDual::visit(UnaryOpStmt) for tan (auto_diff.cpp ~line 1872): accumulate(stmt, mul(add(constant(1), sqr(stmt)), dual(stmt->operand))). sqr(stmt) reuses the forward tan value, which is correct for forward mode because MakeDual generates JVPs inline during the forward pass — stmt is always the current-iteration value with no BackupSSA concern. The comment then tells the reader to look at MakeAdjoint::tan for rationale. MakeAdjoint::tan says it recomputes tan(operand) specifically because the forward stmt is unreliable due to BackupSSA. These rationales are for opposite choices.
Why existing code does not prevent it
The cross-reference is the only documentation for why MakeDual::tan uses stmt. No test validates comment accuracy. The consistent tanh case in MakeDual (sqr(stmt), no comment) shows the correct self-contained pattern.
What the impact would be
A future maintainer reads MakeDual::tan, follows the comment to MakeAdjoint, reads that stmt is unsafe due to BackupSSA, and may change MakeDual to recompute tan(stmt->operand). The change would be functionally redundant but reflects a misunderstanding. In the worst case the maintainer might also introduce a gratuitous consistency requirement between forward and reverse that leads to incorrect future changes.
How to fix it
Replace the cross-reference with a self-contained explanation: 'In forward mode the JVP is computed inline with the forward pass so stmt always holds the current-iteration tan value; BackupSSA does not apply here, unlike the reverse-mode case where the operand must be recomputed.'
Step-by-step proof of the misleading cross-reference
- MakeDual::tan code: accumulate(stmt, mul(add(constant(1), sqr(stmt)), dual(stmt->operand))). Uses stmt directly — correct because MakeDual runs inline with the forward pass.
- Comment says: 'See the matching reverse-mode case for rationale.'
- Reader navigates to MakeAdjoint::tan. Its comment says: 'Recompute tan(operand) rather than reusing the forward value: the primal is per-iteration inside dynamic loops but BackupSSA only spills forward values to a single plain alloca, so reading the forward tan would use the last-iteration value in the reversed loop.'
- Reader infers: stmt is dangerous because BackupSSA overwrites it. MakeDual should also not reuse stmt.
- Conclusion 4 is wrong for forward mode — BackupSSA runs after MakeAdjoint on the reverse pass, not on the forward pass that MakeDual generates. The cross-reference is pointing to a rationale that does not apply.
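The `1 + tan^2` JVP factor both modes rely on can be sanity-checked numerically in plain Python (sample points arbitrary):

```python
import math

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

# d/dx tan(x) = sec^2(x) = 1 + tan(x)^2 — the factor multiplied into the
# dual of the operand.
for x in (0.1, 0.5, 1.0):
    jvp_factor = 1.0 + math.tan(x) ** 2
    assert abs(central_diff(math.tan, x) - jvp_factor) < 1e-5
```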
- 🟡 `tests/python/test_adstack.py:106-128` — The abs parametrization in test_adstack_unary_loop_carried (line 103) uses x_val=0.5, giving a=0.5, 0.6, 0.7 for j=0,1,2 — all positive — so the correct gradient (sgn(0.5)+sgn(0.6)+sgn(0.7)=3) and the broken gradient (3*sgn(0.7)=3) are identical; the test cannot detect if abs is accidentally dropped from NonLinearOps::unary_collections. This is a pre-existing test quality weakness unrelated to the core fixes in this PR. Using x_val=-0.15 (giving a=-0.15, -0.05, 0.05; correct gradient=-1, broken gradient=3) would make the assertion actually sensitive to the regression the comment claims to guard against.

Extended reasoning...
What the bug is and how it manifests
The test comment at test_adstack.py lines ~108-118 asserts: "If qd_op is dropped from [unary_collections], a_alloca stays a plain alloca ... producing a wrong gradient that torch.autograd catches." For the abs parametrization this claim is false. With x_val=0.5 and j=0,1,2, the kernel evaluates a = x[i] + j*0.1 giving a=0.5, 0.6, 0.7 — every value strictly positive. abs'(a) = sgn(a) = +1 for all three, so the summed gradient is unconditionally 3.
The specific code path that triggers it
The test kernel assigns a fresh each inner iteration (a = x[i] + j*0.1), so a_alloca is NOT loop-carried. The comment is correct that AdStackAllocaJudger::visit(LocalStoreStmt) write-back cycle check cannot fire independently, and the only promotion path is via AdStackAllocaJudger::visit(UnaryOpStmt) checking NonLinearOps::unary_collections membership. If abs were removed from that set, a_alloca would remain a plain AllocaStmt, BackupSSA would spill its last-written value (a=0.7) to a single plain alloca, and the reversed inner loop would read 0.7 for every backward step.
Why existing code does not prevent it
With the stale-value bug active: gradient = 3 * sgn(0.7) = 3 * 1 = 3. With the correct adstack promotion: gradient = sgn(0.5)+sgn(0.6)+sgn(0.7) = 1+1+1 = 3. Both evaluate to 3. The assert x.grad[0] == approx(x_t.grad.item()) comparison succeeds in both cases, so the regression goes undetected.
What the impact would be
This is a test-coverage weakness, not a production correctness bug. abs was already in unary_collections before this PR, and the production code remains correct. The risk is that a future accidental deletion of abs from unary_collections would pass this test silently, defeating the regression guard the comment advertises.
How to fix it
Replace x_val=0.5 with x_val=-0.15. This gives a=-0.15, -0.05, 0.05 for j=0,1,2. The correct gradient is sgn(-0.15)+sgn(-0.05)+sgn(0.05) = -1-1+1 = -1. The broken gradient (stale a=0.05) is 3*sgn(0.05) = 3*1 = 3. The assertion -1 != 3 now actually catches the regression.
Step-by-step proof
- x_val=0.5, inner loop j=0,1,2: a_0=0.5, a_1=0.6, a_2=0.7 — all positive.
- Hypothetically drop abs from unary_collections: AdStackAllocaJudger never sets is_stack_needed_=true for a_alloca. ReplaceLocalVarWithStacks leaves it as a plain AllocaStmt.
- BackupSSA: a_alloca is spilled to a single plain alloca. Forward pass overwrites it with a_0, then a_1, then a_2=0.7. After the full forward pass the plain alloca holds 0.7.
- Reversed inner loop reads 0.7 three times. Gradient = 3*sgn(0.7) = 3.
- torch.autograd also returns 3 (sgn(0.5)+sgn(0.6)+sgn(0.7)=3). Assertion: 3 == approx(3) — passes. Regression silently missed.
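The arithmetic behind the proposed fix can be reproduced standalone (sgn modeled with `math.copysign`; the stale-gradient model replays the last-iteration operand three times, per the proof above):

```python
import math

def grads(x_val):
    # Correct reverse-mode gradient: sum of sgn over the per-iteration
    # values a_j = x + 0.1*j for j = 0, 1, 2.
    correct = sum(math.copysign(1.0, x_val + 0.1 * j) for j in range(3))
    # Stale-value gradient if a_alloca is not adstack-promoted: every
    # backward step reads the last-iteration value a_2 = x + 0.2.
    stale = 3 * math.copysign(1.0, x_val + 0.2)
    return correct, stale

assert grads(0.5) == (3.0, 3.0)     # insensitive: both paths agree
assert grads(-0.15) == (-1.0, 3.0)  # sensitive: the regression is detectable
```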
Force-pushed 3d0ecaf to 7b52cc8
Force-pushed b9d107a to 0c487e2
Force-pushed 7b52cc8 to 2a30384
Force-pushed 0c487e2 to 7f9c702
Force-pushed 2a30384 to b390eb0
LGTM — the test design is correct: the inner loop computes a fresh from x[i] + j*0.1 each iteration with no LocalStore(a) writeback, so AdStackAllocaJudger can only promote a_alloca via unary_collections membership, and all previous concerns are resolved.
Extended reasoning...
Overview
This PR adds four basic adstack regression tests to tests/python/test_adstack.py (basic_gradient, sum_fixed_coeff, sum_constant_coeffs, sum_static_loop_correct) and reduces the loop count in test_intrinsics.py from 200000 to 50000 for CI timing. No production code is touched — all code changes landed in upstream PRs #496–499.
Review of prior concerns
Three rounds of prior review raised issues that have all been resolved in the current state:
- `UnaryOpType::tan` missing from `NonLinearOps::unary_collections` — fixed in PR #497 with the two-part fix (set membership + reverse-formula recompute).
- `UnaryOpType::rsqrt` missing from `unary_collections`, with the existing test not catching it due to the multiplicative update — fixed in PR #499; `test_adstack_unary_loop_carried` now uses `a = x[i] + j*0.1`, which avoids any `LocalStore(a)` writeback in the inner body.
- The intermediate test comment claiming `a = a + 0.01` was an isolated probe — that version was replaced; the current pattern correctly makes `unary_collections` membership the sole promotion path.
Security risks
None — this is a test-only change with no auth, crypto, or permission-sensitive code paths.
Level of scrutiny
Low. Test-only PR closing the final step of a five-way split stack. The logic being tested (AdStack promotion, gradient formulas) was validated in the upstream PRs. The new tests are mechanically straightforward.
Other factors
No bugs found by the automated bug-hunting system. Both prior inline comments are resolved. The test_intrinsics.py iteration count reduction (200k→50k) is a safe CI speedup with no correctness implications.
LGTM — test-only change, all prior substantive concerns resolved in upstream split PRs.
Extended reasoning...
Overview
This PR adds four new adstack regression tests to tests/python/test_adstack.py: test_adstack_basic_gradient, test_adstack_sum_fixed_coeff, test_adstack_sum_constant_coeffs, and test_adstack_sum_static_loop_correct. The only file touched is a test file; no production code is modified.
Security risks
None — test-only change with no security surface.
Level of scrutiny
Low. The tests exercise well-defined gradient computations with hardcoded expected values or simple analytical expressions. The logic is straightforward and the expected values are easily verified by hand (e.g., 0.95^10 for the geometric recurrence, 3.0 for three equal accumulations).
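The geometric-recurrence expectation is easy to reproduce with a plain-Python model of the loop (the 10-step, 0.95-decay shape follows the description above; the finite-difference check is illustrative):

```python
def decay(x, steps=10, c=0.95):
    v = x
    for _ in range(steps):
        v = c * v  # loop-carried linear recurrence
    return v

def central_diff(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2 * h)

# Chain rule through 10 multiplications by 0.95: dv/dx = 0.95**10.
assert abs(central_diff(decay, 2.0) - 0.95 ** 10) < 1e-9
```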
Other factors
All prior substantive issues identified in previous review rounds (missing UnaryOpType::tan and UnaryOpType::rsqrt from NonLinearOps::unary_collections, false-confidence test pattern) were addressed by the author in upstream split PRs (#500–#503), and all inline comments are resolved. The sole remaining bug report is a style nit: the four new decorators include ad_stack_experimental_enabled=True explicitly, but this kwarg is a no-op since test_utils.test() already sets it unconditionally when require=qd.extension.adstack is present. No functional impact; pre-existing tests omit the kwarg, so consistency is slightly off, but it does not affect correctness or test coverage.
@test_utils.test(require=qd.extension.adstack, ad_stack_experimental_enabled=True)
def test_adstack_basic_gradient():
would be good to add a docstring to each test saying what is special about the test, what kind of issue it might detect.
Opus description:
Summary
Adds four new adstack regression tests that exercise reverse-mode AD through dynamic …
Tests added
Strengths
Weaknesses / things to consider
Please address Opus comments:
In addition, can we add a 'negative test' in each unit test please:
Force-pushed a5f664f to d9d2d09
Force-pushed 4984a2b to 1cb9ad7
Force-pushed d9d2d09 to da7fb7a
Force-pushed 1cb9ad7 to 91db488
LGTM — test code is correct; the one remaining nit (docstring overclaims value-replay necessity for the linear kernel) is captured in the inline comment.
Extended reasoning...
Overview
This PR adds adstack regression tests to …: a shared helper …, positive/negative parametrized tests for the loop-carried decay pattern, and a parametrized … covering static vs dynamic loops and constant vs varying coefficients.
Security risks
None — test-only file, no production code changes.
Level of scrutiny
Low. The PR touches only one test file and introduces no new logic beyond pytest parametrization and field setup. All test assertions are simple closed-form comparisons.
Other factors
All prior review concerns have been resolved: the upstream split PRs (#500–#503) handle the …/… additions to …, … uses the redefinition pattern to prevent false promotion, redundant … kwargs were removed, docstrings for … and … were corrected, and the … import was switched to the public …. The remaining nit — that the internal-details comment overclaims value-replay necessity for a linear kernel — is documentation-only with no functional consequence and is already captured as an inline comment for the author.
Force-pushed da7fb7a to 1bb6c33
Force-pushed 91db488 to 32d3599
Covers the adstack pipeline end-to-end: basic loop-carried gradient, additive and multiplicative sums, and static-loop unrolling. The non-linear unary-op coverage lives upstream in `test_adstack_unary_loop_carried`; these tests exercise the complementary paths (primal-gradient/affine/fixed coeffs, static-loop short-circuit) that are not directly driven by `NonLinearOps::unary_collections`.
…ulti-element, factorize linear).
…static vs dynamic range.
…rror when adstack is disabled.
…compile-time rejection semantics.
…needed but the linear formula does not read v from the stack.
…urally required (not for v-replay), matching the test function's Internal details block.
Force-pushed 1bb6c33 to 2bfdf2c5
Force-pushed 32d3599 to c6cca68
LGTM — all prior review feedback addressed.
Extended reasoning...
Overview
This PR adds two parametrized adstack regression tests to tests/python/test_adstack.py: test_adstack_basic_gradient (plus its negative counterpart) and test_adstack_sum_linear. No production code is modified.
Security risks
None. Test-only change with no effect on runtime behaviour or public API surface.
Level of scrutiny
Low. The change is confined to a single test file. The logic is straightforward Python/pytest — parametrized kernel definitions, forward+backward calls, and assertion checks against closed-form expected values.
Other factors
All inline comments from prior review rounds are resolved. Specific issues addressed: redundant ad_stack_experimental_enabled=True kwargs removed, docstrings corrected to accurately reflect what each test pins (compile-time rejection vs. gradient value), _run_basic_gradient helper comment aligned with the test docstring, and n=4 multi-element coverage added. The bug hunting system found no new issues.
…fix_ad_correctness
Opus description update:
Summary
Adds three regression tests pinning baseline reverse-mode AD behavior through dynamic for-loops with the adstack extension. Test-only PR — no production code changes.
What's in the PR
Net diff vs
Please address:
LGTM — all prior review threads resolved, docstrings accurate, tests well-structured.
Extended reasoning...
Overview
This PR adds regression tests for the adstack feature in tests/python/test_adstack.py: _run_basic_gradient helper, test_adstack_basic_gradient (positive), test_adstack_basic_gradient_negative, and test_adstack_sum_linear (parametrized over static/dynamic loops and constant/varying coefficients). No production code is touched.
Security risks
None — test-only additions with no auth, crypto, or permission-sensitive code.
Level of scrutiny
Low. The change is entirely test additions with clearly-scoped analytical expected values. The most complex part is the parametrized test_adstack_sum_linear which covers 12 combinations (2×2×3), but the logic and expected values are straightforward.
Other factors
All issues raised across multiple review rounds have been addressed: missing unary ops fixed in upstream PRs, redundant decorator kwargs cleaned up, inaccurate docstrings rewritten, internal import replaced with public API, and the contradictory helper/test comment reconciled. Bug hunting found no remaining issues. Every inline thread is marked resolved.
Summary
Generic adstack regression tests that complement the unary-op coverage baseline in #500. Every fix that was originally in this PR has moved to one of the upstream split PRs:
- `test_adstack_unary_loop_carried` (baseline, parametrized over existing `unary_collections` ops)
- `qd.tan`
- `qd.tanh`, `qd.exp`
- `qd.rsqrt`

This PR contains only the remaining adstack tests (`test_adstack_basic_gradient`, `test_adstack_sum_*`).
Base: #496.