[AutoDiff] Autodiff 4: Mark rsqrt as non-linear for adstack promotion #503
LGTM — simple one-op addition to the nonlinear set with a matching test, following the exact pattern of existing ops.
Overview
The PR touches two files: adds UnaryOpType::rsqrt to the NonLinearOps::unary_collections static set in auto_diff.cpp (a single element added to a brace-init list), and extends the @pytest.mark.parametrize list in test_adstack.py with (qd.rsqrt, "rsqrt").
Security risks
None. This is a compile-time set membership change in the autodiff IR pass. No user input handling, no auth, no crypto.
Level of scrutiny
Low. The change is mechanical: rsqrt was already handled in MakeAdjoint::visit(UnaryOpStmt*) with the correct reverse formula (-0.5 * rsqrt(x)^3), and MakeDual::visit(UnaryOpStmt*) likewise. The only missing piece was its absence from unary_collections, which controls whether AdStackAllocaJudger promotes the operand alloca to an AdStack. The fix is exactly the same as how all other nonlinear unary ops (sin, cos, sqrt, tanh, exp, etc.) are registered.
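As a quick sanity check on the reverse formula quoted above, the following sketch compares `-0.5 * rsqrt(x)^3` against a central finite difference in plain Python. It does not use the project's own API; `rsqrt`, `rsqrt_grad`, and `central_difference` are hypothetical helper names introduced only for this illustration.

```python
import math


def rsqrt(x: float) -> float:
    """Reciprocal square root, x ** -0.5."""
    return 1.0 / math.sqrt(x)


def rsqrt_grad(x: float) -> float:
    """Analytic derivative: d/dx x**-0.5 = -0.5 * x**-1.5 = -0.5 * rsqrt(x)**3."""
    return -0.5 * rsqrt(x) ** 3


def central_difference(f, x: float, h: float = 1e-6) -> float:
    """Second-order numerical derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (0.25, 1.0, 4.0):
    analytic = rsqrt_grad(x)
    numeric = central_difference(rsqrt, x)
    # The two values should agree to well within the finite-difference error.
    assert math.isclose(analytic, numeric, rel_tol=1e-6)
    print(f"x={x}: analytic={analytic:.6f} numeric={numeric:.6f}")
```

The point relevant to this PR is that the gradient depends on the operand `x` at each iteration, which is why the operand's value has to be recoverable during the reverse pass.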
Other factors
The flagged bug in the inline comment is explicitly pre-existing and affects all ops in unary_collections equally — it is orthogonal to this PR. The new test covers the common single-load case that the fix targets. No CODEOWNER concerns; the change is self-contained.
@claude review
LGTM — surgical one-entry addition to NonLinearOps::unary_collections with a matching test case, consistent with all other ops in the set.
Overview
The PR touches two files: adds UnaryOpType::rsqrt to the NonLinearOps::unary_collections set in auto_diff.cpp (one line) and adds ("rsqrt", 0.05, 0.0) to the test_adstack_unary_loop_carried parametrize list (one line). No logic is changed; no new code paths are introduced.
Security risks
None. This is a purely internal compiler IR classification change with no security surface.
Level of scrutiny
Low. The change is mechanical: adding one enum value to a std::set initializer that already contains ten sibling ops (abs, sin, cos, tan, tanh, asin, acos, exp, log, sqrt). The set drives AdStackAllocaJudger's decision to promote a loop-variant alloca to AdStack backing; omitting rsqrt from it was a straightforward omission bug that produces wrong gradients in dynamic loops.
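The gating role of the set can be pictured with a small toy model. This is only a sketch in Python, not the real C++ pass: `NONLINEAR_UNARY_OPS` and `needs_adstack` are hypothetical names, and the actual `AdStackAllocaJudger` applies more conditions than the single membership test shown here.

```python
# Toy model only: the op names mirror the ten entries listed in the review,
# but the function and set names are illustrative assumptions.
NONLINEAR_UNARY_OPS = frozenset(
    {"abs", "sin", "cos", "tan", "tanh", "asin", "acos", "exp", "log", "sqrt"}
)


def needs_adstack(unary_op, operand_is_loop_variant, nonlinear_ops=NONLINEAR_UNARY_OPS):
    """Promote the operand alloca only if it changes across iterations and
    the consuming op's gradient depends on the operand value."""
    return operand_is_loop_variant and unary_op in nonlinear_ops


# Before the fix: rsqrt is absent, so its loop-variant operand stays a plain alloca.
before = needs_adstack("rsqrt", True)

# After the fix: the set gains one entry and the same operand is promoted.
after = needs_adstack("rsqrt", True, NONLINEAR_UNARY_OPS | {"rsqrt"})

print(f"promoted before fix: {before}, after fix: {after}")
```

In this model the whole fix is one extra set entry, which matches the review's description of the change as mechanical.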
Other factors
A pre-existing limitation in AdStackAllocaJudger (multi-load alloca pattern) was flagged in a prior review and has been acknowledged and resolved by the author as out-of-scope — it affects every op in the set equally and predates this PR. The new test exercises the common single-load case where the fix is effective, mirroring the coverage strategy used for sqrt, log, and the other domain-restricted ops. No bugs were found by the automated system.
LGTM — adding rsqrt to NonLinearOps::unary_collections is a one-entry surgical fix consistent with every other nonlinear op in the set, and the pre-existing AdStackAllocaJudger single-load limitation is acknowledged as out of scope.
Overview
The PR touches two files: a one-line addition of UnaryOpType::rsqrt to the NonLinearOps::unary_collections static set in auto_diff.cpp, and a single new parametrize entry in test_adstack.py. No logic is changed; the set membership is the only delta.
Security risks
None. This is an internal compiler IR transformation with no user-facing security surface.
Level of scrutiny
Low. The change mirrors the existing pattern for every other nonlinear unary op (abs, sin, cos, tan, tanh, asin, acos, exp, log, sqrt). The reverse formula for rsqrt (-0.5 * rsqrt(x)^3) recomputes on the adstack-backed operand in MakeAdjoint::visit(UnaryOpStmt*) (already present in the file), so correctness in dynamic loops requires exactly this set membership.
Other factors
The pre-existing AdStackAllocaJudger single-load-tracking limitation was raised in a prior review and the author explicitly acknowledged it as out of scope — it equally affects all ten other ops already in the set and is not introduced by this PR. That inline comment is resolved. No bugs were found by the automated system. The test exercises the multi-iteration path (n_iter = 1, 3, 10) that would catch a stale-value regression.
LGTM — surgical and correct.
Overview
The PR adds UnaryOpType::rsqrt to NonLinearOps::unary_collections (a one-element addition to an existing std::set) and extends the parametrized test_adstack_unary_loop_carried test with qd.rsqrt. No other code is touched.
Security risks
None. This is a compiler IR transform with no auth, crypto, or permission surface.
Level of scrutiny
Low. The change is mechanically identical to the ten existing entries in the set. The reverse formula for rsqrt (-0.5 * rsqrt(x)^3) recomputes solely on the adstack-backed operand, so promoting the operand alloca is both necessary and sufficient for correctness inside dynamic loops. The test covers the single-load path at multiple iteration counts and operand values.
Other factors
The pre-existing multi-load alloca tracking limitation in AdStackAllocaJudger that I flagged in a prior review affects all ten existing ops equally; the author acknowledged it as out of scope and resolved the inline comments. No new bugs were found by the bug hunting system. The PR is correctly scoped.
`AdStackAllocaJudger` walks `NonLinearOps::unary_collections` to decide whether the operand alloca of a unary op needs AdStack backing. rsqrt's reverse formula is `-0.5 * rsqrt(x)^3` and already recomputes on the operand, so correctness inside a dynamic loop hinges entirely on the operand's alloca being promoted to AdStack. Without membership in the set, a loop-variant alloca feeding only rsqrt stayed plain and the reversed loop read the last-forward value, producing wrong gradients. Covered by extending test_adstack_unary_loop_carried with qd.rsqrt.
LGTM — surgical one-liner that correctly adds rsqrt to NonLinearOps::unary_collections, with a matching test.
Overview
The PR touches two files: auto_diff.cpp (one entry added to the unary_collections initializer list) and test_adstack.py (one entry added to the parametrize list). No logic is changed; only membership in a compile-time set is extended.
Correctness
rsqrt's reverse formula (-0.5 * rsqrt(x)^3) recomputes entirely from the operand, not from the forward rsqrt value — identical in structure to sqrt and log, both of which are already in the set. Without membership in unary_collections, AdStackAllocaJudger would not flag the operand alloca for AdStack promotion, leaving it as a plain alloca overwritten each forward iteration; the reversed loop would then read the last-forward value for every backward step, producing silently wrong gradients. The fix is correct.
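The failure mode described above can be reproduced in a few lines of plain Python. This is a simulation under stated assumptions, not the project's IR: the loop body (`x = x + step; y += rsqrt(x)`), the starting value, and all function names are hypothetical, and a Python list stands in for the AdStack.

```python
import math


def rsqrt_grad(x):
    """Reverse formula from the review: -0.5 * rsqrt(x)**3."""
    return -0.5 * (1.0 / math.sqrt(x)) ** 3


def forward(x0, step, n_iter):
    """Assumed loop: x is overwritten each iteration and then fed to rsqrt."""
    x, y, history = x0, 0.0, []
    for _ in range(n_iter):
        x = x + step               # loop-variant value held in one slot
        history.append(x)          # what an AdStack would push
        y += 1.0 / math.sqrt(x)
    return y, x, history


def backward_with_stack(history):
    """Reverse pass that pops the operand seen by each forward iteration."""
    stack, grad = list(history), 0.0
    while stack:
        grad += rsqrt_grad(stack.pop())
    return grad


def backward_plain_alloca(last_x, n_iter):
    """Reverse pass that can only see the last forward value of x."""
    return sum(rsqrt_grad(last_x) for _ in range(n_iter))


def fd_grad(x0, step, n_iter, h=1e-6):
    """Finite-difference reference for dy/dx0."""
    return (forward(x0 + h, step, n_iter)[0] - forward(x0 - h, step, n_iter)[0]) / (2.0 * h)


for n_iter in (1, 3, 10):
    _, last_x, history = forward(1.0, 0.5, n_iter)
    print(
        f"n_iter={n_iter}: reference={fd_grad(1.0, 0.5, n_iter):.5f} "
        f"stack={backward_with_stack(history):.5f} "
        f"plain={backward_plain_alloca(last_x, n_iter):.5f}"
    )
```

With a single iteration the two reverse passes agree, because the last forward value is the only value. The discrepancy appears only once the loop runs more than once, which is why a multi-iteration sweep is needed to catch a missed promotion.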
Security risks
None. This is a pure autodiff IR transformation with no auth, crypto, or I/O surface.
Level of scrutiny
Low. The change is a one-line set membership extension that exactly mirrors the pattern used for the other ten ops in the set. The prior concern about the multi-load alloca pattern in AdStackAllocaJudger was raised in my previous review; the author acknowledged it as a pre-existing limitation affecting all ops equally, resolved the thread, and kept the PR surgical. Nothing substantive has changed since that exchange, and the concern is orthogonal to this PR's correctness.
Other factors
No bugs were reported by the automated hunter. The test parametrize entry uses the same (step=0.05, offset=0.0) pair as sqrt and log, appropriate for a positive-domain op. The n_iter ∈ {1,3,10} sweep exercises the multi-iteration path that would expose any missed AdStack promotion.
…qa/split_autodiff_mark_rsqrt_nonlinear
Opus description update:
What's in the PR
Net diff vs
Please address:
LGTM — surgical, correct fix with appropriate test coverage.
Overview
Two files changed: one entry added to the NonLinearOps::unary_collections set in auto_diff.cpp, and one parametrize entry added to test_adstack_unary_loop_carried in test_adstack.py. The rsqrt reverse formula (-0.5 * rsqrt(x)^3) already recomputes from the operand, so correctness in dynamic loops depends entirely on the operand's alloca being promoted to AdStack — which requires rsqrt to be in unary_collections.
Security risks
None. This is a pure compiler IR transformation change with no security surface.
Level of scrutiny
Low. The change exactly mirrors how the ten existing entries (sin, cos, sqrt, etc.) are registered. The test addition follows the established parametrize pattern and uses a (step, offset) pair that keeps the operand in the valid domain. The pre-existing AdStackAllocaJudger single-load limitation I flagged in the prior review was acknowledged as out-of-scope and equally affects all existing ops — it is not a regression from this PR.
Other factors
The inline comment was resolved by the author with a clear rationale. No outstanding concerns remain.
Summary
Adds `UnaryOpType::rsqrt` to `NonLinearOps::unary_collections`. `AdStackAllocaJudger` walks that set to decide whether a unary op's operand alloca needs AdStack backing; without membership, loop-variant allocas feeding only `rsqrt` stayed plain and the reversed loop read the last-forward value, producing silently wrong gradients. `rsqrt`'s reverse formula (`-0.5 * rsqrt(x)^3`) already recomputes on the operand, so correctness inside a dynamic loop hinges entirely on the operand's alloca being promoted to AdStack.

Covered by extending the `test_adstack_unary_loop_carried` parametrize list with `qd.rsqrt`.

Base: #502.