[AutoDiff] Autodiff 4: Mark rsqrt as non-linear for adstack promotion by duburcqa · Pull Request #503 · Genesis-Embodied-AI/quadrants

duburcqa · 2026-04-17T12:15:27Z

Summary

Adds UnaryOpType::rsqrt to NonLinearOps::unary_collections. AdStackAllocaJudger walks that set to decide whether a unary op's operand alloca needs AdStack backing; without membership, loop-variant allocas feeding only rsqrt stayed plain and the reversed loop read the last-forward value, producing silently wrong gradients.

rsqrt's reverse formula (-0.5 * rsqrt(x)^3) already recomputes on the operand, so correctness inside a dynamic loop hinges entirely on the operand's alloca being promoted to AdStack.

Covered by extending test_adstack_unary_loop_carried parametrize with qd.rsqrt.

Base: #502.

claude

LGTM — simple one-op addition to the nonlinear set with a matching test, following the exact pattern of existing ops.

Extended reasoning...

Overview

The PR touches two files: adds UnaryOpType::rsqrt to the NonLinearOps::unary_collections static set in auto_diff.cpp (a single element added to a brace-init list), and extends the @pytest.mark.parametrize list in test_adstack.py with (qd.rsqrt, "rsqrt").

Security risks

None. This is a compile-time set membership change in the autodiff IR pass. No user input handling, no auth, no crypto.

Level of scrutiny

Low. The change is mechanical: rsqrt was already handled in MakeAdjoint::visit(UnaryOpStmt*) with the correct reverse formula (-0.5 * rsqrt(x)^3), and MakeDual::visit(UnaryOpStmt*) likewise. The only missing piece was its absence from unary_collections, which controls whether AdStackAllocaJudger promotes the operand alloca to an AdStack. The fix is exactly the same as how all other nonlinear unary ops (sin, cos, sqrt, tanh, exp, etc.) are registered.

Other factors

The flagged bug in the inline comment is explicitly pre-existing and affects all ops in unary_collections equally — it is orthogonal to this PR. The new test covers the common single-load case that the fix targets. No CODEOWNER concerns; the change is self-contained.

duburcqa · 2026-04-17T16:52:11Z

@claude review

claude

LGTM — surgical one-entry addition to NonLinearOps::unary_collections with a matching test case, consistent with all other ops in the set.

Extended reasoning...

Overview

The PR touches two files: adds UnaryOpType::rsqrt to the NonLinearOps::unary_collections set in auto_diff.cpp (one line) and adds ("rsqrt", 0.05, 0.0) to the test_adstack_unary_loop_carried parametrize list (one line). No logic is changed; no new code paths are introduced.

Security risks

None. This is a purely internal compiler IR classification change with no security surface.

Level of scrutiny

Low. The change is mechanical: adding one enum value to a std::set initializer that already contains ten sibling ops (abs, sin, cos, tan, tanh, asin, acos, exp, log, sqrt). The set drives AdStackAllocaJudger's decision to promote a loop-variant alloca to AdStack backing; omitting rsqrt from it was a straightforward omission bug that produces wrong gradients in dynamic loops.

Other factors

A pre-existing limitation in AdStackAllocaJudger (multi-load alloca pattern) was flagged in a prior review and has been acknowledged and resolved by the author as out-of-scope — it affects every op in the set equally and predates this PR. The new test exercises the common single-load case where the fix is effective, mirroring the coverage strategy used for sqrt, log, and the other domain-restricted ops. No bugs were found by the automated system.

claude

LGTM — adding rsqrt to NonLinearOps::unary_collections is a one-entry surgical fix consistent with every other nonlinear op in the set, and the pre-existing AdStackAllocaJudger single-load limitation is acknowledged as out of scope.

Extended reasoning...

Overview

The PR touches two files: a one-line addition of UnaryOpType::rsqrt to the NonLinearOps::unary_collections static set in auto_diff.cpp, and a single new parametrize entry in test_adstack.py. No logic is changed; the set membership is the only delta.

Security risks

None. This is an internal compiler IR transformation with no user-facing security surface.

Level of scrutiny

Low. The change mirrors the existing pattern for every other nonlinear unary op (abs, sin, cos, tan, tanh, asin, acos, exp, log, sqrt). The reverse formula for rsqrt (-0.5 * rsqrt(x)^3) recomputes on the adstack-backed operand in MakeAdjoint::visit(UnaryOpStmt*) (already present in the file), so correctness in dynamic loops requires exactly this set membership.

Other factors

The pre-existing AdStackAllocaJudger single-load-tracking limitation was raised in a prior review and the author explicitly acknowledged it as out of scope — it equally affects all ten other ops already in the set and is not introduced by this PR. That inline comment is resolved. No bugs were found by the automated system. The test exercises the multi-iteration path (n_iter = 1, 3, 10) that would catch a stale-value regression.

claude

LGTM — surgical and correct.

Extended reasoning...

Overview

The PR adds UnaryOpType::rsqrt to NonLinearOps::unary_collections (a one-element addition to an existing std::set) and extends the parametrized test_adstack_unary_loop_carried test with qd.rsqrt. No other code is touched.

Security risks

None. This is a compiler IR transform with no auth, crypto, or permission surface.

Level of scrutiny

Low. The change is mechanically identical to the ten existing entries in the set. The reverse formula for rsqrt (-0.5 * rsqrt(x)^3) recomputes solely on the adstack-backed operand, so promoting the operand alloca is both necessary and sufficient for correctness inside dynamic loops. The test covers the single-load path at multiple iteration counts and operand values.

Other factors

The pre-existing multi-load alloca tracking limitation in AdStackAllocaJudger that I flagged in a prior review affects all ten existing ops equally; the author acknowledged it as out of scope and resolved the inline comments. No new bugs were found by the bug hunting system. The PR is correctly scoped.

`AdStackAllocaJudger` walks `NonLinearOps::unary_collections` to decide whether the operand alloca of a unary op needs AdStack backing. rsqrt's reverse formula is `-0.5 * rsqrt(x)^3` and already recomputes on the operand, so correctness inside a dynamic loop hinges entirely on the operand's alloca being promoted to AdStack. Without membership in the set, a loop-variant alloca feeding only rsqrt stayed plain and the reversed loop read the last-forward value, producing wrong gradients. Covered by extending test_adstack_unary_loop_carried with qd.rsqrt.

claude

LGTM — surgical one-liner that correctly adds rsqrt to NonLinearOps::unary_collections, with a matching test.

Extended reasoning...

Overview

The PR touches two files: (one entry added to the unary_collections initializer list) and (one entry added to the parametrize list). No logic is changed; only membership in a compile-time set is extended.

Correctness

rsqrt's reverse formula (-0.5 * rsqrt(x)^3) recomputes entirely from the operand, not from the forward rsqrt value — identical in structure to sqrt and log, both of which are already in the set. Without membership in unary_collections, AdStackAllocaJudger would not flag the operand alloca for AdStack promotion, leaving it as a plain alloca overwritten each forward iteration; the reversed loop would then read the last-forward value for every backward step, producing silently wrong gradients. The fix is correct.

Security risks

None. This is a pure autodiff IR transformation with no auth, crypto, or I/O surface.

Level of scrutiny

Low. The change is a one-line set membership extension that exactly mirrors the pattern used for the other ten ops in the set. The prior concern about the multi-load alloca pattern in AdStackAllocaJudger was raised in my previous review; the author acknowledged it as a pre-existing limitation affecting all ops equally, resolved the thread, and kept the PR surgical. Nothing substantive has changed since that exchange, and the concern is orthogonal to this PR's correctness.

Other factors

No bugs were reported by the automated hunter. The test parametrize entry uses the same (step=0.05, offset=0.0) pair as sqrt and log, appropriate for a positive-domain op. The n_iter ∈ {1,3,10} sweep exercises the multi-iteration path that would expose any missed AdStack promotion.

…qa/split_autodiff_mark_rsqrt_nonlinear

hughperkins · 2026-04-20T12:49:46Z

Opus description update:

What's in the PR

Net diff vs origin/duburcqa/split_autodiff_tanh_exp_recompute: 2 files, +3 / -2.

`quadrants/transforms/auto_diff.cpp`

Adds UnaryOpType::rsqrt to NonLinearOps::unary_collections (with a list-reflow for readability).
No MakeAdjoint/MakeDual change — the existing rsqrt reverse formula is already operand-based.

`tests/python/test_adstack.py`

Adds ("rsqrt", 0.05, 0.0) to the loop-carried unary parametrize, in the positive-domain group with sqrt/log/asin/acos. Same (step, offset) as sqrt since the
domain (x > 0) coincides.

Good points

Smallest, cleanest patch in the chain. One-line set-membership change plus one parametrize entry. Trivial to review.
Correctly identifies that rsqrt only had problem (a) and not (b). The previous PRs in the stack established two separate failure modes for nonlinear unary ops in
dynamic loops: missing from unary_collections (operand not adstack-backed) vs. reverse formula reusing stmt (BackupSSA stale-value). This PR correctly diagnoses that
rsqrt only suffered from (a) and fixes only that, without unnecessary formula edits.
Pinned by the regression test from the baseline PR. The new parametrize entry exercises the exact loop-carried path that would have been silently broken before. If a
future refactor regresses rsqrt's set membership or operand-based formula, the test catches it at n_iter ≥ 3.
No behavior change for existing call sites outside dynamic loops — the change only activates the adstack promotion path, which scalar/single-iteration code doesn't go
through.

Bad points / concerns

Reinforces the case for a structural fix. This is now the fourth PR in a row patching unary_collections membership op-by-op (after tan, tanh, exp). The repeated
occurrences argue for deriving the set from each unary op's nonlinearity at registration time, rather than maintaining a hand-curated list in C++ that's prone to drift from
the Python op registry. The PR doesn't (and shouldn't, scope-wise) attempt that — but the chain of PRs is the strongest argument for it yet.
No assertion that the audit is complete. After this PR, unary_collections covers abs, sin, cos, tan, tanh, asin, acos, exp, log, sqrt, rsqrt. The differentiable
unary ops in python/quadrants/lang/ops.py also include neg (linear, intentionally omitted) and a few non-differentiable ones (round, floor, ceil, frexp, bit_not,
popcnt, logical_not). It would be worth a meta-test asserting that every diff-able unary op in the Python registry is either in unary_collections or in an explicit
KNOWN_LINEAR allow-list — otherwise the next added unary op repeats this whole exercise.
Inherits all the limitations of the baseline test. Single dtype (f32 only), no f64 variant of the new entry, hard-coded n = 4 and stride 0.05.
Merge commit, not rebase. Branch was brought up to date via git merge rather than rebased onto split_autodiff_tanh_exp_recompute. History contains a merge commit;
depending on repo conventions a clean rebase + force-push may be preferred.
Fourth PR in a stack on the same surface area. Reviewers need to merge split_adstack_unary_baseline → split_autodiff_tan_derivative →
split_autodiff_tanh_exp_recompute → this PR in order, and any force-push to an upstream PR will require re-syncing this one. Not a code issue, just a process note.

hughperkins · 2026-04-20T12:50:34Z

Pleas address:

No assertion that the audit is complete. After this PR, unary_collections covers abs, sin, cos, tan, tanh, asin, acos, exp, log, sqrt, rsqrt. The differentiable
unary ops in python/quadrants/lang/ops.py also include neg (linear, intentionally omitted) and a few non-differentiable ones (round, floor, ceil, frexp, bit_not,
popcnt, logical_not). It would be worth a meta-test asserting that every diff-able unary op in the Python registry is either in unary_collections or in an explicit
KNOWN_LINEAR allow-list — otherwise the next added unary op repeats this whole exercise.
Inherits all the limitations of the baseline test. Single dtype (f32 only), no f64 variant of the new entry, hard-coded n = 4 and stride 0.05.

claude

LGTM — surgical, correct fix with appropriate test coverage.

Extended reasoning...

Overview

Two files changed: one entry added to the NonLinearOps::unary_collections set in auto_diff.cpp, and one parametrize entry added to test_adstack_unary_loop_carried in test_adstack.py. The rsqrt reverse formula (-0.5 * rsqrt(x)^3) already recomputes from the operand, so correctness in dynamic loops depends entirely on the operand's alloca being promoted to AdStack — which requires rsqrt to be in unary_collections.

Security risks

None. This is a pure compiler IR transformation change with no security surface.

Level of scrutiny

Low. The change exactly mirrors how the ten existing entries (sin, cos, sqrt, etc.) are registered. The test addition follows the established parametrize pattern and uses a (step, offset) pair that keeps the operand in the valid domain. The pre-existing AdStackAllocaJudger single-load limitation I flagged in the prior review was acknowledged as out-of-scope and equally affects all existing ops — it is not a regression from this PR.

Other factors

The inline comment was resolved by the author with a clear rationale. No outstanding concerns remain.

This was referenced Apr 17, 2026

[AutoDiff] Autodiff 1: Add baseline adstack regression test for unary_collections #500

Open

[AutoDiff] Autodiff 5: Fix adjoint-alloca placement for GlobalLoads outside the current range-for #496

Open

[AutoDiff] Autodiff 6: Adstack regression tests #491

Open

duburcqa force-pushed the duburcqa/split_autodiff_tanh_exp_recompute branch from 274e8da to 9187509 Compare April 17, 2026 12:29

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 0f7ff3b to 56c457e Compare April 17, 2026 12:29

duburcqa force-pushed the duburcqa/split_autodiff_tanh_exp_recompute branch from 9187509 to 8eefde1 Compare April 17, 2026 12:31

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 56c457e to 8fd18b0 Compare April 17, 2026 12:31

claude bot reviewed Apr 17, 2026

View reviewed changes

Comment thread quadrants/transforms/auto_diff.cpp

duburcqa force-pushed the duburcqa/split_autodiff_tanh_exp_recompute branch from 8eefde1 to 6176818 Compare April 17, 2026 12:42

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 8fd18b0 to d813066 Compare April 17, 2026 12:42

duburcqa mentioned this pull request Apr 17, 2026

[Bug]: loss.backward() hangs indefinitely for articulated robots with freejoint + child joints Genesis-Embodied-AI/Genesis#2537

Open

hughperkins changed the title ~~[AutoDiff] Mark rsqrt as non-linear for adstack promotion~~ [AutoDiff] Autodiff 4: Mark rsqrt as non-linear for adstack promotion Apr 17, 2026

duburcqa force-pushed the duburcqa/split_autodiff_tanh_exp_recompute branch from 6176818 to 03a53ec Compare April 17, 2026 15:57

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from d813066 to ce8e91b Compare April 17, 2026 15:57

This was referenced Apr 17, 2026

[AutoDiff] Autodiff 3: Recompute tanh/exp on the operand in the reverse pass #502

Open

[AutoDiff] Autodiff 2: Implement derivative for tan #501

Open

claude bot reviewed Apr 17, 2026

View reviewed changes

duburcqa force-pushed the duburcqa/split_autodiff_tanh_exp_recompute branch from 03a53ec to d55b19f Compare April 17, 2026 20:30

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from ce8e91b to 7d5ce3c Compare April 17, 2026 20:30

claude bot reviewed Apr 17, 2026

View reviewed changes

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 7d5ce3c to 94e5650 Compare April 17, 2026 21:36

claude bot reviewed Apr 17, 2026

View reviewed changes

duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 94e5650 to 421d30e Compare April 17, 2026 22:33

claude bot reviewed Apr 17, 2026

View reviewed changes

Merge branch 'duburcqa/split_autodiff_tanh_exp_recompute' into duburc…

c27ceb3

…qa/split_autodiff_mark_rsqrt_nonlinear

claude bot reviewed Apr 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AutoDiff] Autodiff 4: Mark rsqrt as non-linear for adstack promotion#503

[AutoDiff] Autodiff 4: Mark rsqrt as non-linear for adstack promotion#503
duburcqa wants to merge 2 commits intoduburcqa/split_autodiff_tanh_exp_recomputefrom
duburcqa/split_autodiff_mark_rsqrt_nonlinear

duburcqa commented Apr 17, 2026 •

edited

Loading

Uh oh!

claude bot left a comment

Uh oh!

Uh oh!

duburcqa commented Apr 17, 2026

Uh oh!

claude bot left a comment

Uh oh!

claude bot left a comment

Uh oh!

claude bot left a comment

Uh oh!

claude bot left a comment

Uh oh!

hughperkins commented Apr 20, 2026

Uh oh!

hughperkins commented Apr 20, 2026

Uh oh!

claude bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

duburcqa commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Uh oh!

duburcqa commented Apr 17, 2026

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Correctness

Security risks

Level of scrutiny

Other factors

Uh oh!

hughperkins commented Apr 20, 2026

What's in the PR

quadrants/transforms/auto_diff.cpp

tests/python/test_adstack.py

Good points

Bad points / concerns

Uh oh!

hughperkins commented Apr 20, 2026

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

duburcqa commented Apr 17, 2026 •

edited

Loading

`quadrants/transforms/auto_diff.cpp`

`tests/python/test_adstack.py`