Skip to content

[AutoDiff] Autodiff 4: Mark rsqrt as non-linear for adstack promotion#503

Open
duburcqa wants to merge 2 commits intoduburcqa/split_autodiff_tanh_exp_recomputefrom
duburcqa/split_autodiff_mark_rsqrt_nonlinear
Open

[AutoDiff] Autodiff 4: Mark rsqrt as non-linear for adstack promotion#503
duburcqa wants to merge 2 commits intoduburcqa/split_autodiff_tanh_exp_recomputefrom
duburcqa/split_autodiff_mark_rsqrt_nonlinear

Conversation

@duburcqa
Copy link
Copy Markdown
Contributor

@duburcqa duburcqa commented Apr 17, 2026

Summary

Adds UnaryOpType::rsqrt to NonLinearOps::unary_collections. AdStackAllocaJudger walks that set to decide whether a unary op's operand alloca needs AdStack backing; without membership, loop-variant allocas feeding only rsqrt stayed plain and the reversed loop read the last-forward value, producing silently wrong gradients.

rsqrt's reverse formula (-0.5 * rsqrt(x)^3) already recomputes on the operand, so correctness inside a dynamic loop hinges entirely on the operand's alloca being promoted to AdStack.

Covered by extending test_adstack_unary_loop_carried parametrize with qd.rsqrt.

Base: #502.

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — simple one-op addition to the nonlinear set with a matching test, following the exact pattern of existing ops.

Extended reasoning...

Overview

The PR touches two files: adds UnaryOpType::rsqrt to the NonLinearOps::unary_collections static set in auto_diff.cpp (a single element added to a brace-init list), and extends the @pytest.mark.parametrize list in test_adstack.py with (qd.rsqrt, "rsqrt").

Security risks

None. This is a compile-time set membership change in the autodiff IR pass. No user input handling, no auth, no crypto.

Level of scrutiny

Low. The change is mechanical: rsqrt was already handled in MakeAdjoint::visit(UnaryOpStmt*) with the correct reverse formula (-0.5 * rsqrt(x)^3), and MakeDual::visit(UnaryOpStmt*) likewise. The only missing piece was its absence from unary_collections, which controls whether AdStackAllocaJudger promotes the operand alloca to an AdStack. The fix is exactly the same as how all other nonlinear unary ops (sin, cos, sqrt, tanh, exp, etc.) are registered.

Other factors

The flagged bug in the inline comment is explicitly pre-existing and affects all ops in unary_collections equally — it is orthogonal to this PR. The new test covers the common single-load case that the fix targets. No CODEOWNER concerns; the change is self-contained.

Comment thread quadrants/transforms/auto_diff.cpp
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_tanh_exp_recompute branch from 8eefde1 to 6176818 Compare April 17, 2026 12:42
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 8fd18b0 to d813066 Compare April 17, 2026 12:42
@hughperkins hughperkins changed the title [AutoDiff] Mark rsqrt as non-linear for adstack promotion [AutoDiff] Autodiff 4: Mark rsqrt as non-linear for adstack promotion Apr 17, 2026
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_tanh_exp_recompute branch from 6176818 to 03a53ec Compare April 17, 2026 15:57
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from d813066 to ce8e91b Compare April 17, 2026 15:57
@duburcqa
Copy link
Copy Markdown
Contributor Author

@claude review

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — surgical one-entry addition to NonLinearOps::unary_collections with a matching test case, consistent with all other ops in the set.

Extended reasoning...

Overview

The PR touches two files: adds UnaryOpType::rsqrt to the NonLinearOps::unary_collections set in auto_diff.cpp (one line) and adds ("rsqrt", 0.05, 0.0) to the test_adstack_unary_loop_carried parametrize list (one line). No logic is changed; no new code paths are introduced.

Security risks

None. This is a purely internal compiler IR classification change with no security surface.

Level of scrutiny

Low. The change is mechanical: adding one enum value to a std::set initializer that already contains ten sibling ops (abs, sin, cos, tan, tanh, asin, acos, exp, log, sqrt). The set drives AdStackAllocaJudger's decision to promote a loop-variant alloca to AdStack backing; omitting rsqrt from it was a straightforward omission bug that produces wrong gradients in dynamic loops.

Other factors

A pre-existing limitation in AdStackAllocaJudger (multi-load alloca pattern) was flagged in a prior review and has been acknowledged and resolved by the author as out-of-scope — it affects every op in the set equally and predates this PR. The new test exercises the common single-load case where the fix is effective, mirroring the coverage strategy used for sqrt, log, and the other domain-restricted ops. No bugs were found by the automated system.

@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_tanh_exp_recompute branch from 03a53ec to d55b19f Compare April 17, 2026 20:30
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from ce8e91b to 7d5ce3c Compare April 17, 2026 20:30
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — adding rsqrt to NonLinearOps::unary_collections is a one-entry surgical fix consistent with every other nonlinear op in the set, and the pre-existing AdStackAllocaJudger single-load limitation is acknowledged as out of scope.

Extended reasoning...

Overview

The PR touches two files: a one-line addition of UnaryOpType::rsqrt to the NonLinearOps::unary_collections static set in auto_diff.cpp, and a single new parametrize entry in test_adstack.py. No logic is changed; the set membership is the only delta.

Security risks

None. This is an internal compiler IR transformation with no user-facing security surface.

Level of scrutiny

Low. The change mirrors the existing pattern for every other nonlinear unary op (abs, sin, cos, tan, tanh, asin, acos, exp, log, sqrt). The reverse formula for rsqrt (-0.5 * rsqrt(x)^3) recomputes on the adstack-backed operand in MakeAdjoint::visit(UnaryOpStmt*) (already present in the file), so correctness in dynamic loops requires exactly this set membership.

Other factors

The pre-existing AdStackAllocaJudger single-load-tracking limitation was raised in a prior review and the author explicitly acknowledged it as out of scope — it equally affects all ten other ops already in the set and is not introduced by this PR. That inline comment is resolved. No bugs were found by the automated system. The test exercises the multi-iteration path (n_iter = 1, 3, 10) that would catch a stale-value regression.

@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 7d5ce3c to 94e5650 Compare April 17, 2026 21:36
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — surgical and correct.

Extended reasoning...

Overview

The PR adds UnaryOpType::rsqrt to NonLinearOps::unary_collections (a one-element addition to an existing std::set) and extends the parametrized test_adstack_unary_loop_carried test with qd.rsqrt. No other code is touched.

Security risks

None. This is a compiler IR transform with no auth, crypto, or permission surface.

Level of scrutiny

Low. The change is mechanically identical to the ten existing entries in the set. The reverse formula for rsqrt (-0.5 * rsqrt(x)^3) recomputes solely on the adstack-backed operand, so promoting the operand alloca is both necessary and sufficient for correctness inside dynamic loops. The test covers the single-load path at multiple iteration counts and operand values.

Other factors

The pre-existing multi-load alloca tracking limitation in AdStackAllocaJudger that I flagged in a prior review affects all ten existing ops equally; the author acknowledged it as out of scope and resolved the inline comments. No new bugs were found by the bug hunting system. The PR is correctly scoped.

`AdStackAllocaJudger` walks `NonLinearOps::unary_collections` to decide whether the
operand alloca of a unary op needs AdStack backing. rsqrt's reverse formula is
`-0.5 * rsqrt(x)^3` and already recomputes on the operand, so correctness inside a
dynamic loop hinges entirely on the operand's alloca being promoted to AdStack.
Without membership in the set, a loop-variant alloca feeding only rsqrt stayed plain
and the reversed loop read the last-forward value, producing wrong gradients.

Covered by extending test_adstack_unary_loop_carried with qd.rsqrt.
@duburcqa duburcqa force-pushed the duburcqa/split_autodiff_mark_rsqrt_nonlinear branch from 94e5650 to 421d30e Compare April 17, 2026 22:33
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — surgical one-liner that correctly adds rsqrt to NonLinearOps::unary_collections, with a matching test.

Extended reasoning...

Overview

The PR touches two files: (one entry added to the unary_collections initializer list) and (one entry added to the parametrize list). No logic is changed; only membership in a compile-time set is extended.

Correctness

rsqrt's reverse formula (-0.5 * rsqrt(x)^3) recomputes entirely from the operand, not from the forward rsqrt value — identical in structure to sqrt and log, both of which are already in the set. Without membership in unary_collections, AdStackAllocaJudger would not flag the operand alloca for AdStack promotion, leaving it as a plain alloca overwritten each forward iteration; the reversed loop would then read the last-forward value for every backward step, producing silently wrong gradients. The fix is correct.

Security risks

None. This is a pure autodiff IR transformation with no auth, crypto, or I/O surface.

Level of scrutiny

Low. The change is a one-line set membership extension that exactly mirrors the pattern used for the other ten ops in the set. The prior concern about the multi-load alloca pattern in AdStackAllocaJudger was raised in my previous review; the author acknowledged it as a pre-existing limitation affecting all ops equally, resolved the thread, and kept the PR surgical. Nothing substantive has changed since that exchange, and the concern is orthogonal to this PR's correctness.

Other factors

No bugs were reported by the automated hunter. The test parametrize entry uses the same (step=0.05, offset=0.0) pair as sqrt and log, appropriate for a positive-domain op. The n_iter ∈ {1,3,10} sweep exercises the multi-iteration path that would expose any missed AdStack promotion.

@hughperkins
Copy link
Copy Markdown
Collaborator

Opus description update:

What's in the PR

Net diff vs origin/duburcqa/split_autodiff_tanh_exp_recompute: 2 files, +3 / -2.

quadrants/transforms/auto_diff.cpp

  • Adds UnaryOpType::rsqrt to NonLinearOps::unary_collections (with a list-reflow for readability).
  • No MakeAdjoint/MakeDual change — the existing rsqrt reverse formula is already operand-based.

tests/python/test_adstack.py

  • Adds ("rsqrt", 0.05, 0.0) to the loop-carried unary parametrize, in the positive-domain group with sqrt/log/asin/acos. Same (step, offset) as sqrt since the
    domain (x > 0) coincides.

Good points

  • Smallest, cleanest patch in the chain. One-line set-membership change plus one parametrize entry. Trivial to review.
  • Correctly identifies that rsqrt only had problem (a) and not (b). The previous PRs in the stack established two separate failure modes for nonlinear unary ops in
    dynamic loops: missing from unary_collections (operand not adstack-backed) vs. reverse formula reusing stmt (BackupSSA stale-value). This PR correctly diagnoses that
    rsqrt only suffered from (a) and fixes only that, without unnecessary formula edits.
  • Pinned by the regression test from the baseline PR. The new parametrize entry exercises the exact loop-carried path that would have been silently broken before. If a
    future refactor regresses rsqrt's set membership or operand-based formula, the test catches it at n_iter ≥ 3.
  • No behavior change for existing call sites outside dynamic loops — the change only activates the adstack promotion path, which scalar/single-iteration code doesn't go
    through.

Bad points / concerns

  • Reinforces the case for a structural fix. This is now the fourth PR in a row patching unary_collections membership op-by-op (after tan, tanh, exp). The repeated
    occurrences argue for deriving the set from each unary op's nonlinearity at registration time, rather than maintaining a hand-curated list in C++ that's prone to drift from
    the Python op registry. The PR doesn't (and shouldn't, scope-wise) attempt that — but the chain of PRs is the strongest argument for it yet.
  • No assertion that the audit is complete. After this PR, unary_collections covers abs, sin, cos, tan, tanh, asin, acos, exp, log, sqrt, rsqrt. The differentiable
    unary ops in python/quadrants/lang/ops.py also include neg (linear, intentionally omitted) and a few non-differentiable ones (round, floor, ceil, frexp, bit_not,
    popcnt, logical_not). It would be worth a meta-test asserting that every diff-able unary op in the Python registry is either in unary_collections or in an explicit
    KNOWN_LINEAR allow-list — otherwise the next added unary op repeats this whole exercise.
  • Inherits all the limitations of the baseline test. Single dtype (f32 only), no f64 variant of the new entry, hard-coded n = 4 and stride 0.05.
  • Merge commit, not rebase. Branch was brought up to date via git merge rather than rebased onto split_autodiff_tanh_exp_recompute. History contains a merge commit;
    depending on repo conventions a clean rebase + force-push may be preferred.
  • Fourth PR in a stack on the same surface area. Reviewers need to merge split_adstack_unary_baselinesplit_autodiff_tan_derivative
    split_autodiff_tanh_exp_recompute → this PR in order, and any force-push to an upstream PR will require re-syncing this one. Not a code issue, just a process note.

@hughperkins
Copy link
Copy Markdown
Collaborator

Pleas address:

  • No assertion that the audit is complete. After this PR, unary_collections covers abs, sin, cos, tan, tanh, asin, acos, exp, log, sqrt, rsqrt. The differentiable
    unary ops in python/quadrants/lang/ops.py also include neg (linear, intentionally omitted) and a few non-differentiable ones (round, floor, ceil, frexp, bit_not,
    popcnt, logical_not). It would be worth a meta-test asserting that every diff-able unary op in the Python registry is either in unary_collections or in an explicit
    KNOWN_LINEAR allow-list — otherwise the next added unary op repeats this whole exercise.
  • Inherits all the limitations of the baseline test. Single dtype (f32 only), no f64 variant of the new entry, hard-coded n = 4 and stride 0.05.

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — surgical, correct fix with appropriate test coverage.

Extended reasoning...

Overview

Two files changed: one entry added to the NonLinearOps::unary_collections set in auto_diff.cpp, and one parametrize entry added to test_adstack_unary_loop_carried in test_adstack.py. The rsqrt reverse formula (-0.5 * rsqrt(x)^3) already recomputes from the operand, so correctness in dynamic loops depends entirely on the operand's alloca being promoted to AdStack — which requires rsqrt to be in unary_collections.

Security risks

None. This is a pure compiler IR transformation change with no security surface.

Level of scrutiny

Low. The change exactly mirrors how the ten existing entries (sin, cos, sqrt, etc.) are registered. The test addition follows the established parametrize pattern and uses a (step, offset) pair that keeps the operand in the valid domain. The pre-existing AdStackAllocaJudger single-load limitation I flagged in the prior review was acknowledged as out-of-scope and equally affects all existing ops — it is not a regression from this PR.

Other factors

The inline comment was resolved by the author with a clear rationale. No outstanding concerns remain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants