[OFT] Chebyshev-Optimized Newton-Schulz (CANS): Faster and Better by Koratahiu · Pull Request #1512 · Nerogar/OneTrainer

Koratahiu · 2026-06-07T12:24:34Z

The Issue

In OFT, we currently orthogonalize the weights using two different methods:

Default: Truncated Cayley-Neumann
Exact Solver: Exact matrix math

The exact solver is rarely used in practice because it is slow and scales poorly. The Cayley-Neumann method, on the other hand, has a relatively high orthogonalization error: the exact solver achieves an error of around $10^{-6}$, while Cayley-Neumann's error ranges between $0.1$ and $0.5$. Cayley-Neumann is also unstable for matrices with higher norms (as noted in #1492) and converges poorly in those cases, which makes it scale-variant and prone to error.
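To make the scale dependence concrete, here is a minimal pure-Python sketch. It assumes the Cayley form $R = (I + Q)(I - Q)^{-1}$ with the inverse replaced by a truncated Neumann series, and it measures the error as the Frobenius norm of $R^\top R - I$. The helper names, the choice of 5 series terms, and the error metric are illustrative assumptions, not taken from the OneTrainer code.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def orth_error(r):
    """Frobenius norm of R^T R - I (assumed error metric)."""
    n = len(r)
    gram = matmul(transpose(r), r)
    return math.sqrt(sum((gram[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(n) for j in range(n)))

def cayley_neumann(q, num_terms):
    """R ~= (I + Q)(I + Q + Q^2 + ... + Q^num_terms)."""
    n = len(q)
    series = identity(n)
    power = identity(n)
    for _ in range(num_terms):
        power = matmul(power, q)      # next power of Q
        series = add(series, power)   # running Neumann sum
    return matmul(add(identity(n), q), series)

def skew_2x2(a):
    """Smallest non-trivial skew-symmetric matrix, with spectral norm |a|."""
    return [[0.0, a], [-a, 0.0]]

err_small_norm = orth_error(cayley_neumann(skew_2x2(0.1), num_terms=5))
err_large_norm = orth_error(cayley_neumann(skew_2x2(0.9), num_terms=5))
print(err_small_norm, err_large_norm)
```

With the same number of series terms, the error for the larger-norm $Q$ is several orders of magnitude higher than for the small-norm one, which is the scale dependence described above.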

Standard alternative approximations, such as standard Newton-Schulz, were evaluated but did not resolve these issues.

Our Solution (CANS)

CANS is a variant of the Newton-Schulz (NS) algorithm designed to achieve strict orthogonality.

Note: Standard NS flattens singular values but is not explicitly optimized to reach true orthogonality.

A known limitation of CANS is that it requires tuning a lower-bound parameter to converge optimally. However, for the OFT formulation ($I + Q$), we can define this lower bound simply as $1 / \lVert I + Q \rVert_F$, the reciprocal of the Frobenius norm of $I + Q$.

Using this bound makes CANS highly suitable for OFT.
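A short derivation of why such a bound can hold. It assumes that $Q$ is skew-symmetric ($Q^\top = -Q$), as in the usual Cayley parameterization of OFT, that the lower bound refers to the smallest singular value of the normalized input, and that the iteration starts from $X_0 = (I + Q) / \lVert I + Q \rVert_F$ (the symbol $X_0$ is introduced here for illustration only):

$$(I + Q)^\top (I + Q) = (I - Q)(I + Q) = I - Q^2 = I + Q^\top Q$$

$$\sigma_i(I + Q)^2 = 1 + \sigma_i(Q)^2 \ge 1$$

$$\frac{1}{\lVert I + Q \rVert_F} \le \sigma_i(X_0) = \frac{\sigma_i(I + Q)}{\lVert I + Q \rVert_F} \le \frac{\sigma_{\max}(I + Q)}{\lVert I + Q \rVert_F} \le 1$$

Under these assumptions every singular value of $X_0$ lies in $[1 / \lVert I + Q \rVert_F,\ 1]$, so the lower bound is available in closed form and does not have to be tuned.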

Convergence compared to current methods

The plot above shows the performance on a random matrix. In this test case, CANS (red) achieves lower orthogonalization error than both Cayley-Neumann and the exact solver.

Block-size invariant & Number of iterations - matmuls required

CANS requires only 7 iterations (14 matmuls) to fully converge in FP32.

Note: CANS uses 2 matmuls per iteration, unlike standard NS, which uses 3 matmuls per iteration. As a result, 7 CANS iterations (14 matmuls) cost less than 5 standard NS iterations (15 matmuls).
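The matmul accounting can be illustrated with a small pure-Python sketch of a cubic Newton-Schulz iteration, which needs exactly 2 matmuls per step. Note the assumptions: it uses the classic coefficients $(1.5, -0.5)$, not the Chebyshev-optimized coefficients of CANS, and the function name, the Frobenius normalization, and the $3 \times 3$ test matrix are illustrative choices rather than code from this PR.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def orth_error(x):
    """Frobenius norm of X^T X - I."""
    n = len(x)
    gram = matmul(transpose(x), x)
    return math.sqrt(sum((gram[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(n) for j in range(n)))

def newton_schulz_cubic(m, steps):
    """Classic cubic NS: X <- 1.5 X - 0.5 X (X^T X), two matmuls per step."""
    fro = math.sqrt(sum(v * v for row in m for v in row))
    x = [[v / fro for v in row] for row in m]   # singular values now <= 1
    matmuls = 0
    for _ in range(steps):
        gram = matmul(transpose(x), x)          # matmul 1: X^T X
        matmuls += 1
        x_gram = matmul(x, gram)                # matmul 2: X (X^T X)
        matmuls += 1
        x = [[1.5 * xv - 0.5 * gv for xv, gv in zip(x_row, g_row)]
             for x_row, g_row in zip(x, x_gram)]
    return x, matmuls

# I + Q for a small skew-symmetric Q (values chosen arbitrarily).
i_plus_q = [[1.0, 0.3, -0.2],
            [-0.3, 1.0, 0.5],
            [0.2, -0.5, 1.0]]

x_final, matmul_count = newton_schulz_cubic(i_plus_q, steps=7)
print(matmul_count, orth_error(x_final))
```

For this small, well-conditioned input the classic coefficients already converge within 7 steps; the sketch is only meant to show where the 2 matmuls per iteration come from, not to reproduce the convergence behaviour reported for CANS.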

Test plan

pre-commit run --all-files passes
Launched the affected UI or script and exercised the change
Tested with at least one real preset / config when relevant (note which: ____)

AI assistance

No AI involvement

Sources:

Accelerating Newton-Schulz Iteration for Orthogonalization via Chebyshev-type Polynomials


Koratahiu · 2026-06-11T06:31:44Z

This is ready for testing and review.
I tested it myself, and it was the first time I have seen OFT reach higher norms of 20+ (using CANS + spectral scaling). This pushed it to its limits, which might result in a +50% to +80% increase in expressiveness for the same block size.

It is also optimal for DOFT #1335, since the orthogonalization error is very small (1e-6 compared to 0.1-0.01 of Cayley-Neumann, assuming FP32).

…r and Better) into preview

Resolved conflicts with local DoRA-OFT work (oft_clipped_norm / spectral-norm clipping): kept both features side by side.
- OFTRotationModule: CANS Newton-Schulz iteration added alongside power-iteration spectral clipping; oft_cans disables Cayley-Neumann.
- OFTModule/TrainConfig/LoraTab: oft_cans field added next to oft_clipped_norm; CANS switch placed at row 5 (row 4 taken by DOFT).
- Conv2d forward, apply_to_module and oft_verify now pass oft_cans through to _cayley_batch so merge/verify match training math.

…t max-norm

Supersedes the local bool port of the same PR with its final upstream form: oft_clipped_norm is now float | None (the max spectral norm itself, default 0.95, None disables) instead of on/off at a hard-coded 0.999. The clip now applies before the orthogonalization branch (all methods, including CANS), and the clipped_oft marker buffer is persistent with the clip value embedded for inference tools.

Local adjustments:
- TrainConfig.from_dict coerces the legacy bool form (True -> 0.999, False -> None); float(False) would otherwise clip rotations to zero.
- DoRAOFTModule signature updated to mirror OFTModule (also fixes the positional oft_cans arg added by the PR Nerogar#1512 merge, which DOFT did not yet accept).
- UI entry placed at row 5 col 0/1; CANS switch stays at row 5 col 3/4.
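The legacy-bool coercion described in this commit message could look roughly like the sketch below. The function name `coerce_clipped_norm` is hypothetical and this is not the actual `TrainConfig.from_dict` code; only the mapping (True -> 0.999, False -> None) and the `float(False)` pitfall come from the message above.

```python
def coerce_clipped_norm(value):
    """Hypothetical helper: map a legacy bool setting to float | None."""
    # bool must be checked first: bool is a subclass of int, so
    # float(False) would silently become 0.0 (a clip value of zero).
    if isinstance(value, bool):
        return 0.999 if value else None
    if value is None:
        return None
    return float(value)

legacy_on = coerce_clipped_norm(True)     # old "on" switch
legacy_off = coerce_clipped_norm(False)   # old "off" switch
current = coerce_clipped_norm(0.95)       # new float form passes through
print(legacy_on, legacy_off, current)
```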


Koratahiu added 9 commits June 7, 2026 13:29

initial

e704965

fix cans buffer

7b97c70

- Improve to L_inf norm which has tighter bound

80f98a4

- Cast to BF16 - Decrease steps to 5

- Tensor of ones_like instead of 1

d39c144

- torch.bmm() for batched 3d

Use explicit operations instead of torch.linalg.matrix_norm

1fe9091

Cache id_mat and torch.compile workaround

5c9afb9

add oft_scaled support for CANS

609a5eb

- Remove hardcoded BF16

f2cdc33

- Dynamic steps based on dtype

- Double the rotation to align

cc3fc18

- revert scaled oft change

Koratahiu mentioned this pull request Jun 11, 2026

Clipped OFT Norm for Long-term Stability #1492

Open

Koratahiu added 4 commits June 11, 2026 09:23

pre-commit

28337ce

Merge branch 'cans_oft' of https://github.com/Koratahiu/OneTrainer in…

a758e95

…to cans_oft

rename to R_half

e29d0f6

pre-commit

49f7cb6

Koratahiu marked this pull request as ready for review June 11, 2026 06:32

Koratahiu added 2 commits June 12, 2026 18:22

Add detach to the norm

6c195b9

Remove the clamp

3b66998

dxqb added the preview merged in the preview branch label Jun 13, 2026

dxqb added a commit that referenced this pull request Jun 14, 2026

Merge PR #1512 ([OFT] Chebyshev-Optimized Newton-Schulz (CANS): Faste…

4d3c0fa

…r and Better) into preview

dxqb added a commit that referenced this pull request Jun 19, 2026

Merge PR #1512 ([OFT] Chebyshev-Optimized Newton-Schulz (CANS): Faste…

5da3d3e

…r and Better) into preview

Koratahiu added 2 commits June 24, 2026 02:23

Reorder the squaring to avoid noise amplification

7d415a4

Merge branch 'cans_oft' of https://github.com/Koratahiu/OneTrainer in…

7b9acd2

…to cans_oft

Koratahiu mentioned this pull request Jun 26, 2026

[OFT] Rework and Matrix Exponential Mode #1556

Draft

Koratahiu wants to merge 17 commits into Nerogar:master from Koratahiu:cans_oft

2 participants
