Commit d8d6630

fix: guard save_for_backward on grad_bias not bias in fused linear CE forward (#1157)
## Summary

Fixes a bug in `LigerFusedLinearCrossEntropyFunction.forward` where `ctx.save_for_backward` checks `bias is not None` instead of `grad_bias is not None` before calling `.detach()`. When `input_requires_grad=False`, `fused_linear_cross_entropy_forward` sets `grad_bias = None` regardless of whether `bias` is provided. The stale check then calls `None.detach()` and raises `AttributeError`.

Fixes #1156

## Details

One-line fix in `src/liger_kernel/ops/fused_linear_cross_entropy.py`:

Before: `grad_bias.detach() if bias is not None else None,`

After: `grad_bias.detach() if grad_bias is not None else None,`

## Testing Done

- Hardware Type: CPU (logic-only fix, no GPU kernel change)
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
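The failure mode can be illustrated without touching the GPU kernels. Below is a minimal sketch of the two guards, assuming a hypothetical `Tensor` stand-in that exposes only the `.detach()` method (no torch dependency); `save_args` is an illustrative helper, not a function from the library:

```python
# Hypothetical stand-in for torch.Tensor: only .detach() matters here.
class Tensor:
    def detach(self):
        return self


def save_args(grad_bias, bias, fixed):
    """Mimics the guarded expression passed to ctx.save_for_backward."""
    if fixed:
        # After the fix: guard on the value that is actually detached.
        return grad_bias.detach() if grad_bias is not None else None
    # Before the fix: guard on `bias`, which can be non-None while
    # grad_bias is None (the input_requires_grad=False path).
    return grad_bias.detach() if bias is not None else None


bias = Tensor()   # bias was provided by the caller...
grad_bias = None  # ...but the forward pass produced no bias gradient

# The buggy guard calls None.detach() and raises AttributeError.
try:
    save_args(grad_bias, bias, fixed=False)
    print("buggy guard: no error")
except AttributeError as e:
    print("buggy guard:", e)

# The fixed guard falls through to None cleanly.
print("fixed guard:", save_args(grad_bias, bias, fixed=True))
```

When `grad_bias` is a real tensor, both guards behave identically; they only diverge in the `bias is not None, grad_bias is None` case that the issue reports.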
1 parent 167741f commit d8d6630

File tree

1 file changed: +1 −1 lines changed


src/liger_kernel/ops/fused_linear_cross_entropy.py

Lines changed: 1 addition & 1 deletion
@@ -361,7 +361,7 @@ def forward(
         ctx.save_for_backward(
             grad_input.detach(),
             grad_weight.detach() if grad_weight is not None else None,
-            grad_bias.detach() if bias is not None else None,
+            grad_bias.detach() if grad_bias is not None else None,
         )
         ctx.return_z_loss = return_z_loss
         ctx.return_token_accuracy = return_token_accuracy
