Add backward pass to fused layer_norm by NahButch · Pull Request #3613 · huggingface/candle

NahButch · 2026-06-12T18:39:46Z

candle_nn::ops::layer_norm used apply_op3_no_bwd, so loss.backward() silently returned no gradient for any Var upstream of it, while layer_norm_slow is differentiable. Same naming footgun as rms_norm (#3526), softmax_last_dim (#3591), and rope (#3568).

Implements the standard layernorm backward with backend-agnostic tensor ops (computed in f32 for f16/bf16 inputs, matching the forward kernels), returning gradients for x, alpha, and beta. Adds a gradient test comparing the fused path against layer_norm_slow autograd; it fails on the previous behavior with a missing gradient.
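For readers who want the math behind "the standard layernorm backward", here is a minimal plain-Rust sketch for a single row of the last dimension. It is not the code in this PR: the PR uses candle tensor ops across the whole batch and reduces the alpha and beta gradients over the leading dimensions. The function names `layer_norm_forward` and `layer_norm_backward` and their signatures are hypothetical, chosen for illustration only.

```rust
// Illustrative single-row reference, std only. Not candle's implementation.
// With x_hat = (x - mean) * rstd and y = x_hat * alpha + beta, the gradients are:
//   dbeta  = grad_y
//   dalpha = grad_y * x_hat
//   dx     = rstd * (dx_hat - mean(dx_hat) - x_hat * mean(dx_hat * x_hat)),
//            where dx_hat = grad_y * alpha.

/// Forward pass for one row. Returns (y, x_hat, rstd) so the backward can reuse them.
fn layer_norm_forward(
    x: &[f32],
    alpha: &[f32],
    beta: &[f32],
    eps: f32,
) -> (Vec<f32>, Vec<f32>, f32) {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    // Biased variance (divide by n), as in the usual layer norm definition.
    let var = x.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / n;
    let rstd = 1.0 / (var + eps).sqrt();
    let x_hat: Vec<f32> = x.iter().map(|v| (v - mean) * rstd).collect();
    let y = x_hat
        .iter()
        .zip(alpha)
        .zip(beta)
        .map(|((h, a), b)| h * a + b)
        .collect();
    (y, x_hat, rstd)
}

/// Backward pass for one row. Returns (dx, dalpha, dbeta).
fn layer_norm_backward(
    grad_y: &[f32],
    x_hat: &[f32],
    alpha: &[f32],
    rstd: f32,
) -> (Vec<f32>, Vec<f32>, Vec<f32>) {
    let n = grad_y.len() as f32;
    // Gradient with respect to the normalized activations.
    let dx_hat: Vec<f32> = grad_y.iter().zip(alpha).map(|(g, a)| g * a).collect();
    let mean_dx_hat = dx_hat.iter().sum::<f32>() / n;
    let mean_dx_hat_x_hat = dx_hat
        .iter()
        .zip(x_hat)
        .map(|(d, h)| d * h)
        .sum::<f32>()
        / n;
    // Remove the components that flow through the mean and the variance.
    let dx = dx_hat
        .iter()
        .zip(x_hat)
        .map(|(d, h)| rstd * (d - mean_dx_hat - h * mean_dx_hat_x_hat))
        .collect();
    let dalpha = grad_y.iter().zip(x_hat).map(|(g, h)| g * h).collect();
    let dbeta = grad_y.to_vec();
    (dx, dalpha, dbeta)
}
```

A quick sanity check for a sketch like this is to compare `dx` against central finite differences of the scalar loss `sum(y * grad_y)`, which mirrors the idea of the PR's test of comparing the fused path against an independently differentiable reference.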

Fixes #3011

🤖 Generated with Claude Code

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

NahButch wants to merge 1 commit into huggingface:main from NahButch:layer-norm-backward