Add backward pass to fused layer_norm #3613

Open
NahButch wants to merge 1 commit into huggingface:main from NahButch:layer-norm-backward

Conversation

@NahButch

`candle_nn::ops::layer_norm` used `apply_op3_no_bwd`, so `loss.backward()` silently returned no gradient for any `Var` upstream of it, even though `layer_norm_slow` is differentiable. This is the same naming footgun as `rms_norm` (#3526), `softmax_last_dim` (#3591), and `rope` (#3568).

This PR implements the standard layer norm backward pass with backend-agnostic tensor ops (computed in f32 for f16/bf16 inputs, matching the forward kernels) and returns gradients for `x`, `alpha`, and `beta`. It also adds a gradient test that compares the fused path against `layer_norm_slow` autograd; on the previous code the test fails with a missing gradient.
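
For readers who want the math spelled out, here is a minimal sketch of the textbook layer norm backward over a single row. It is not the code in this PR: it works on plain `f64` slices instead of candle tensors, and the function names `layer_norm_forward` and `layer_norm_backward` are hypothetical helpers chosen for illustration. With `x_hat = (x - mean) / sqrt(var + eps)` and `g = grad_y * alpha`, the input gradient is `inv_std * (g - mean(g) - x_hat * mean(g * x_hat))`, while the gradients for `alpha` and `beta` are `grad_y * x_hat` and `grad_y`.

```rust
/// Forward pass over one row (hypothetical helper, not candle's API).
/// Returns (y, x_hat, inv_std) so the backward pass can reuse the statistics.
pub fn layer_norm_forward(
    x: &[f64],
    alpha: &[f64],
    beta: &[f64],
    eps: f64,
) -> (Vec<f64>, Vec<f64>, f64) {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    // Biased variance, as in the usual layer norm definition.
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let inv_std = 1.0 / (var + eps).sqrt();
    let x_hat: Vec<f64> = x.iter().map(|v| (v - mean) * inv_std).collect();
    let y: Vec<f64> = x_hat
        .iter()
        .zip(alpha.iter().zip(beta.iter()))
        .map(|(h, (a, b))| h * a + b)
        .collect();
    (y, x_hat, inv_std)
}

/// Backward pass over one row (hypothetical helper, not candle's API).
/// Returns (grad_x, grad_alpha, grad_beta) for an upstream gradient `grad_y`.
pub fn layer_norm_backward(
    x_hat: &[f64],
    inv_std: f64,
    alpha: &[f64],
    grad_y: &[f64],
) -> (Vec<f64>, Vec<f64>, Vec<f64>) {
    let n = x_hat.len() as f64;
    // g is the gradient with respect to the normalized input x_hat.
    let g: Vec<f64> = grad_y.iter().zip(alpha.iter()).map(|(dy, a)| dy * a).collect();
    let mean_g = g.iter().sum::<f64>() / n;
    let mean_g_xhat = g.iter().zip(x_hat.iter()).map(|(gi, h)| gi * h).sum::<f64>() / n;
    // Subtracting the two means accounts for x feeding into both mean and variance.
    let grad_x: Vec<f64> = g
        .iter()
        .zip(x_hat.iter())
        .map(|(gi, h)| inv_std * (gi - mean_g - h * mean_g_xhat))
        .collect();
    // With a single row there is nothing to sum over for alpha and beta.
    let grad_alpha: Vec<f64> = grad_y.iter().zip(x_hat.iter()).map(|(dy, h)| dy * h).collect();
    let grad_beta: Vec<f64> = grad_y.to_vec();
    (grad_x, grad_alpha, grad_beta)
}
```

For a batch of rows, `grad_x` would be computed per row in the same way, and `grad_alpha` and `grad_beta` would be summed across rows. A central finite-difference check against `layer_norm_forward` is a cheap way to validate a sketch like this, in the same spirit as the PR's comparison against `layer_norm_slow`.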

Fixes #3011

🤖 Generated with Claude Code

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

LayerNorm Gradient Flow Issue in candle-nn
