
Default to only applying QK norm over the head dimension#1367

Open
christianazinn wants to merge 1 commit into EleutherAI:main from christianazinn:christian/qk_layernorm_fix

Conversation

@christianazinn

Before this change, QK norm was applied by default over the trailing [*, N, H] dimensions. This caused problems with GQA, in which the K tensor has shape [T, B, N_kv, H], whose trailing dimensions don't match [N, H]. This PR fixes that by changing QK norm to normalize over [*, H] by default, as in e.g. Gemma3 and the original QK norm paper, with the option to go back to normalizing over [*, N, H] available as qk_layernorm_over_heads.
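To illustrate the shape mismatch, here is a minimal NumPy sketch (not the actual GPT-NeoX code) of a layer norm that checks its trailing dimensions the way torch.nn.LayerNorm's normalized_shape does; all tensor sizes below are made-up examples:

```python
import numpy as np

def layer_norm(x, normalized_shape):
    # Like torch.nn.LayerNorm: normalize over the trailing dims,
    # which must match normalized_shape exactly.
    n = len(normalized_shape)
    assert x.shape[-n:] == normalized_shape, (
        f"trailing dims {x.shape[-n:]} != normalized_shape {normalized_shape}"
    )
    axes = tuple(range(x.ndim - n, x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + 1e-5)

T, B, H = 4, 2, 8
N, N_kv = 16, 4                      # GQA: fewer KV heads than Q heads

q = np.random.randn(T, B, N, H)
k = np.random.randn(T, B, N_kv, H)

# Old default: normalize over (N, H). Works for Q...
q_norm_old = layer_norm(q, (N, H))

# ...but fails for K under GQA, whose trailing dims are (N_kv, H).
try:
    layer_norm(k, (N, H))
    gqa_k_failed = False
except AssertionError:
    gqa_k_failed = True              # torch.nn.LayerNorm errors similarly

# New default: normalize over the head dim only -> works for both.
q_norm = layer_norm(q, (H,))
k_norm = layer_norm(k, (H,))
```

The per-head-dim variant is shape-agnostic in the number of heads, which is why it composes with GQA without special-casing N_kv.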

@CLAassistant

CLAassistant commented Jan 6, 2026

CLA assistant check
All committers have signed the CLA.

@StellaAthena
Member

@christianazinn why is this a better default than the current default?

@christianazinn
Author

> @christianazinn why is this a better default than the current default?

I had assumed that most models used this normalization style, but it seems that's not the case (e.g. Olmo2). Perhaps it ought to be a toggleable option that is enforced when GQA is enabled.

