Default to only applying QK norm over the head dimension#1367
Open
christianazinn wants to merge 1 commit into EleutherAI:main from
Conversation
Member
@christianazinn why is this a better default than the current default?
Author
I had assumed that most models used this normalization style, but it seems that's not the case (e.g. Olmo2). Perhaps it ought to be changed to a toggleable option that is enforced for GQA.
Before this change, the default application of QK norm was to apply it to dimensions [*, N, H]. This caused problems with GQA, in which the K tensor is of shape [T, B, N_kv, H], which doesn't match. This PR fixes that by changing QK norm to normalize over [*, H] by default, as in e.g. Gemma3 and the original QK norm paper, with the option to go back to using [*, N, H] available as qk_layernorm_over_heads.
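A minimal sketch of the shape issue follows. This is not the repo's actual implementation; the tensor shapes, sizes, and module names are illustrative assumptions only, but they show why a norm over [N, H] breaks for the K tensor under GQA while a norm over [H] alone works for both Q and K.

```python
# Illustrative sketch only -- not the gpt-neox code. Assumed shapes:
# Q is [T, B, N, H], K is [T, B, N_kv, H] with N_kv < N under GQA.
import torch
import torch.nn as nn

T, B, N, N_kv, H = 8, 2, 16, 4, 64  # GQA: fewer KV heads than query heads

q = torch.randn(T, B, N, H)
k = torch.randn(T, B, N_kv, H)

# Old default: normalize over the trailing [N, H] dims.
# The normalized shape must match exactly, so this fails for K under GQA.
norm_over_heads = nn.LayerNorm((N, H))
_ = norm_over_heads(q)      # fine: q's trailing dims are (N, H)
# norm_over_heads(k)        # raises: k's trailing dims are (N_kv, H)

# New default: normalize over the head dim [H] only, per head,
# so one module handles both Q and K regardless of head counts.
norm_over_head_dim = nn.LayerNorm(H)
_ = norm_over_head_dim(q)   # fine
_ = norm_over_head_dim(k)   # also fine
```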