Fix model builder for gpt-oss #2228

Merged
kunal-vaishnavi merged 1 commit into main from tlwu/fix_gpt_oss_rotary_cache on Jun 15, 2026

Conversation

@tianleiwu

Contributor

The inv_freq computation for the rotary embedding cache is not correct.

@tianleiwu tianleiwu requested a review from a team as a code owner June 15, 2026 17:26
Copilot AI review requested due to automatic review settings June 15, 2026 17:26

Copilot AI left a comment

Pull request overview

This PR fixes the GPT-OSS Python model builder’s rotary embedding cache computation by correcting the inverse-frequency (inv_freq) formula used to generate RoPE cos/sin caches, aligning it with the standard 1 / (theta ** (i / dim)) definition.

Changes:

  • Correct inv_freq computation in GPTOSSModel.make_rotary_embedding_caches_from_scratch() by applying the missing reciprocal (1.0 / (...)).
  • Preserve existing cache-generation flow while producing corrected RoPE frequency values.
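
For reference, the standard definition mentioned in the overview can be sketched in plain Python. This is an illustrative sketch only, not the code from this PR: the function names `rope_inv_freq` and `rope_cos_sin_cache`, the `dim=8` and `theta=10000.0` values, and the use of plain lists instead of tensors are all assumptions, and it ignores any frequency scaling the real builder may apply.

```python
import math


def rope_inv_freq(dim, theta=10000.0):
    """Standard RoPE inverse frequencies: 1 / theta ** (i / dim) for even i.

    Hypothetical helper for illustration; not the model builder's method.
    """
    return [1.0 / (theta ** (i / dim)) for i in range(0, dim, 2)]


def rope_cos_sin_cache(dim, max_positions, theta=10000.0):
    """Build cos/sin caches of shape [max_positions][dim // 2] from inv_freq."""
    inv_freq = rope_inv_freq(dim, theta)
    cos_cache = [[math.cos(pos * f) for f in inv_freq] for pos in range(max_positions)]
    sin_cache = [[math.sin(pos * f) for f in inv_freq] for pos in range(max_positions)]
    return cos_cache, sin_cache


# Assumed toy values, chosen only to make the difference visible.
inv_freq = rope_inv_freq(dim=8, theta=10000.0)

# What the values look like when the reciprocal is omitted: they grow with i
# instead of decaying from 1.0 toward 1 / theta.
missing_reciprocal = [10000.0 ** (i / 8) for i in range(0, 8, 2)]

cos_cache, sin_cache = rope_cos_sin_cache(dim=8, max_positions=4)
```

With the reciprocal in place, the frequencies start at 1.0 and decrease with the index, so low dimensions rotate quickly and high dimensions rotate slowly. Without it, the angles `pos * f` grow with the index and the cached cos/sin values no longer match what the model was trained with.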

@tianleiwu tianleiwu enabled auto-merge (squash) June 15, 2026 17:46
@kunal-vaishnavi kunal-vaishnavi disabled auto-merge June 15, 2026 18:07
@kunal-vaishnavi kunal-vaishnavi merged commit 7e4bc0d into main Jun 15, 2026
10 of 17 checks passed
@kunal-vaishnavi kunal-vaishnavi deleted the tlwu/fix_gpt_oss_rotary_cache branch June 15, 2026 18:07

3 participants