Skip to content

Implement no_repeat_ngram_size for CPU search#2242

Open
mustjab wants to merge 1 commit into
microsoft:mainfrom
mustjab:no-repeat-ngram-size
Open

Implement no_repeat_ngram_size for CPU search#2242
mustjab wants to merge 1 commit into
microsoft:mainfrom
mustjab:no-repeat-ngram-size

Conversation

@mustjab

@mustjab mustjab commented Jun 25, 2026

Copy link
Copy Markdown

The no_repeat_ngram_size search option was parsed from genai_config.json and settable via set_search_options / OgaGeneratorParamsSetSearchNumber, but no logits processor enforced it, so it had no effect. Greedy decoding could emit unbounded repeated n-grams (e.g. Marian translation looping to max_length on degenerate input).

Add a CPU logits processor that bans any token which would complete an already-seen n-gram of the configured size, matching HuggingFace no_repeat_ngram_size semantics.

Changes:

  • search.h: virtual ApplyNoRepeatNgram no-op default (CUDA/beam unaffected), Search_Cpu override.
  • search.cpp: ApplyNoRepeatNgram implementation.
  • generators.cpp, engine/request.cpp: invoke after ApplyRepetitionPenalty.
  • config.h: document the field.
  • sampling_tests.cpp: NoRepeatNgramCorrectnessCpu unit test.

The no_repeat_ngram_size search option was parsed from genai_config.json
and settable via set_search_options / OgaGeneratorParamsSetSearchNumber,
but no logits processor enforced it, so it had no effect. Greedy decoding
could emit unbounded repeated n-grams (e.g. Marian translation looping to
max_length on degenerate input).

Add a CPU logits processor that bans any token which would complete an
already-seen n-gram of the configured size, matching HuggingFace
no_repeat_ngram_size semantics.

Changes:
- search.h: virtual ApplyNoRepeatNgram no-op default (CUDA/beam unaffected),
  Search_Cpu override.
- search.cpp: ApplyNoRepeatNgram implementation.
- generators.cpp, engine/request.cpp: invoke after ApplyRepetitionPenalty.
- config.h: document the field.
- sampling_tests.cpp: NoRepeatNgramCorrectnessCpu unit test.
Copilot AI review requested due to automatic review settings June 25, 2026 21:47
@mustjab mustjab requested a review from a team as a code owner June 25, 2026 21:47

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes the existing no_repeat_ngram_size search option actually take effect for CPU decoding by adding a CPU-side logits processor that bans tokens which would complete a previously seen n-gram (matching HuggingFace semantics). This closes a functional gap where the option was parsed and settable but previously had no impact on generation behavior.

Changes:

  • Add a Search::ApplyNoRepeatNgram() virtual hook (default no-op) and implement the CPU override to ban tokens that would repeat an n-gram.
  • Invoke ApplyNoRepeatNgram() during per-token scoring in both the direct generator path and the engine request path.
  • Add a CPU correctness unit test that forces greedy decoding to choose a fallback token when the highest-scoring token would repeat an n-gram.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
test/sampling_tests.cpp Adds a CPU unit test validating greedy decoding respects no_repeat_ngram_size.
src/search.h Introduces ApplyNoRepeatNgram() (default no-op) and declares CPU override.
src/search.cpp Implements CPU n-gram blocking by banning tokens that would repeat a seen n-gram.
src/generators.cpp Calls ApplyNoRepeatNgram() in the non-engine generation loop after repetition penalty.
src/engine/request.cpp Calls ApplyNoRepeatNgram() in the engine request generation loop after repetition penalty.
src/config.h Updates documentation comment for no_repeat_ngram_size to reflect actual behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants