Support suppress_tokens / begin_suppress_tokens (HF generation parity) by Copilot · Pull Request #2207 · microsoft/onnxruntime-genai

Copilot · 2026-06-09T23:26:19Z

onnxruntime-genai had no way to suppress specific token IDs during generation, so output diverged from transformers.generate() for models that depend on suppress_tokens / begin_suppress_tokens (e.g. Whisper, Gemma). These options were neither parsed from config nor applied as logits processors.

This adds static per-token suppression mirroring HF's SuppressTokensLogitsProcessor and SuppressTokensAtBeginLogitsProcessor:

- Config (config.h/config.cpp): new suppress_tokens / begin_suppress_tokens int-array fields on Config::Search, parsed via a new Search_Element::OnArray.
- Logits processor (search.h, search.cpp, cuda/search_cuda.cpp): new Search::ApplySuppressTokens virtual, implemented for CPU and CUDA, which sets the targeted logits to the lowest finite float value, following the existing ApplyMinLength / ApplyRepetitionPenalty pattern.
- Application (generators.cpp, engine/request.cpp): suppress_tokens is applied at every step; begin_suppress_tokens is applied only when current_length == prompt_length, i.e. when the first new token is generated. Both are wired into the standalone Generator loop and the batching Engine request loop.
- API surface: additive C API OgaGeneratorParamsSetSearchTokensArray plus a C++ wrapper; Python set_search_options(...) now accepts list[int].
- Model builder (builders/base.py, builders/whisper.py): auto-populates both options from generation_config.json / the model config via a shared Model.add_suppress_tokens_to_search_config helper, so existing HF models work without manual config edits.
- Tests: C++ greedy-selection tests and Python API tests covering every-step and begin-only suppression.
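To make the every-step versus begin-only distinction concrete, here is a minimal pure-Python sketch of the masking logic described above. It is not the C++/CUDA code in this PR: the function names, the plain-list logits, and the choice to ignore out-of-range token IDs are all assumptions made for illustration.

```python
# Most negative finite float32 value, standing in for "float lowest".
FLOAT32_LOWEST = -3.4028234663852886e38


def apply_suppress_tokens(logits, token_ids):
    """Return a copy of logits with the given token IDs masked out."""
    masked = list(logits)
    for token_id in token_ids:
        # Assumption: IDs outside the vocabulary are silently ignored.
        if 0 <= token_id < len(masked):
            masked[token_id] = FLOAT32_LOWEST
    return masked


def process_step(logits, current_length, prompt_length,
                 suppress_tokens=(), begin_suppress_tokens=()):
    """Apply suppress_tokens always, begin_suppress_tokens on the first new token only."""
    masked = apply_suppress_tokens(logits, suppress_tokens)
    if current_length == prompt_length:
        masked = apply_suppress_tokens(masked, begin_suppress_tokens)
    return masked


def greedy_pick(logits):
    """Index of the highest logit (greedy selection)."""
    return max(range(len(logits)), key=logits.__getitem__)


logits = [0.1, 2.0, 0.3, 1.5, 0.9]

# First generated token: IDs 1 and 3 (every step) and 4 (begin only) are masked.
first = process_step(logits, current_length=4, prompt_length=4,
                     suppress_tokens=[1, 3], begin_suppress_tokens=[4])

# A later step: only IDs 1 and 3 remain masked.
later = process_step(logits, current_length=5, prompt_length=4,
                     suppress_tokens=[1, 3], begin_suppress_tokens=[4])

print(greedy_pick(first), greedy_pick(later))
```

With these inputs, greedy selection skips the suppressed IDs on the first step and can choose the begin-suppressed ID again on later steps.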

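# suppress_tokens: masked at every step, so greedy search falls through to the next unsuppressed token.
# begin_suppress_tokens: masked only when the first new token is generated.
params.set_search_options(suppress_tokens=[1, 3], begin_suppress_tokens=[50257])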
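For models exported through the builder, these values would normally come from generation_config.json rather than from a manual call like the one above. The following sketch shows roughly what such a copy step could look like. The function name copy_suppress_tokens, its signature, and the dict-based search config are hypothetical; the helper in this PR (Model.add_suppress_tokens_to_search_config) may behave differently.

```python
import json


def copy_suppress_tokens(search_config, generation_config):
    """Copy suppress lists into search_config when present (hypothetical helper)."""
    for key in ("suppress_tokens", "begin_suppress_tokens"):
        value = generation_config.get(key)
        if value:  # skip missing, null, or empty lists
            search_config[key] = [int(token_id) for token_id in value]
    return search_config


# Illustrative generation_config.json content; the token IDs are made up.
generation_config = json.loads(
    '{"suppress_tokens": [1, 3], "begin_suppress_tokens": [220, 50257], "max_length": 448}'
)

search = copy_suppress_tokens({"max_length": 448}, generation_config)
print(json.dumps(search, indent=2))
```

Skipping absent or empty lists keeps the generated search config unchanged for models that define neither option.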

bad_words_ids (multi-token sequence banning), listed as optional in the issue, is intentionally out of scope here.

Note on concurrency

The begin step is tracked via a per-object length member that is mutated non-atomically, consistent with the existing single-threaded-per-Generator/Request design (cf. computed_logits_, is_prefill_). No synchronization is added.

Initial plan

e5905e1

Copilot AI assigned Copilot and justinchuby Jun 9, 2026

Copilot started work on behalf of justinchuby June 9, 2026 23:26

Copilot AI linked an issue Jun 9, 2026 that may be closed by this pull request

Support suppress_tokens / begin_suppress_tokens (HF generation parity) #2201

Open

Copilot AI added 3 commits June 9, 2026 23:41

Add suppress_tokens / begin_suppress_tokens support (HF generation parity)

85fd235

Auto-populate suppress tokens in Whisper model builder config

fd1e50e

Extract shared helper for suppress tokens in model builder

2cad441

Copilot AI changed the title ~~[WIP] Add support for suppress_tokens and begin_suppress_tokens~~ Support suppress_tokens / begin_suppress_tokens (HF generation parity) Jun 9, 2026

Copilot finished work on behalf of justinchuby June 9, 2026 23:48

Copilot AI requested a review from justinchuby June 9, 2026 23:48

Copilot wants to merge 4 commits into main from copilot/support-suppress-tokens

2 participants
