Support suppress_tokens / begin_suppress_tokens (HF generation parity) #2201

### Feature request: support `suppress_tokens` / `begin_suppress_tokens` (and ideally `bad_words_ids`)

#### Problem

`onnxruntime-genai` has no way to suppress specific token IDs during generation. HuggingFace `transformers.generate()` supports:

- **`suppress_tokens`** — token IDs whose logits are set to `-inf` at **every** decoding step.
- **`begin_suppress_tokens`** — token IDs suppressed **only at the first generated step**.
- **`bad_words_ids`** — token-ID sequences that must never appear in the generated output.
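To make the per-step semantics of the first two options concrete, here is a minimal C++ sketch that applies them to one row of logits (one batch item at one decoding step). `ApplySuppression` is a hypothetical helper, not part of onnxruntime-genai or transformers, and it assumes out-of-range token IDs are simply skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (illustration only): mask one row of logits the way HF's
// suppress_tokens / begin_suppress_tokens options behave.
void ApplySuppression(std::vector<float>& logits,
                      const std::vector<int32_t>& suppress_tokens,
                      const std::vector<int32_t>& begin_suppress_tokens,
                      bool is_first_generated_step) {
  const float neg_inf = -std::numeric_limits<float>::infinity();
  auto mask = [&](const std::vector<int32_t>& ids) {
    for (int32_t id : ids) {
      // Assumption: IDs outside the vocabulary are ignored rather than rejected.
      if (id >= 0 && static_cast<std::size_t>(id) < logits.size()) {
        logits[id] = neg_inf;
      }
    }
  };
  mask(suppress_tokens);  // applied at every decoding step
  if (is_first_generated_step) {
    mask(begin_suppress_tokens);  // applied only at the first generated step
  }
}
```

A token masked to `-inf` gets zero probability after softmax, so neither greedy nor sampled decoding can select it.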

Many models ship these in their `generation_config.json` and depend on them for correct output (e.g. Whisper suppresses non-speech/special tokens; Gemma4 base/IT suppresses `end_of_image`/`end_of_audio` placeholder-closing tokens). Because genai ignores them, its output **diverges from HF** for these models.

#### Evidence

The full set of recognized search options is enumerated in `src/config.cpp` (`Search_Element::OnValue`, ~L1245–1291):

```
min_length, max_length, batch_size, num_beams, num_return_sequences,
top_k, top_p, temperature, repetition_penalty, length_penalty,
no_repeat_ngram_size, diversity_penalty, random_seed, chunk_size,
do_sample, past_present_share_buffer, early_stopping, blank_penalty
```

There is no `suppress_tokens` / `begin_suppress_tokens` / `bad_words_ids` key, and `grep -rni "suppress\|bad_word" src/` finds no related logic (only `GC.SuppressFinalize` etc.). A `GuidanceLogitsProcessor` exists for grammar-constrained decoding, but that is not a substitute for static per-token suppression driven by `generation_config`.

#### Impact

Validating mobius-exported Gemma4 models against HF, we had to **re-implement token suppression in the test harness** to reproduce HF's greedy output — `model.generate()` applies `suppress_tokens`, but a genai generation loop does not, so the two disagree token-for-token unless suppression is added manually. Any downstream user running these models through genai gets subtly wrong generations with no error.
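The harness workaround amounts to masking suppressed IDs before the argmax. The hypothetical `GreedyNextToken` helper below (an illustration, not the actual harness code) shows why the outputs split: as soon as a suppressed ID holds the highest logit, greedy decoding with and without suppression picks different tokens, and every later step diverges from there.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <limits>
#include <vector>

// Hypothetical harness helper: one greedy step with optional suppression.
// `logits` is taken by value so the caller's row is left untouched.
int32_t GreedyNextToken(std::vector<float> logits,
                        const std::vector<int32_t>& suppress_tokens) {
  assert(!logits.empty());  // argmax is undefined on an empty vocabulary
  for (int32_t id : suppress_tokens) {
    if (id >= 0 && static_cast<std::size_t>(id) < logits.size()) {
      logits[id] = -std::numeric_limits<float>::infinity();
    }
  }
  // std::max_element returns the first maximum, matching a typical argmax tie-break.
  auto best = std::max_element(logits.begin(), logits.end());
  return static_cast<int32_t>(std::distance(logits.begin(), best));
}
```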

#### Proposed solution

1. Parse `suppress_tokens` and `begin_suppress_tokens` (arrays of ints) in the search config (`src/config.cpp`) and from `set_search_options`.
2. Add a logits processor that sets the corresponding logits to `-inf` — `suppress_tokens` on every step, `begin_suppress_tokens` only when `current_length == prompt_length`.
3. Auto-populate these from the model's `generation_config.json` at build/convert time (or read them from `genai_config.json`) so existing HF models work out of the box.
4. (Optional) `bad_words_ids` for multi-token sequence banning.
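For the optional `bad_words_ids` support, a minimal sketch of the banning rule as we understand HF's `NoBadWordsLogitsProcessor`: the last token of a banned sequence is blocked whenever the tokens so far end with the rest of that sequence, and single-token entries are blocked unconditionally. `BannedNextTokens` is a hypothetical name; the IDs it returns would then be masked to `-inf` exactly like `suppress_tokens`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: token IDs to ban at the next step so that no entry of
// bad_words_ids can be completed. `sequence` holds every token so far.
std::set<int32_t> BannedNextTokens(const std::vector<int32_t>& sequence,
                                   const std::vector<std::vector<int32_t>>& bad_words_ids) {
  std::set<int32_t> banned;
  for (const auto& bad : bad_words_ids) {
    if (bad.empty()) continue;
    const std::size_t prefix_len = bad.size() - 1;
    if (prefix_len > sequence.size()) continue;  // not enough history to match
    const auto n = static_cast<std::ptrdiff_t>(prefix_len);
    // Compare the tail of `sequence` with the banned sequence minus its last token.
    // A single-token entry has an empty prefix, so it is always banned.
    if (std::equal(bad.begin(), bad.begin() + n, sequence.end() - n)) {
      banned.insert(bad.back());
    }
  }
  return banned;
}
```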

This mirrors HF's `SuppressTokensLogitsProcessor`, `SuppressTokensAtBeginLogitsProcessor`, and `NoBadWordsLogitsProcessor`.
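Steps 1 and 2 could look roughly like the processor below. This is a sketch under assumptions, not genai's actual processor interface: `SuppressTokensProcessor` is hypothetical, logits are assumed to be a flat row-major `[batch_size x vocab_size]` float buffer for the current step, and `current_length` is assumed to count tokens already in the sequence, so it equals `prompt_length` exactly on the first generated step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical processor sketch (not the onnxruntime-genai API).
class SuppressTokensProcessor {
 public:
  SuppressTokensProcessor(std::vector<int32_t> suppress_tokens,
                          std::vector<int32_t> begin_suppress_tokens,
                          std::size_t prompt_length)
      : suppress_(std::move(suppress_tokens)),
        begin_suppress_(std::move(begin_suppress_tokens)),
        prompt_length_(prompt_length) {}

  // Masks every batch row; begin_suppress_tokens only when
  // current_length == prompt_length (the first generated step).
  void Process(float* logits, std::size_t batch_size, std::size_t vocab_size,
               std::size_t current_length) const {
    const bool first_step = (current_length == prompt_length_);
    for (std::size_t b = 0; b < batch_size; ++b) {
      float* row = logits + b * vocab_size;
      Mask(row, vocab_size, suppress_);
      if (first_step) Mask(row, vocab_size, begin_suppress_);
    }
  }

 private:
  static void Mask(float* row, std::size_t vocab_size, const std::vector<int32_t>& ids) {
    for (int32_t id : ids) {
      if (id >= 0 && static_cast<std::size_t>(id) < vocab_size) {
        row[id] = -std::numeric_limits<float>::infinity();
      }
    }
  }

  std::vector<int32_t> suppress_;
  std::vector<int32_t> begin_suppress_;
  std::size_t prompt_length_;
};
```

Because the token lists are fixed per model, they could be parsed once from the search config and shared across steps; only the first-step check depends on per-step state.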

#### Environment

- onnxruntime-genai 0.14.0-dev (commit `9b875c3`)
- Observed while exporting/validating `google/gemma-4-*` (and applies to Whisper).
