Support glob patterns in OnnxKQuantQuantization nodes_to_exclude #2518
Merged
The k-quant pass only matched nodes_to_exclude entries by exact node name. For models split into multiple components (e.g. multimodal decoder/vision/audio), excluding a layer required hardcoding build-specific node names like 'vision_encoder/projector/MatMul_node_38', which are brittle across builds and architectures.

Allow each nodes_to_exclude entry to be a Unix shell-style glob pattern (matched with fnmatch.fnmatchcase) in addition to an exact name. A node is excluded if its name equals or matches any entry, so existing exact-name configs keep working. This makes it possible to write robust exclusions such as '*/projector/*' to keep all projector MatMuls in their original precision.

Adds a regression test covering glob-based exclusion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
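The rule in this commit message ("equals or matches any entry") can be sketched in a few lines. This is an illustration only, not the code in olive/passes/onnx/kquant_quantization.py: the helper name matches_any and the decoder node name are made up for the example, while the two projector names are the ones quoted in the PR description.

```python
from fnmatch import fnmatchcase


def matches_any(node_name, nodes_to_exclude):
    """Return True if node_name equals, or glob-matches, any entry (hypothetical helper)."""
    return any(
        node_name == entry or fnmatchcase(node_name, entry)
        for entry in nodes_to_exclude
    )


# The two projector names come from the PR description; the decoder name is invented.
node_names = [
    "vision_encoder/projector/MatMul_node_38",
    "audio_encoder/projector/MatMul_node_3",
    "decoder/model/layers.0/attn/q_proj/MatMul",
]

# In fnmatch, '*' also matches '/', so one pattern covers both components.
excluded = [n for n in node_names if matches_any(n, ["*/projector/*"])]
print(excluded)
```

Because fnmatchcase is case-sensitive on every platform, a pattern such as '*/projector/*' would not match a name spelled with 'Projector'.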
Pull request overview
This PR extends the OnnxKQuantQuantization pass so nodes_to_exclude can match node names using Unix shell-style glob patterns (via fnmatch.fnmatchcase) in addition to exact-name matching, making exclusions more robust for graphs whose node names vary across builds.
Changes:
- Add glob-pattern support to OnnxKQuantQuantization.nodes_to_exclude.
- Expand the nodes_to_exclude configuration help text with glob usage guidance.
- Add a unit test verifying that a glob pattern excludes only the intended node.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| olive/passes/onnx/kquant_quantization.py | Implements glob-based matching for nodes_to_exclude and updates config documentation. |
| test/passes/onnx/test_kquant_quantization.py | Adds coverage to ensure glob patterns in nodes_to_exclude behave as expected. |
justinchuby added a commit to microsoft/olive-recipes that referenced this pull request on Jun 12, 2026
For the encoder-free gemma4_unified architecture, each of the vision and audio 'encoders' is a single projector MatMul that forms the entire image/audio embedding pathway. Quantizing it to INT4 injects disproportionate error (measured rel-L2 ~3.7% vision / ~9.2% audio) while the components are tiny (~76 MB / ~1.4 MB), so keeping them FP16 costs almost nothing.

Exclude them via nodes_to_exclude: ['*/projector/*']. The decoder (including lm_head) stays INT4, where the size savings live and INT4 has negligible impact on output tokens (top-1 logit agreement ~100%, KL~0.004).

The glob form of nodes_to_exclude requires microsoft/Olive#2518; with older Olive the pattern matches nothing and projectors are quantized as before.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
Previously every nodes_to_exclude entry was matched both by exact name and by fnmatch. An exact node name that happens to contain fnmatch metacharacters (e.g. '[' / ']') could then unintentionally match unrelated nodes (e.g. 'foo[1]' also matches 'foo1').

Only treat an entry as a glob pattern when it contains a '*' or '?' wildcard; all other entries are matched by exact name as before. This keeps exact names with bracket metacharacters safe while still supporting glob exclusions like '*/projector/*'.

Adds a regression test ensuring a bracketed entry without a wildcard is matched exactly (and therefore does not exclude an unrelated node).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
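A minimal sketch of the gating this commit describes, assuming the check is simply "does the entry contain '*' or '?'". The function name is_excluded is hypothetical and the real pass may be structured differently; only fnmatch.fnmatchcase is taken from the PR.

```python
from fnmatch import fnmatchcase


def is_excluded(node_name, nodes_to_exclude):
    """Hypothetical helper: glob only when the entry has a '*' or '?' wildcard."""
    for entry in nodes_to_exclude:
        if "*" in entry or "?" in entry:
            # Wildcard present: treat the entry as a glob pattern.
            if fnmatchcase(node_name, entry):
                return True
        elif node_name == entry:
            # No wildcard: exact comparison, so brackets are plain characters.
            return True
    return False


# The pitfall: read as a glob, 'foo[1]' means 'foo' plus one character from the set {1}.
pitfall = fnmatchcase("foo1", "foo[1]")

exact_only = is_excluded("foo1", ["foo[1]"])    # unrelated node is left alone
exact_hit = is_excluded("foo[1]", ["foo[1]"])   # the literal name still matches
glob_hit = is_excluded("vision_encoder/projector/MatMul_node_38", ["*/projector/*"])
print(pitfall, exact_only, exact_hit, glob_hit)  # True False True True
```

Under this sketch, an entry that mixes brackets with a wildcard is still handed to fnmatchcase, so its brackets are read as a character set.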
jambayk approved these changes on Jun 18, 2026
Motivation
OnnxKQuantQuantization.nodes_to_exclude only matched exact node names. For models that are split into multiple ONNX components (e.g. multimodal decoder / vision / audio), keeping a specific layer out of INT4 required hardcoding build-specific node names such as vision_encoder/projector/MatMul_node_38 and audio_encoder/projector/MatMul_node_3. Those numeric suffixes are assigned at graph-build time and shift across builds and architectures, so the exclusion list is brittle.
Concrete case
For an encoder-free multimodal model (Gemma 4 gemma4_unified), the vision and audio "encoders" are a single projection MatMul each, i.e. the entire image/audio pathway. Measured INT4-vs-FP16 error on those projectors is non-trivial (rel-L2 ≈ 3.7% vision, ≈ 9.2% audio), while the components are tiny (76 MB / 1.4 MB), so keeping them in higher precision costs almost nothing. Doing that cleanly needs a robust way to target '*/projector/*'.
Change
Allow each nodes_to_exclude entry to be a Unix shell-style glob pattern (matched with fnmatch.fnmatchcase) in addition to an exact node name. A node is excluded if its name equals or matches any entry, so existing exact-name configs are unaffected.

    {
      "type": "OnnxKQuantQuantization",
      "bits": 4,
      "nodes_to_exclude": ["*/projector/*"]
    }

Why glob and not regex
ONNX node names are slash-separated paths that are full of regex metacharacters (e.g. decoder/model/layers.0/..., model.norm). Glob is the safer and backward-compatible choice:
- Existing nodes_to_exclude entries are exact names containing '.'. Under glob, '.' is a literal, so every existing config matches exactly the same node. Treating entries as regex would reinterpret '.' as "any char", silently changing the meaning of existing configs and risking over-exclusion (layers.0 would also match layers00). Glob is a clean superset: exact names behave identically and '*' / '?' are the only added powers.
- Regex would need '.' escaped in every dotted path name; glob needs none for the common case.
- fnmatch matches the whole string, so '*/projector/*' unambiguously means "whole name matches this shape", avoiding the re.match / re.search / re.fullmatch ambiguity.
- Glob is the familiar convention for path matching (shell, .gitignore).
Regex is strictly more expressive (alternation, character classes, numeric ranges), but that is rarely needed here and can be added later without breaking glob (e.g. a 're:' prefix or a separate option) if a real need appears.
Testing
- test_kquant_with_nodes_to_exclude_glob (pattern '*_1' excludes one of two MatMuls); the existing exact-name and uniform tests still pass (4 passed).
- ruff check clean on the changed files.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>