Implement binary search for bfp4 weight lowering based on sensitivity by kdimicTT · Pull Request #5418 · tenstorrent/tt-xla

kdimicTT · 2026-06-29T21:14:22Z

Problem description

When running models with mixed precision, not all weights tolerate aggressive quantization equally. Lowering every weight to bfp_bf4 maximizes memory savings but often degrades accuracy below acceptable thresholds. There was no systematic way to find the optimal subset of weights to lower to bfp_bf4 while keeping accuracy above a target.

The sensitivity scores consumed by this script are produced by the per-weight scoring implemented in #5092.

What's changed

Adds the mixed_precision/lowering_search.py script, which automates the search for the maximum number of weights that can be safely lowered to bfp_bf4.

Approach:

1. Reads a pre-computed sensitivity scores JSON (weights ranked from most to least sensitive) for the target model.
2. Runs three reference baselines: all-bfp_bf8, all-MLP-at-bfp_bf4, all-bfp_bf4.
3. Binary searches over the count k of least-sensitive weights to lower, evaluating each candidate by running the accuracy benchmark test and parsing the TOP1 p5 metric against a threshold.
4. Writes the resulting mixed-precision config JSON and a markdown results report with optional per-iteration logs.
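
For readers unfamiliar with the search step, here is a minimal sketch of a binary search over k. It is not the code from lowering_search.py: the names `select_least_sensitive`, `max_lowerable`, and `passes` are hypothetical, and the toy accuracy model stands in for the real step of running the accuracy benchmark and parsing the TOP1 p5 metric. The sketch assumes accuracy does not improve as more weights are lowered (if k passes, every smaller k passes too) and that k = 0 meets the threshold.

```python
from typing import Callable, Sequence


def select_least_sensitive(ranked: Sequence[str], k: int) -> list[str]:
    """Return the k least-sensitive weights.

    `ranked` is ordered from most to least sensitive, so the
    least-sensitive weights are the last k entries.
    """
    return list(ranked[len(ranked) - k:])


def max_lowerable(n: int, passes: Callable[[int], bool]) -> int:
    """Largest k in [0, n] for which passes(k) is True.

    Assumes passes is monotone (True up to some k, False afterwards)
    and that passes(0) holds; passes(0) is never actually called.
    """
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if passes(mid):
            lo = mid       # mid is acceptable, try lowering more weights
        else:
            hi = mid - 1   # mid is too aggressive, lower fewer weights
    return lo


# Toy stand-in for the benchmark (hypothetical numbers, not real results):
# eight weights, where lowering a more sensitive weight costs more accuracy.
ranked = [f"layer{i}.weight" for i in range(8)]           # most -> least sensitive
cost = {name: 8 - i for i, name in enumerate(ranked)}     # accuracy points lost
BASELINE, THRESHOLD = 80, 72


def passes(k: int) -> bool:
    lowered = select_least_sensitive(ranked, k)
    return BASELINE - sum(cost[name] for name in lowered) >= THRESHOLD


best_k = max_lowerable(len(ranked), passes)
print(best_k, select_least_sensitive(ranked, best_k))
```

Each call to `passes` corresponds to one full benchmark run, so the search needs about log2(n) evaluations instead of n. If the real accuracy curve is not monotone in k, a binary search can settle on a k that is not the true maximum.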

codecov-commenter · 2026-06-29T22:01:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.84%. Comparing base (8f71001) to head (7f977f4).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #5418   +/-   ##
=======================================
  Coverage   33.84%   33.84%           
=======================================
  Files          37       37           
  Lines        4990     4990           
=======================================
  Hits         1689     1689           
  Misses       3301     3301


Implement binary search for bfp4 weight lowering based on sensitivity

51bac11

kdimicTT force-pushed the kdimic/mp-lowering-search branch from 38cb8c4 to 51bac11 · June 29, 2026 21:15

kdimicTT force-pushed the kdimic/mp-lowering-search branch from 96aab6f to 0c57788 · June 30, 2026 08:43

Add --save-logs flag for optional per-iteration logs

6c12760

kdimicTT force-pushed the kdimic/mp-lowering-search branch from 0c57788 to 6c12760 · June 30, 2026 08:44

kdimicTT marked this pull request as ready for review June 30, 2026 10:41

kdimicTT requested review from AleksKnezevic, mrakitaTT and nvukobratTT as code owners June 30, 2026 10:41

Add --skip-baselines option

7f977f4

kdimicTT wants to merge 3 commits into main from kdimic/mp-lowering-search