Implement binary search for bfp4 weight lowering based on sensitivity #5418

Open: kdimicTT wants to merge 3 commits into main from kdimic/mp-lowering-search

kdimicTT commented Jun 29, 2026

Problem description

When running models with mixed precision, not all weights tolerate aggressive quantization equally. Lowering every weight to bfp_bf4 maximizes memory savings but often degrades accuracy below acceptable thresholds. Until now there was no systematic way to find the largest set of weights that can be lowered to bfp_bf4 while keeping accuracy above a target.

The sensitivity scores consumed by this script are produced by the per-weight scoring implemented in #5092.

What's changed

Adds the script mixed_precision/lowering_search.py, which automates the search for the maximum number of weights that can be safely lowered to bfp_bf4.

Approach:

  1. Reads a pre-computed sensitivity scores JSON (weights ranked from most to least sensitive) for the target model.
  2. Runs three reference baselines: all-bfp_bf8, all-MLP-at-bfp_bf4, all-bfp_bf4.
  3. Binary-searches over the count k of least-sensitive weights to lower, evaluating each candidate by running the accuracy benchmark test and comparing the parsed TOP1 p5 metric against a threshold.
  4. Writes the resulting mixed-precision config JSON and a markdown results report with optional per-iteration logs.
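
The search in step 3 can be sketched roughly as follows. This is a minimal illustration, not the code in lowering_search.py: the function name `find_max_lowered`, the `evaluate` callback, and the toy penalty values are all hypothetical. It assumes accuracy does not improve as more weights are lowered (the property binary search relies on) and that lowering zero weights meets the threshold.

```python
from typing import Callable, Dict, List


def find_max_lowered(
    ranked_weights: List[str],
    evaluate: Callable[[List[str]], float],
    threshold: float,
) -> List[str]:
    """Return the largest group of least-sensitive weights that still passes.

    `ranked_weights` is ordered from most to least sensitive, so the k
    least-sensitive weights are the last k entries of the list.
    `evaluate` is a hypothetical stand-in for running the accuracy
    benchmark with the given weights lowered and returning the metric.
    """
    n = len(ranked_weights)
    lo, hi = 0, n  # assumed invariant: lowering `lo` weights passes
    while lo < hi:
        k = (lo + hi + 1) // 2  # round up so the loop always makes progress
        candidate = ranked_weights[n - k:]
        if evaluate(candidate) >= threshold:
            lo = k  # k weights are safe, try lowering more
        else:
            hi = k - 1  # too aggressive, try lowering fewer
    return ranked_weights[n - lo:]


# Toy stand-in for the benchmark: each lowered weight costs a fixed
# number of accuracy points (illustrative values only).
toy_penalty: Dict[str, int] = {"w0": 8, "w1": 4, "w2": 2, "w3": 1}


def toy_evaluate(lowered: List[str]) -> float:
    return 80.0 - sum(toy_penalty[name] for name in lowered)


ranked = ["w0", "w1", "w2", "w3"]  # most to least sensitive
lowered = find_max_lowered(ranked, toy_evaluate, threshold=76.0)
print(lowered)  # ['w2', 'w3']
```

Each call to `evaluate` stands for one full benchmark run, so the search needs about log2(n) runs instead of n. If accuracy is not monotonic in k, the result is still a passing configuration but may not be the largest one.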

kdimicTT force-pushed the kdimic/mp-lowering-search branch from 38cb8c4 to 51bac11 on June 29, 2026 21:15

codecov-commenter commented Jun 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.84%. Comparing base (8f71001) to head (7f977f4).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5418   +/-   ##
=======================================
  Coverage   33.84%   33.84%           
=======================================
  Files          37       37           
  Lines        4990     4990           
=======================================
  Hits         1689     1689           
  Misses       3301     3301           

kdimicTT force-pushed the kdimic/mp-lowering-search branch from 96aab6f to 0c57788 on June 30, 2026 08:43
kdimicTT force-pushed the kdimic/mp-lowering-search branch from 0c57788 to 6c12760 on June 30, 2026 08:44
kdimicTT marked this pull request as ready for review on June 30, 2026 10:41