Automating transformers uplift + fixes #5412
I ran an e2e test uplifting transformers 5.5.1 -> 5.9.0
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##             main    #5412   +/-   ##
=======================================
  Coverage   33.81%   33.81%
=======================================
  Files          37       37
  Lines        4992     4992
=======================================
  Hits         1688     1688
  Misses       3304     3304
Please check.
Ticket
#3608
Problem description
Each new `transformers` release breaks the model test suite. Detecting which models regress, classifying real failures vs. pre-existing ones, and patching the loaders + plumbing in tt-xla and `tt_forge_models` is a repetitive multi-day chore that keeps newer `transformers` versions from landing on `main`.
What's changed
End-to-end CI pipeline that helps automate the uplift + fixes:
- `transformers-uplift-fix` Claude skill (`.claude/skills/transformers-uplift-fix/SKILL.md`): receives a `SCOPE` (`api-check` / `model-test-uplifts` / `model-perf-uplift`) and a captured failure context, consults the upstream changelog between `CURRENT_VERSION` and `TARGET_VERSION` to identify root causes, and edits source under tt-xla + `tt_forge_models` only. Hard rules: no monkey-patching, no `if version < X` shims, no drive-by refactors, no git ops (the orchestrator owns every commit and push). Writes a fix summary to `.github/transformers-uplift/fix-summary.md` that the orchestrator uses as the commit body.
- `schedule-transformers-uplift.yml`: nightly cron polls PyPI for the next stable `transformers` release, creates a `transformers-uplift/<ver>` WIP branch on tt-xla and `tt_forge_models`, bumps the pin in `venv/requirements-dev.txt`, and dispatches the orchestrator. Resolves a baseline `schedule-nightly` run on `main` to feed downstream filtering.
- `workflow-transformers-uplift.yml`: orchestrates the fix loop:
  - `pytest --collect-only` sweep to surface import / signature breaks; Claude fixes via a self-bounded sub-loop (up to 5 retries).
  - […] `baseline_uplift`) for a fast framework-level signal before paying for the full suite.
  - […] `MAX_ITERATIONS`, self-redispatches iteration N+1; otherwise advances to the full passes.
  - […] `model-test-passing.json` suite.
  - […] `transformers-uplift-fix` Claude skill. Patches land on both tt-xla and the `tt_forge_models` submodule. Branch is sourced from one place (`github.ref_name`).
- `call-test-uplift.yml` / `call-perf-uplift.yml`: reusable wrappers around `call-test.yml` and `call-filtered-perf-tests.yml`. Compare current failures against a baseline nightly run so Claude only sees uplift-induced regressions, not pre-existing failures.
- `manual-test-uplift.yml` / `manual-perf-uplift.yml`: user-dispatchable wrappers for debugging a single suite/runner against an existing WIP branch.
- `detect-new-version.sh` (PyPI next-stable picker), `bump-transformers.sh` (requirements update), `extract-failures.py` / `extract-perf-failures.py` (junit + log parsing into Claude-friendly context), `run-api-check.sh`, `run-claude-fix.sh`.

Why these 33 models for `baseline_uplift`
Curated for architectural diversity rather than count: one or two representatives per family that exercise the corners of the `transformers` API most likely to break on a release:

These hit Cache, attention, attention-mask, tokenizer, image-processor, audio-processor, and generation surfaces, i.e. the high-churn areas.
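For reviewers, the baseline filtering done by the `call-*-uplift.yml` wrappers can be pictured as a set difference over failed test IDs. The sketch below is illustrative only and is not the code in `extract-failures.py`: the function names, the `classname::name` ID format, and the sample test names are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def failed_tests(junit_xml: str) -> set[str]:
    """Collect 'classname::name' IDs of testcases that failed or errored."""
    root = ET.fromstring(junit_xml)
    failed = set()
    # iter() walks the whole tree, so both a <testsuites> wrapper and a
    # bare <testsuite> root are handled.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.add(f"{case.get('classname', '')}::{case.get('name', '')}")
    return failed


def uplift_regressions(current_xml: str, baseline_xml: str) -> list[str]:
    """Failures in the current run that the baseline run did not have."""
    return sorted(failed_tests(current_xml) - failed_tests(baseline_xml))


# Hypothetical junit reports: test_llama already fails on the baseline,
# test_bert only fails after the uplift.
BASELINE = """<testsuite>
  <testcase classname="tests.models" name="test_bert"/>
  <testcase classname="tests.models" name="test_llama"><failure message="oom"/></testcase>
</testsuite>"""

CURRENT = """<testsuite>
  <testcase classname="tests.models" name="test_bert"><failure message="TypeError"/></testcase>
  <testcase classname="tests.models" name="test_llama"><failure message="oom"/></testcase>
</testsuite>"""

print(uplift_regressions(CURRENT, BASELINE))  # ['tests.models::test_bert']
```

Only `test_bert` would be handed to the fix step here; the pre-existing `test_llama` failure is filtered out because it is present in both runs.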
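The "next stable release" selection in the nightly schedule can be sketched as follows. This is a minimal sketch, not a port of `detect-new-version.sh`: it assumes "next stable" means the smallest version that is strictly newer than the current pin and has no pre-release, dev, or post suffix. Fetching the version list from PyPI is left out, and the version strings are invented for the example.

```python
import re
from typing import Optional

# Digits and dots only: skips tags such as 5.6.0rc1 or 5.6.0.dev0.
_STABLE = re.compile(r"^\d+(\.\d+)*$")


def _key(version: str) -> tuple:
    """Numeric sort key; trailing zeros are dropped so 5.6 == 5.6.0."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def next_stable(current: str, available: list[str]) -> Optional[str]:
    """Smallest stable version strictly newer than `current`, or None."""
    candidates = [
        v for v in available
        if _STABLE.match(v) and _key(v) > _key(current)
    ]
    return min(candidates, key=_key) if candidates else None


# Hypothetical release list, deliberately unsorted.
available = ["5.5.0", "5.5.1", "5.6.0rc1", "5.6.0", "5.10.0", "5.9.0"]
print(next_stable("5.5.1", available))   # 5.6.0
print(next_stable("5.10.0", available))  # None
```

Comparing integer tuples rather than raw strings matters because `"5.10.0" < "5.9.0"` as text. Swapping `min` for `max` would pick the latest stable release instead of the next one, if that is the intended behaviour.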
Checklist