Re-register tt_moe after per-model transformers swap by kamalrajkannan78 · Pull Request #5424 · tenstorrent/tt-xla

kamalrajkannan78 · 2026-06-30T07:51:08Z

Ticket

fixes [DiffusionGemma] KeyError: 'tt_moe is not a valid experts implementation registered in the ExpertsInterface' #5425

Problem description

DiffusionGemma (26B-A4B-it) needs transformers==5.12.0, but the current env has 5.5.1. Its test fails with KeyError: 'tt_moe' is not a valid experts implementation registered in the ExpertsInterface. tt_moe is registered as an import-time side effect of tt_torch.moe_backend, but the dynamic runner then reinstalls the model's transformers version (5.5.1 → 5.12.0), which replaces transformers (and its ExpertsInterface) in memory and drops the registration. The registration is never re-applied, so the live module has no tt_moe.
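To illustrate the failure mode described above, here is a minimal sketch using a stand-in module. `fake_transformers` and `EXPERTS_REGISTRY` are hypothetical names invented for this example; they are not the real transformers API, and the real swap in the dynamic runner may work differently.

```python
import sys
import types


def make_fake_transformers(version):
    """Build a stand-in module with its own, empty registry (hypothetical)."""
    mod = types.ModuleType("fake_transformers")
    mod.__version__ = version
    mod.EXPERTS_REGISTRY = {}  # stands in for the ExpertsInterface mapping
    return mod


# The first version is loaded and the backend registers itself once,
# the way an import-time side effect would.
sys.modules["fake_transformers"] = make_fake_transformers("5.5.1")
sys.modules["fake_transformers"].EXPERTS_REGISTRY["tt_moe"] = object()

# The runner swaps versions: a fresh module object replaces the old one,
# and the fresh registry knows nothing about the earlier registration.
sys.modules["fake_transformers"] = make_fake_transformers("5.12.0")

live = sys.modules["fake_transformers"]
registered_after_swap = "tt_moe" in live.EXPERTS_REGISTRY
```

After the swap, `registered_after_swap` is `False`, which is the state in which a lookup of `tt_moe` would raise.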

What's changed

register_tt_moe_backend() is now re-entrant: on each call it re-resolves ExpertsInterface / ALL_EXPERTS_FUNCTIONS / PreTrainedModel from the currently loaded transformers (and re-patches the live PreTrainedModel), so it targets whatever version is live.
_inject_custom_moe calls it after the model loads (post version-swap), registering tt_moe into the live ExpertsInterface. It fires only when the custom backend is enabled, and is a no-op when no swap happened.
Note: this is a workaround for the current env (transformers 5.5.1). Once the env is uplifted to at least the required version (5.12.0), the per-model swap won't happen and I'll remove this.
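The re-entrant registration described above can be sketched as follows. This is not the code from this PR: `fake_transformers`, `EXPERTS_REGISTRY`, `install_fake_module`, and `register_backend` are hypothetical names, and the sketch only shows the idea of resolving the live module on every call instead of caching it at import time.

```python
import sys
import types

MODULE_NAME = "fake_transformers"  # hypothetical stand-in, not the real package


def tt_moe_experts(*args, **kwargs):
    """Placeholder for the custom experts implementation."""
    raise NotImplementedError


def install_fake_module(version):
    """Simulate a version swap: a fresh module with an empty registry."""
    mod = types.ModuleType(MODULE_NAME)
    mod.__version__ = version
    mod.EXPERTS_REGISTRY = {}
    sys.modules[MODULE_NAME] = mod
    return mod


def register_backend():
    """Register into whichever module object is live at call time."""
    live = sys.modules[MODULE_NAME]  # looked up on every call, never cached
    live.EXPERTS_REGISTRY["tt_moe"] = tt_moe_experts  # assignment is idempotent
    return live


install_fake_module("5.5.1")
register_backend()                       # initial, import-time style registration
swapped = install_fake_module("5.12.0")  # per-model swap drops it
missing_after_swap = "tt_moe" not in swapped.EXPERTS_REGISTRY
register_backend()                       # post-load call restores it
register_backend()                       # calling again changes nothing
present_after_rereg = "tt_moe" in swapped.EXPERTS_REGISTRY
```

Because the function only assigns a key in the live registry, calling it when no swap happened leaves the registry unchanged.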

Checklist

Verify the changes through local testing in WH

Logs

codecov-commenter · 2026-06-30T08:08:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.84%. Comparing base (8f71001) to head (68ca1be).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #5424   +/-   ##
=======================================
  Coverage   33.84%   33.84%           
=======================================
  Files          37       37           
  Lines        4990     4990           
=======================================
  Hits         1689     1689           
  Misses       3301     3301


…790)

### Ticket

- tenstorrent/tt-xla#5423

### Problem description

- DiffusionGemma's forward initializes the decoder canvas with [`torch.randint` when `decoder_input_ids is None`](https://github.com/huggingface/transformers/blob/e0e7504bca2bfd1b85bb0eedb148f7b250226f06/src/transformers/models/diffusion_gemma/modeling_diffusion_gemma.py#L1570-L1575). On device, `randint` lowers to `rng_bit_generator` plus an unsigned `remainder` (`bits % range`), and tt-metal doesn't support `remainder` on `UINT32`, so the model crashes with `Unsupported data type for remainder DataType::UINT32`.

### What's changed

- `load_inputs` now passes `decoder_input_ids` (host-side `torch.randint`, same shape/range/dtype as the model's own init), so the model skips its internal device-side `randint` and no `remainder` op is emitted. The model now compiles and runs end-to-end on 8 chips. This is a quick workaround for the tt-metal gap (UINT32 `remainder`) and can be removed once that op is supported.
- With this fix and [tt-xla#5424](tenstorrent/tt-xla#5424), the model now fails at runtime with `PCC=0.96`.

### Checklist

- [x] Verify the changes through local testing in WH

### Logs

- [jun30_diffgemma_before_work_around.log.zip](https://github.com/user-attachments/files/29496015/jun30_diffgemma_after_fix.log.zip)
- [jun30_diffgemma_after_work_around.log.zip](https://github.com/user-attachments/files/29516803/jun30_diffgemma_after_workaround.log.zip)

Co-authored-by: ctr-kkannan <ctr-kkannan@ext.tenstorrent.com>
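The `bits % range` step mentioned in the commit message above can be illustrated in plain Python. This is only a sketch of the arithmetic, not the actual compiler lowering: `randint_from_bits` is a hypothetical helper, and adding `low` as an offset is an assumption made for the example.

```python
import random

UINT32_MASK = 0xFFFFFFFF


def randint_from_bits(low, high, rng):
    """Draw an integer in [low, high) from 32 random bits via a remainder."""
    bits = rng.getrandbits(32) & UINT32_MASK  # plays the role of rng_bit_generator
    span = high - low
    return low + bits % span  # the unsigned remainder step


rng = random.Random(0)
samples = [randint_from_bits(0, 10, rng) for _ in range(1000)]
```

The remainder is what maps raw 32-bit values into the requested range, which is why a device without a `UINT32` remainder cannot run this step and why generating the ids on the host avoids it.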

Re-register tt_moe after per-model transformers swap

68ca1be

kamalrajkannan78 marked this pull request as ready for review June 30, 2026 11:12

kamalrajkannan78 requested review from AleksKnezevic, acicovicTT, dgolubovicTT, jameszianxuTT, kmabeeTT, mrakitaTT, mstojkovicTT, ndrakulicTT, nvukobratTT, sdjukicTT, sgligorijevicTT and vkovinicTT as code owners June 30, 2026 11:12

This was referenced Jun 30, 2026

[diffusiongemma] Pass decoder_input_ids to skip the model's randint tenstorrent/tt-forge-models#790

Merged

[DiffusionGemma] Unsupported data type for remainder DataType::UINT32 #5423

Open

Re-register tt_moe after per-model transformers swap #5424
kamalrajkannan78 wants to merge 1 commit into main from kkannan/jun30_reregister_moe
