
Re-register tt_moe after per-model transformers swap#5424

Open
kamalrajkannan78 wants to merge 1 commit into main from kkannan/jun30_reregister_moe


@kamalrajkannan78 kamalrajkannan78 commented Jun 30, 2026

Contributor

Ticket

Problem description

  • DiffusionGemma (26B-A4B-it) needs transformers==5.12.0, but the current env has 5.5.1. Its test fails with KeyError: 'tt_moe' is not a valid experts implementation registered in the ExpertsInterface. tt_moe is registered as an import-time side effect of tt_torch.moe_backend, but the dynamic runner then reinstalls the model's transformers version (5.5.1 → 5.12.0). That replaces transformers (and its ExpertsInterface) in memory and drops the registration. Nothing re-applies it, so the live module has no tt_moe.
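
The failure mode can be reproduced in miniature without transformers. The sketch below is only an illustration: `fake_transformers` is a hypothetical stand-in module (not the real package), its `EXPERTS` dict plays the role of `ExpertsInterface`, and replacing the entry in `sys.modules` mimics the runner swapping in a different version.

```python
import sys
import types


def make_fake_transformers(version):
    """Build a stand-in module; a fresh object models a reinstall."""
    mod = types.ModuleType("fake_transformers")
    mod.__version__ = version
    mod.EXPERTS = {"eager": object()}  # stand-in for ExpertsInterface
    return mod


# Initial environment: the "5.5.1" module is live.
sys.modules["fake_transformers"] = make_fake_transformers("5.5.1")

# Import-time side effect of the backend module: register exactly once.
import fake_transformers

fake_transformers.EXPERTS["tt_moe"] = object()
registered_before_swap = "tt_moe" in sys.modules["fake_transformers"].EXPERTS

# The runner swaps in the model's required version: a new module object
# with a new, empty-of-tt_moe registry.
sys.modules["fake_transformers"] = make_fake_transformers("5.12.0")
registered_after_swap = "tt_moe" in sys.modules["fake_transformers"].EXPERTS

print(registered_before_swap, registered_after_swap)
```

The registration lived on the old module object, so a lookup against the live module after the swap no longer finds it.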

What's changed

  • register_tt_moe_backend() is now re-entrant: it re-resolves ExpertsInterface / ALL_EXPERTS_FUNCTIONS / PreTrainedModel from the currently-loaded transformers on each call (and re-patches the live PreTrainedModel), so it targets whatever version is live.
  • _inject_custom_moe calls it after the model loads (post version-swap), registering tt_moe into the live ExpertsInterface. Fires only when the custom backend is enabled; a no-op when no swap happened.
  • Note: this is a workaround for the current env (transformers 5.5.1). Once the env is uplifted to ≥ the required version (5.12.0), the per-model swap won't happen and I'll remove this.
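
A minimal sketch of the re-entrant pattern described above, under the same assumptions as any toy model of it: `fake_transformers`, `EXPERTS`, `register_backend` and `tt_moe_forward` are hypothetical stand-ins, not the actual `register_tt_moe_backend()` code or the transformers API. The point is only that the function resolves the live module on every call instead of holding a reference captured at import time.

```python
import sys
import types


def install_fake_transformers(version):
    """Put a fresh stand-in module into sys.modules (models a version swap)."""
    mod = types.ModuleType("fake_transformers")
    mod.__version__ = version
    mod.EXPERTS = {"eager": object()}  # stand-in for ExpertsInterface
    sys.modules["fake_transformers"] = mod
    return mod


def tt_moe_forward(*args, **kwargs):
    """Placeholder for the custom experts implementation."""
    raise NotImplementedError


def register_backend():
    """Re-entrant: look up the live module on each call, never cache it."""
    live = sys.modules["fake_transformers"]
    live.EXPERTS["tt_moe"] = tt_moe_forward  # idempotent overwrite
    return live.__version__


install_fake_transformers("5.5.1")
first = register_backend()  # import-time style registration

install_fake_transformers("5.12.0")  # per-model swap drops the entry
missing_after_swap = "tt_moe" not in sys.modules["fake_transformers"].EXPERTS

second = register_backend()  # called again after the model loads
third = register_backend()  # no swap in between: harmless repeat
```

Because the assignment is idempotent, calling the function again when no swap happened leaves the registry unchanged.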

Checklist

  • Verify the changes through local testing in WH

Logs

@codecov-commenter


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.84%. Comparing base (8f71001) to head (68ca1be).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5424   +/-   ##
=======================================
  Coverage   33.84%   33.84%           
=======================================
  Files          37       37           
  Lines        4990     4990           
=======================================
  Hits         1689     1689           
  Misses       3301     3301           


@kamalrajkannan78 kamalrajkannan78 marked this pull request as ready for review June 30, 2026 11:12
kamalrajkannan78 added a commit to tenstorrent/tt-forge-models that referenced this pull request Jul 1, 2026
…790)

### Ticket

- tenstorrent/tt-xla#5423

### Problem description

- DiffusionGemma's forward initializes the decoder canvas with
[`torch.randint` when `decoder_input_ids is
None`](https://github.com/huggingface/transformers/blob/e0e7504bca2bfd1b85bb0eedb148f7b250226f06/src/transformers/models/diffusion_gemma/modeling_diffusion_gemma.py#L1570-L1575).
On device, `randint` lowers to `rng_bit_generator` + an unsigned
`remainder` (`bits % range`), and tt-metal doesn't support `remainder`
on `UINT32` → the model crashes with `Unsupported data type for
remainder DataType::UINT32`.

### What's changed

- `load_inputs` now passes `decoder_input_ids` (host-side
`torch.randint`, same shape/range/dtype as the model's own init), so the
model skips its internal device-side `randint` and no `remainder` op is
emitted. The model now compiles and runs end-to-end on 8 chips. This is
just a quick workaround for the tt-metal gap (UINT32 `remainder`);
removable once supported.
- With this fix and
[tt-xla#5424](tenstorrent/tt-xla#5424), the model
now fails at runtime with `PCC=0.96`.
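
The workaround can be sketched in plain Python, with lists standing in for tensors. Everything here is an assumption for illustration: `device_randint` and `forward` are hypothetical stand-ins (not the DiffusionGemma forward or the actual lowering), and `vocab_size=16`, `length=4` are made-up values. It shows the `bits % range` step that the device path needs and how supplying `decoder_input_ids` from the host skips it.

```python
import random


def device_randint(low, high, count, rng):
    """Models the lowering: raw 32-bit words, then an unsigned remainder."""
    span = high - low
    bits = [rng.getrandbits(32) for _ in range(count)]  # rng_bit_generator
    return [low + (b % span) for b in bits]  # the UINT32 remainder step


def forward(decoder_input_ids=None, vocab_size=16, length=4, rng=None):
    """Hypothetical canvas init: falls back to randint only when ids are None."""
    used_internal_randint = decoder_input_ids is None
    if used_internal_randint:
        decoder_input_ids = device_randint(
            0, vocab_size, length, rng or random.Random()
        )
    return decoder_input_ids, used_internal_randint


# Workaround path: ids are drawn on the host and passed in.
host_rng = random.Random(0)
host_ids = [host_rng.randrange(0, 16) for _ in range(4)]
ids_out, used_internal = forward(decoder_input_ids=host_ids)

# Original path: no ids supplied, so the remainder-based init runs.
fallback_ids, fallback_used = forward(rng=random.Random(1))
```

When ids are passed in, the branch containing the remainder is never taken, which is the property the `load_inputs` change relies on.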

### Checklist
- [x] Verify the changes through local testing in WH

### Logs

-
[jun30_diffgemma_before_work_around.log.zip](https://github.com/user-attachments/files/29496015/jun30_diffgemma_after_fix.log.zip)
-
[jun30_diffgemma_after_work_around.log.zip](https://github.com/user-attachments/files/29516803/jun30_diffgemma_after_workaround.log.zip)

Co-authored-by: ctr-kkannan <ctr-kkannan@ext.tenstorrent.com>

Development

Successfully merging this pull request may close these issues.

[DiffusionGemma] KeyError: 'tt_moe is not a valid experts implementation registered in the ExpertsInterface'
