Re-register tt_moe after per-model transformers swap #5424
Open
kamalrajkannan78 wants to merge 1 commit into
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

|          | main   | #5424  | +/- |
|----------|--------|--------|-----|
| Coverage | 33.84% | 33.84% |     |
| Files    | 37     | 37     |     |
| Lines    | 4990   | 4990   |     |
| Hits     | 1689   | 1689   |     |
| Misses   | 3301   | 3301   |     |
This was referenced Jun 30, 2026
Merged
kamalrajkannan78 added a commit to tenstorrent/tt-forge-models that referenced this pull request Jul 1, 2026
…790)

### Ticket
- tenstorrent/tt-xla#5423

### Problem description
- DiffusionGemma's forward initializes the decoder canvas with [`torch.randint` when `decoder_input_ids is None`](https://github.com/huggingface/transformers/blob/e0e7504bca2bfd1b85bb0eedb148f7b250226f06/src/transformers/models/diffusion_gemma/modeling_diffusion_gemma.py#L1570-L1575). On device, `randint` lowers to `rng_bit_generator` plus an unsigned `remainder` (`bits % range`), and tt-metal doesn't support `remainder` on `UINT32`, so the model crashes with `Unsupported data type for remainder DataType::UINT32`.

### What's changed
- `load_inputs` now passes `decoder_input_ids` (host-side `torch.randint`, same shape/range/dtype as the model's own init), so the model skips its internal device-side `randint` and no `remainder` op is emitted. The model now compiles and runs end-to-end on 8 chips. This is just a quick workaround for the tt-metal gap (UINT32 `remainder`) and can be removed once that op is supported.
- With this fix and [tt-xla#5424](tenstorrent/tt-xla#5424), the model now fails at runtime with `PCC=0.96`.

### Checklist
- [x] Verify the changes through local testing in WH

### Logs
- [jun30_diffgemma_before_work_around.log.zip](https://github.com/user-attachments/files/29496015/jun30_diffgemma_after_fix.log.zip)
- [jun30_diffgemma_after_work_around.log.zip](https://github.com/user-attachments/files/29516803/jun30_diffgemma_after_workaround.log.zip)

Co-authored-by: ctr-kkannan <ctr-kkannan@ext.tenstorrent.com>
### Ticket
- `tt_moe` is not a valid experts implementation registered in the `ExpertsInterface`' #5425

### Problem description
- 26B-A4B-it) needs `transformers==5.12.0`, but the current env has `5.5.1`. While running its test, it fails with `KeyError: 'tt_moe' is not a valid experts implementation registered in the ExpertsInterface`.
- `tt_moe` is registered as an import-time side effect of `tt_torch.moe_backend`, but the dynamic runner then reinstalls the model's `transformers` version (5.5.1 → 5.12.0), which replaces `transformers` (and its `ExpertsInterface`) in memory, dropping the registration. It's never re-applied, so the live module has no `tt_moe`.

### What's changed
- `register_tt_moe_backend()` is now re-entrant: it re-resolves `ExpertsInterface` / `ALL_EXPERTS_FUNCTIONS` / `PreTrainedModel` from the currently loaded `transformers` on each call (and re-patches the live `PreTrainedModel`), so it targets whatever version is live.
- `_inject_custom_moe` calls it after the model loads (post version-swap), registering `tt_moe` into the live `ExpertsInterface`. It fires only when the custom backend is enabled and is a no-op when no swap happened.
- 5.5.1). Once the env is uplifted to ≥ the required version (`5.12.0`), the per-model swap won't happen and I'll remove this.

### Checklist

### Logs
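For reviewers who want to see the failure mode in isolation, here is a minimal sketch of the pattern described under "What's changed". It does not use `transformers` or `tt_torch` at all: `fake_lib`, `EXPERTS_REGISTRY`, `make_fake_lib`, `tt_moe_forward`, and `register_backend` are hypothetical stand-ins, and replacing the entry in `sys.modules` only approximates what the dynamic runner's reinstall does. The point is that a registration function which resolves the live module on every call can safely be invoked again after the swap.

```python
import sys
import types


def make_fake_lib(version):
    """Build a stand-in library module with an empty registry (hypothetical)."""
    mod = types.ModuleType("fake_lib")
    mod.__version__ = version
    mod.EXPERTS_REGISTRY = {}
    return mod


def tt_moe_forward(x):
    """Placeholder for the custom experts implementation."""
    return x


def register_backend():
    """Register into whichever module is live right now.

    The module is looked up on every call instead of being captured once
    at import time, so calling this after a swap targets the new module.
    """
    live = sys.modules["fake_lib"]
    live.EXPERTS_REGISTRY["tt_moe"] = tt_moe_forward
    return live


# Initial environment: registration happens once, as at import time.
sys.modules["fake_lib"] = make_fake_lib("5.5.1")
register_backend()
registered_before_swap = "tt_moe" in sys.modules["fake_lib"].EXPERTS_REGISTRY

# Simulated per-model version swap: a fresh module replaces the old one,
# and the fresh registry knows nothing about the earlier registration.
sys.modules["fake_lib"] = make_fake_lib("5.12.0")
lost_after_swap = "tt_moe" not in sys.modules["fake_lib"].EXPERTS_REGISTRY

# Calling the registration again (post-swap) restores the entry.
live = register_backend()
restored = "tt_moe" in live.EXPERTS_REGISTRY
```

Calling `register_backend()` when no swap happened simply overwrites the same key with the same function, which is why the extra call is harmless in that case.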