[Infinity] Add e2e pipeline in nightly and benchmark CI #5428
Draft
meenakshiramanathan1 wants to merge 1 commit into
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #5428 +/- ##
==========================================
- Coverage 33.87% 33.82% -0.05%
==========================================
Files 37 37
Lines 4980 4990 +10
==========================================
+ Hits 1687 1688 +1
- Misses 3293 3302 +9
Ticket
Problem description
Add the Infinity text-to-image model to the nightly and benchmark CI.
What's changed
Nightly (tests/torch/models/infinity/test_infinity_pipeline.py): self-contained e2e pipeline + test. The 2B transformer runs on TT in bf16, 8-way tensor-parallel sharded (mesh (1, 8), Megatron head-parallel attention from loader.load_shard_spec); the T5-XL text encoder, multinomial sampling, and BSQ-VAE decode stay on CPU. The native target is PN="1M" → 1024×1024 (13-scale schedule). The pipeline reimplements the model's autoregressive_infer_cfg as a next-scale-prediction loop (not diffusion): each scale runs a transformer forward, sampling, and BSQ-VAE code accumulation, and a single VAE decode follows the last scale.
Benchmark (tests/benchmark/test_imagegen.py::test_infinity_2b + benchmarks/infinity_pipeline.py): config-driven entry through the shared imagegen harness (same path as test_sdxl_lightning); warmup + steady-state passes.
Sequential classifier-free guidance: cond and uncond run as two batch-1 forwards per scale and are combined on the logits. A batch-2 (stacked) CFG forward makes the attention-score matmul all-gather the heads (de-shard), which OOMs at the final 1M scale (5516034048 B DRAM buffer); batch-1 keeps the scores head-sharded.
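To make the sequential-CFG idea concrete, here is a minimal NumPy sketch. It is not the PR's code: `toy_forward`, `cfg_sequential`, and `cfg_stacked` are hypothetical names, the toy forward stands in for the sharded transformer, and the combination `uncond + scale * (cond - uncond)` is the usual CFG form, assumed here rather than taken from the pipeline. It only shows that two batch-1 forwards give the same logits as one stacked batch-2 forward.

```python
import numpy as np


def toy_forward(tokens, ctx):
    """Hypothetical stand-in for the transformer.

    tokens: (B, L), ctx: (B, V) -> logits: (B, L, V).
    """
    return tokens[:, :, None] * ctx[:, None, :]


def cfg_sequential(forward, tokens, cond_ctx, uncond_ctx, cfg_scale):
    """Two batch-1 forwards, combined on the logits."""
    cond = forward(tokens, cond_ctx)
    uncond = forward(tokens, uncond_ctx)
    return uncond + cfg_scale * (cond - uncond)


def cfg_stacked(forward, tokens, cond_ctx, uncond_ctx, cfg_scale):
    """One batch-2 forward (cond and uncond stacked), split afterwards."""
    both = forward(
        np.concatenate([tokens, tokens]),
        np.concatenate([cond_ctx, uncond_ctx]),
    )
    cond, uncond = both[:1], both[1:]
    return uncond + cfg_scale * (cond - uncond)


tokens = np.arange(1.0, 5.0, dtype=np.float32)[None, :]       # (1, 4)
cond_ctx = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)      # (1, 3)
uncond_ctx = np.array([[0.5, 0.5, 0.5]], dtype=np.float32)    # (1, 3)

seq = cfg_sequential(toy_forward, tokens, cond_ctx, uncond_ctx, 3.0)
stk = cfg_stacked(toy_forward, tokens, cond_ctx, uncond_ctx, 3.0)
```

The two variants are numerically equivalent in this toy; the difference that matters in the PR is the memory and sharding behavior on device, which this sketch does not model.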
fp32 LayerNorm: every LayerNorm is computed via an explicit mean/var/rsqrt decomposition in fp32 (_force_fp32_layernorm). The bf16 fused ttnn.layer_norm loses precision on the mid/late layers' outlier activations (per-block PCC drops to ~0.8 at block 19; matmuls stay ~1.0), which autoregressive sampling amplifies into a noise image. The decomposition restores per-block PCC to ~1.0 and yields a coherent image. A plain F.layer_norm(x.float()) does not help: it folds back to the bf16 fused kernel.
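The mean/var/rsqrt decomposition can be sketched as follows. This is an illustrative NumPy version, not the body of `_force_fp32_layernorm`: the function name `layernorm_fp32_decomposed`, the `eps` default, and the choice to return fp32 are assumptions, and float16 stands in for bf16 because NumPy has no bf16 dtype.

```python
import numpy as np


def layernorm_fp32_decomposed(x, weight=None, bias=None, eps=1e-6):
    """LayerNorm over the last axis, written as explicit fp32 primitives."""
    x32 = x.astype(np.float32)                      # upcast before any reduction
    mean = x32.mean(axis=-1, keepdims=True)
    centered = x32 - mean
    var = (centered * centered).mean(axis=-1, keepdims=True)  # biased variance
    inv_std = 1.0 / np.sqrt(var + eps)              # rsqrt
    y = centered * inv_std
    if weight is not None:
        y = y * weight.astype(np.float32)
    if bias is not None:
        y = y + bias.astype(np.float32)
    return y                                        # stays fp32 (assumed choice)


# Low-precision input with one large outlier channel per row.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 64)).astype(np.float16)
x[:, 7] = 200.0
y = layernorm_fp32_decomposed(x)
```

Writing the op as separate primitives is what keeps a compiler from pattern-matching it back into a fused layer-norm kernel; whether that holds for a given backend has to be checked on that backend.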
Packed-recompute loop (no KV cache): each scale rebuilds the full token sequence and runs all blocks in one sharded forward with a block-causal attn_bias. A KV cache instead de-shards: cached K/V cross the per-scale CPU sampling boundary replicated and feed SDPA directly → all 16 heads on one device → OOM.
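A block-causal bias of the kind described above can be built from the per-scale token counts. The sketch below is an assumption about the mask's shape and semantics (tokens of scale k may attend to every token of scales ≤ k), with a hypothetical helper name `block_causal_bias`; it is not copied from the pipeline.

```python
import numpy as np


def block_causal_bias(scale_lengths):
    """Additive attention bias: 0 where attention is allowed, -inf elsewhere.

    scale_lengths: number of tokens in each scale, coarse to fine.
    """
    # Scale index of every token in the packed sequence, e.g. [0, 1, 1, 1, 1, 2, ...].
    scale_id = np.repeat(np.arange(len(scale_lengths)), scale_lengths)
    # A query may see keys from its own scale and from all earlier scales.
    allowed = scale_id[:, None] >= scale_id[None, :]
    return np.where(allowed, 0.0, -np.inf).astype(np.float32)


# Toy schedule with 1x1, 2x2, and 3x3 token maps.
bias = block_causal_bias([1, 4, 9])
```

The bias is added to the attention scores before the softmax, so the packed sequence can be processed in one forward without any cached K/V.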
bf16 conditioning dtype: shared_ada_lin is kept in bf16 (no .float() upcast), because an f32 input to its bf16 Linear produces a mismatched-dtype dot that fails HLO→MHLO conversion on TT.
Verified per-component PCC against an fp32 CPU reference on the real pipeline tensors.
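For reference, PCC here is the Pearson correlation coefficient between the flattened device output and the reference output. A minimal sketch follows; the helper name `pcc` is hypothetical and the repository's own comparison utility may differ.

```python
import numpy as np


def pcc(actual, expected):
    """Pearson correlation coefficient of two tensors, flattened."""
    a = np.asarray(actual, dtype=np.float64).ravel()
    b = np.asarray(expected, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return float("nan")  # undefined when either input is constant
    return float(a @ b / denom)


x = np.linspace(-1.0, 1.0, 32).reshape(4, 8)
same_direction = pcc(x, 2.0 * x + 3.0)
opposite = pcc(x, -x)
```

Because PCC is invariant to scale and offset, it catches structural divergence (such as the drop to ~0.8 noted above) but not a uniform gain or bias error.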
Final image generated from pipeline:
