[Infinity] Add e2e pipeline in nightly and benchmark CI by meenakshiramanathan1 · Pull Request #5428 · tenstorrent/tt-xla

meenakshiramanathan1 · 2026-06-30T09:17:26Z

Ticket

Fixes #5353 (Add Infinity e2e pipeline in nightly and benchmark CI)

Problem description

Add the Infinity text-to-image model to the nightly and benchmark CI.

What's changed

Nightly (tests/torch/models/infinity/test_infinity_pipeline.py): self-contained e2e pipeline + test. The 2B transformer runs on TT in bf16, 8-way tensor-parallel sharded (mesh (1, 8), Megatron head-parallel attention from loader.load_shard_spec); the T5-XL text encoder, multinomial sampling, and BSQ-VAE decode stay on CPU. Native target is PN="1M" → 1024×1024 (13-scale schedule). It reimplements the model's autoregressive_infer_cfg as a next-scale-prediction loop (not diffusion): per scale a transformer forward + sampling + BSQ-VAE code accumulation, then a single VAE decode.
Benchmark (tests/benchmark/test_imagegen.py::test_infinity_2b + benchmarks/infinity_pipeline.py): config-driven entry through the shared imagegen harness (same path as test_sdxl_lightning); warmup + steady-state passes.
Sequential classifier-free guidance: cond and uncond are run as two batch-1 forwards per scale and combined on the logits. A batch-2 (stacked) CFG forward makes the attention score matmul all-gather the heads (de-shard) and OOM at the final 1M scale (5516034048 B DRAM buffer); batch-1 keeps the score head-sharded.
fp32 LayerNorm: every LayerNorm is computed via an explicit mean/var/rsqrt decomposition in fp32 (_force_fp32_layernorm). The bf16 fused ttnn.layer_norm loses precision on the mid/late layers' outlier activations (per-block PCC drops to ~0.8 at block 19; matmuls stay ~1.0), which autoregressive sampling amplifies into a noise image. The decomposition restores per-block PCC to ~1.0 and yields a coherent image. A plain F.layer_norm(x.float()) does not help — it folds back to the bf16 fused kernel.
Packed-recompute loop (no KV cache): each scale rebuilds the full token sequence and runs all blocks in one sharded forward with a block-causal attn_bias. A KV cache instead de-shards: cached K/V cross the per-scale CPU sampling boundary replicated and feed SDPA directly → all 16 heads on one device → OOM.
bf16 conditioning dtype: shared_ada_lin is kept in bf16 (no .float() upcast) — an f32 input to its bf16 Linear produces a mismatched-dtype dot that fails HLO→MHLO conversion on TT.
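For reviewers, the fp32 LayerNorm item above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `layer_norm_fp32_decomposed` and the `eps` default are hypothetical, and the sketch only shows the mean/var/rsqrt arithmetic in fp32 on CPU. Whether a given compiler keeps the decomposition unfused is the PR's observation and is not demonstrated here.

```python
import torch


def layer_norm_fp32_decomposed(x, weight=None, bias=None, eps=1e-6):
    """LayerNorm over the last dim via explicit mean/var/rsqrt in fp32.

    Hypothetical helper for illustration; the PR's own entry point is
    _force_fp32_layernorm, whose implementation is not reproduced here.
    """
    orig_dtype = x.dtype
    xf = x.to(torch.float32)

    # Statistics are computed in fp32, so outlier activations are not
    # rounded to bf16 before the reduction.
    mean = xf.mean(dim=-1, keepdim=True)
    centered = xf - mean
    var = centered.pow(2).mean(dim=-1, keepdim=True)  # biased variance, as in LayerNorm

    y = centered * torch.rsqrt(var + eps)

    # Optional affine parameters are applied in fp32 as well.
    if weight is not None:
        y = y * weight.to(torch.float32)
    if bias is not None:
        y = y + bias.to(torch.float32)

    # Cast back so downstream bf16 layers see the dtype they expect.
    return y.to(orig_dtype)
```

On fp32 inputs the result should match `torch.nn.functional.layer_norm` to within rounding; on bf16 inputs only the final cast is done in bf16.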

Verified per-component PCC against an fp32 CPU reference on the real pipeline tensors.
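For reference, PCC here is the Pearson correlation coefficient between a device output and the CPU reference, both flattened. The sketch below is a dependency-free illustration over plain Python sequences; the function name `pcc` and the NaN convention for constant inputs are assumptions, and the repository's own comparison helper may differ.

```python
import math


def pcc(a, b):
    """Pearson correlation coefficient of two equal-length sequences of floats.

    Illustrative only: in a real test the inputs would be flattened tensors.
    Returns NaN when either input is constant (correlation is undefined).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]

    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da)) * math.sqrt(sum(q * q for q in db))
    if den == 0.0:
        return float("nan")
    return num / den
```

A value near 1.0 means the two tensors agree up to scale and offset, which is why the per-block drop to ~0.8 reported above was a useful signal for locating the LayerNorm precision loss.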

Final image generated from pipeline:

codecov-commenter · 2026-06-30T13:02:32Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.82%. Comparing base (cf61c13) to head (d9ed733).
⚠️ Report is 62 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5428      +/-   ##
==========================================
- Coverage   33.87%   33.82%   -0.05%     
==========================================
  Files          37       37              
  Lines        4980     4990      +10     
==========================================
+ Hits         1687     1688       +1     
- Misses       3293     3302       +9


meenakshiramanathan1 force-pushed the mramanathan/infinity_pipeline branch from c5c106f to 3b73dd9 on June 30, 2026 12:37

meenakshiramanathan1 force-pushed the mramanathan/infinity_pipeline branch 2 times, most recently from d9ed733 to f2a018c on July 1, 2026 06:50

[Infinity] Add e2e pipeline in nightly and benchmark CI

d3e23d7

meenakshiramanathan1 force-pushed the mramanathan/infinity_pipeline branch from f2a018c to d3e23d7 on July 1, 2026 07:45

meenakshiramanathan1 wants to merge 1 commit into main from mramanathan/infinity_pipeline
