
[Infinity] Add e2e pipeline in nightly and benchmark CI#5428

Draft
meenakshiramanathan1 wants to merge 1 commit into main from mramanathan/infinity_pipeline

meenakshiramanathan1 commented Jun 30, 2026

Ticket

Problem description

Add the Infinity text-to-image model to the nightly and benchmark CI.

What's changed

  • Nightly (tests/torch/models/infinity/test_infinity_pipeline.py): a self-contained e2e pipeline and test. The 2B transformer runs on TT in bf16, 8-way tensor-parallel sharded (mesh (1, 8), Megatron-style head-parallel attention from loader.load_shard_spec); the T5-XL text encoder, multinomial sampling, and BSQ-VAE decode stay on CPU. The native target is PN="1M" → 1024×1024 (a 13-scale schedule). The pipeline reimplements the model's autoregressive_infer_cfg as a next-scale-prediction loop (not diffusion): each scale runs one transformer forward, samples tokens, and accumulates the BSQ-VAE codes, and a single VAE decode follows the last scale.

  • Benchmark (tests/benchmark/test_imagegen.py::test_infinity_2b + benchmarks/infinity_pipeline.py): a config-driven entry through the shared imagegen harness (the same path as test_sdxl_lightning), with warmup and steady-state passes.

  • Sequential classifier-free guidance: cond and uncond run as two batch-1 forwards per scale, and their logits are combined afterwards. A batch-2 (stacked) CFG forward makes the attention-score matmul all-gather the heads (de-sharding them), which OOMs at the final 1M scale (a 5516034048 B, ~5.1 GiB, DRAM buffer); with batch 1 the scores stay head-sharded.

  • fp32 LayerNorm: every LayerNorm is computed as an explicit mean/var/rsqrt decomposition in fp32 (_force_fp32_layernorm). The bf16 fused ttnn.layer_norm loses precision on the outlier activations of the mid and late layers (per-block PCC drops to ~0.8 at block 19, while matmuls stay at ~1.0), and autoregressive sampling amplifies that error into a noise image. The decomposition restores per-block PCC to ~1.0 and yields a coherent image. A plain F.layer_norm(x.float()) does not help: it folds back into the bf16 fused kernel.

  • Packed-recompute loop (no KV cache): each scale rebuilds the full token sequence and runs all blocks in one sharded forward with a block-causal attn_bias. A KV cache de-shards instead: the cached K/V cross the per-scale CPU sampling boundary as replicated tensors and feed SDPA directly, so all 16 heads end up on one device and it OOMs.

  • bf16 conditioning dtype: shared_ada_lin is kept in bf16 (no .float() upcast), because an f32 input to its bf16 Linear produces a mixed-dtype dot that fails HLO→MHLO conversion on TT.
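
The packed-recompute, next-scale-prediction loop from the Nightly and Packed-recompute bullets can be sketched as below. This is a minimal sketch, not the PR's code: `forward`, `sample`, and `next_input` are hypothetical callables standing in for the sharded transformer forward, the CPU multinomial sampling, and the BSQ-VAE code accumulation that builds the next scale's input. The bias assumes "block-causal" means each scale attends to itself (bidirectionally) and to all earlier scales.

```python
import torch


def block_causal_attn_bias(scale_lens, dtype=torch.float32):
    """Additive attention bias for a packed multi-scale token sequence.

    Queries in scale i may attend to every key in scales 0..i (bidirectional
    within a scale) and to nothing in later scales (bias = -inf there).
    """
    total = sum(scale_lens)
    # scale_id[k] is the index of the scale that packed position k belongs to.
    scale_id = torch.repeat_interleave(
        torch.arange(len(scale_lens)), torch.tensor(scale_lens)
    )
    allowed = scale_id[None, :] <= scale_id[:, None]  # rows: queries, cols: keys
    bias = torch.zeros(total, total, dtype=dtype)
    bias.masked_fill_(~allowed, float("-inf"))
    return bias


def next_scale_loop(scale_lens, first_input, forward, sample, next_input):
    """Next-scale prediction without a KV cache (hypothetical interfaces).

    first_input: [scale_lens[0], dim] input block for the first scale.
    forward(seq, bias) -> outputs for every position of the packed sequence.
    sample(outputs) -> sampled codes for one scale.
    next_input(codes, i) -> [scale_lens[i], dim] input block for scale i.
    """
    inputs = [first_input]
    sampled = []
    for i, n in enumerate(scale_lens):
        seq = torch.cat(inputs, dim=0)  # rebuild the full packed sequence
        bias = block_causal_attn_bias(scale_lens[: i + 1], dtype=seq.dtype)
        out = forward(seq, bias)  # one forward over every scale so far
        codes = sample(out[-n:])  # keep only the current scale's rows
        sampled.append(codes)
        if i + 1 < len(scale_lens):
            inputs.append(next_input(codes, i + 1))
    return sampled
```

Dropping the KV cache means every scale recomputes all earlier scales' tokens; that extra compute is the trade-off the bullet describes for keeping attention head-sharded.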

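A minimal sketch of the explicit fp32 mean/var/rsqrt LayerNorm described in the fp32 LayerNorm bullet, assuming the model's norms are `nn.LayerNorm` modules over the last dimension. `Fp32DecomposedLayerNorm` and `swap_layernorms_for_fp32` are hypothetical names, not the PR's `_force_fp32_layernorm`, and casting the result back to the input dtype is an assumption. The sketch only shows the math; whether the TT compiler keeps these ops unfused is what the bullet reports from testing.

```python
import torch
import torch.nn as nn


class Fp32DecomposedLayerNorm(nn.Module):
    """LayerNorm over the last dim as explicit mean/var/rsqrt ops in fp32."""

    def __init__(self, ln: nn.LayerNorm):
        super().__init__()
        assert len(ln.normalized_shape) == 1, "sketch handles last-dim norms only"
        self.eps = ln.eps
        self.weight = ln.weight  # reuses the original parameters (may be None)
        self.bias = ln.bias

    def forward(self, x):
        xf = x.float()
        mean = xf.mean(dim=-1, keepdim=True)
        centered = xf - mean
        var = centered.pow(2).mean(dim=-1, keepdim=True)  # biased, as LayerNorm uses
        y = centered * torch.rsqrt(var + self.eps)
        if self.weight is not None:
            y = y * self.weight.float()
        if self.bias is not None:
            y = y + self.bias.float()
        return y.to(x.dtype)  # assumption: hand back the caller's dtype (e.g. bf16)


def swap_layernorms_for_fp32(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.LayerNorm child with the fp32 decomposition."""
    for name, child in module.named_children():
        if isinstance(child, nn.LayerNorm):
            setattr(module, name, Fp32DecomposedLayerNorm(child))
        else:
            swap_layernorms_for_fp32(child)
    return module
```
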
Verified per-component PCC against an fp32 CPU reference on the real pipeline tensors.
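
Per-component PCC here is the Pearson correlation coefficient between the TT output and the fp32 CPU reference, computed over the flattened tensors. A minimal helper, assuming that standard definition; the repo's own comparison utility may treat edge cases such as constant tensors differently.

```python
import torch


def pcc(golden: torch.Tensor, calculated: torch.Tensor) -> float:
    """Pearson correlation of two same-sized tensors, flattened, in float64."""
    a = golden.detach().flatten().double()
    b = calculated.detach().flatten().double()
    if a.numel() != b.numel():
        raise ValueError("tensors must have the same number of elements")
    a_c = a - a.mean()
    b_c = b - b.mean()
    denom = a_c.norm() * b_c.norm()
    if denom == 0:
        # Constant tensor(s): correlation is undefined; treat exact equality as a match.
        return 1.0 if torch.equal(a, b) else 0.0
    return float((a_c @ b_c) / denom)
```

A per-block call such as `pcc(cpu_block_out, tt_block_out)` (hypothetical variable names) is the kind of check behind the ~0.8 and ~1.0 figures quoted in the fp32 LayerNorm bullet.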

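The sequential classifier-free guidance from the Sequential CFG bullet keeps every forward at batch 1 and combines only the logits. A minimal sketch, assuming the standard combination uncond + s * (cond - uncond); the exact scale convention inside Infinity's autoregressive_infer_cfg may differ, and `forward` is a hypothetical stand-in for one sharded transformer forward.

```python
import torch


def sequential_cfg_logits(forward, cond_inputs, uncond_inputs, cfg_scale):
    """Classifier-free guidance from two batch-1 forwards instead of one batch-2.

    Only the per-scale logits are combined, so no stacked (batch-2)
    activation is ever built inside the forward.
    """
    logits_cond = forward(cond_inputs)  # batch 1, conditional prompt
    logits_uncond = forward(uncond_inputs)  # batch 1, unconditional prompt
    return logits_uncond + cfg_scale * (logits_cond - logits_uncond)
```

Numerically this should match a stacked batch-2 forward whose output is split into cond and uncond halves; what changes is that the device only ever sees batch-1 activations, at the cost of two forwards per scale.
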
Final image generated from the pipeline:
[image: infinity_2b_output]

meenakshiramanathan1 force-pushed the mramanathan/infinity_pipeline branch from c5c106f to 3b73dd9 on June 30, 2026 12:37
codecov-commenter commented Jun 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.82%. Comparing base (cf61c13) to head (d9ed733).
⚠️ Report is 62 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5428      +/-   ##
==========================================
- Coverage   33.87%   33.82%   -0.05%     
==========================================
  Files          37       37              
  Lines        4980     4990      +10     
==========================================
+ Hits         1687     1688       +1     
- Misses       3293     3302       +9     


meenakshiramanathan1 force-pushed the mramanathan/infinity_pipeline branch 2 times, most recently from d9ed733 to f2a018c on July 1, 2026 06:50
meenakshiramanathan1 force-pushed the mramanathan/infinity_pipeline branch from f2a018c to d3e23d7 on July 1, 2026 07:45

Development

Successfully merging this pull request may close these issues.

Add Infinity e2e pipeline in nightly and benchmark CI