Exposing CCL activation lowering flag in compiler config#5396

Open
dgolubovicTT wants to merge 1 commit into main from dgolubovic/activation-dtype-lowering

@dgolubovicTT

Contributor

Ticket

Closes #5178

What's changed

Exposes the new mixed-precision feature, CCL activation lowering, through the compiler config.

  • Lowers the activation dtype from the default bf16 to bfp8 around CCL operations (reduce scatter, all gather) for better performance: going from bf16 to bfp8 moves almost 2x less data.
  • Default off. For now, it will be turned on per model, provided it doesn't regress end-to-end accuracy. As a start, this PR turns it on for Llama 70B.
  • The patterns in tt-mlir are currently written specifically for the Llama architecture, so they will under-trigger by design. We will add more patterns for newer architectures (Deepseek, GLM, Kimi) and merge similar patterns into more general ones.
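
The "almost 2x" figure can be checked with a small back-of-the-envelope sketch. It assumes 32x32-element tiles, 2 bytes per bf16 element, and a bfp8 layout of 1 byte per element plus one shared 1-byte exponent per 16 elements. That layout and all function names below are illustrative assumptions, not taken from this PR.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: activations are stored in tiles of 32x32 elements.
constexpr std::size_t kTileElems = 32 * 32;

// bf16 stores each element in 2 bytes.
constexpr std::size_t bf16TileBytes() { return kTileElems * 2; }

// Assumed bfp8 layout: 1 byte per element, plus one shared
// 1-byte exponent for every group of 16 elements.
constexpr std::size_t bfp8TileBytes() {
  return kTileElems * 1 + kTileElems / 16;
}

// Bytes a CCL op would move for `numTiles` tiles, with or without lowering.
constexpr std::size_t bytesMoved(std::size_t numTiles, bool lowered) {
  return numTiles * (lowered ? bfp8TileBytes() : bf16TileBytes());
}

// Ratio of bf16 traffic to bfp8 traffic for the same tensor.
constexpr double reductionFactor() {
  return static_cast<double>(bf16TileBytes()) /
         static_cast<double>(bfp8TileBytes());
}
```

Under these assumptions the shared exponents add a small overhead on top of the 1-byte elements, which is why the saving comes out slightly below a clean 2x.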

Future

This feature is under active development, mainly by adding new patterns in tt-mlir that lower CCL activation dtypes in new architectures. If this flag is on and activations around CCLs are not lowered, please open an issue for @dgolubovicTT with a repro.

@codecov-commenter

codecov-commenter commented Jun 29, 2026


Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.78%. Comparing base (81c24d4) to head (ffe0a4e).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pjrt_implementation/src/api/compile_options.cc | 0.00% | 4 Missing ⚠️ |
| ...mentation/src/api/module_builder/module_builder.cc | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5396      +/-   ##
==========================================
- Coverage   33.80%   33.78%   -0.03%     
==========================================
  Files          37       37              
  Lines        4990     4996       +6     
==========================================
+ Hits         1687     1688       +1     
- Misses       3303     3308       +5     


@dgolubovicTT dgolubovicTT force-pushed the dgolubovic/activation-dtype-lowering branch from 42d11ff to ffe0a4e Compare July 1, 2026 10:37
// / all_gather -> consumer). Pattern-matches Llama-style sub-graphs
// (O-proj+residual, MLP). Default off; flip on
// per-model after validating model accuracy doesn't degrade.
bool enable_activation_dtype_lowering = false;
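
For readers unfamiliar with how such a flag is typically wired up, here is a minimal sketch of reading a default-off boolean from string key/value compile options. Only the field name `enable_activation_dtype_lowering` comes from the snippet above. The struct, helper names, option key, and accepted spellings are hypothetical and do not describe the actual code in `compile_options.cc`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the real compile options struct.
struct CompileOptionsSketch {
  // Default off, matching the flag in the reviewed snippet.
  bool enable_activation_dtype_lowering = false;
};

using OptionMap = std::unordered_map<std::string, std::string>;

// Returns `fallback` when the key is absent; otherwise treats
// "true", "True", and "1" as enabled (an assumed convention).
inline bool parseBoolOption(const OptionMap &opts, const std::string &key,
                            bool fallback) {
  auto it = opts.find(key);
  if (it == opts.end()) {
    return fallback;
  }
  return it->second == "true" || it->second == "True" || it->second == "1";
}

// Builds the options struct, leaving the flag at its default unless set.
inline CompileOptionsSketch parseCompileOptions(const OptionMap &opts) {
  CompileOptionsSketch result;
  result.enable_activation_dtype_lowering =
      parseBoolOption(opts, "enable_activation_dtype_lowering",
                      result.enable_activation_dtype_lowering);
  return result;
}
```

Keeping the default in the struct initializer means a model that never mentions the option keeps today's behavior, which matches the "default off, flip on per model" intent described in the comment.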

Contributor

A general, non-blocking comment: we should think about how to unify AMP-related features/flags/switches. TBD for next iteration.

Development

Successfully merging this pull request may close these issues.

Expose CCL activation dtype lowering from tt-mlir

5 participants