
[pull] master from tensorflow:master #1671

Merged: pull[bot] merged 2 commits into GesuBackups:master from tensorflow:master on Mar 29, 2026.


pull[bot] commented on Mar 29, 2026:


See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

ezhulenev and others added 2 commits March 28, 2026 13:32
…/Done

Imported from GitHub PR openxla/xla#39382

**NFC**: this PR doesn't change which collective operations are executed or on which streams; the launched operations should appear in the same order on the same streams. The only change is the migration from per-operation thunks to structured concurrency based on start/done thunk wrapping.

Migrate all GPU collective thunks to use the generic `AsyncStartThunk`/`AsyncDoneThunk` infrastructure, removing the per-collective async event management and simplifying the thunk hierarchy.
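
As a rough illustration of the start/done pairing, here is a minimal C++ sketch under heavily simplified assumptions. The class names `AsyncStartThunk` and `AsyncDoneThunk` mirror the PR, but everything else (`Thunk`, `AllReduceThunk`, `AsyncState`, the `Execute(log)` signature) is hypothetical, and `std::async`/`std::future` stand in for GPU streams and events. It is not XLA's implementation; it only shows the division of labor: the collective thunk just runs its operation, while the start/done pair owns completion tracking.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of start/done wrapping. These are NOT the
// real XLA classes; streams and device events are replaced by std::future.
struct Thunk {
  virtual ~Thunk() = default;
  virtual void Execute(std::vector<std::string>& log) = 0;
};

// A collective thunk that only knows how to run its operation. It owns no
// async events: completion tracking is the wrapper's job.
struct AllReduceThunk : Thunk {
  void Execute(std::vector<std::string>& log) override {
    log.push_back("all-reduce");
  }
};

// Shared state connecting one start thunk to its matching done thunk.
struct AsyncState {
  std::future<void> pending;
};

// Launches the wrapped thunks asynchronously and records a handle to wait on.
class AsyncStartThunk : public Thunk {
 public:
  AsyncStartThunk(std::vector<std::unique_ptr<Thunk>> nested,
                  std::shared_ptr<AsyncState> state)
      : nested_(std::move(nested)), state_(std::move(state)) {}

  void Execute(std::vector<std::string>& log) override {
    state_->pending = std::async(std::launch::async, [this, &log] {
      // Nested thunks run in their original order.
      for (auto& thunk : nested_) thunk->Execute(log);
    });
  }

 private:
  std::vector<std::unique_ptr<Thunk>> nested_;
  std::shared_ptr<AsyncState> state_;
};

// Waits for the work launched by the matching AsyncStartThunk.
class AsyncDoneThunk : public Thunk {
 public:
  explicit AsyncDoneThunk(std::shared_ptr<AsyncState> state)
      : state_(std::move(state)) {}

  void Execute(std::vector<std::string>& log) override {
    assert(state_->pending.valid() && "done executed before start");
    state_->pending.get();  // Blocks until the nested thunks finish.
    log.push_back("done");
  }

 private:
  std::shared_ptr<AsyncState> state_;
};
```

Because both halves share one `AsyncState`, any collective can be wrapped without the collective itself knowing about events, which is the simplification the PR describes.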

### Key changes:

- **Remove `CollectiveAsyncEvents`**: Collective thunks no longer manage their own async events. The `AsyncStartThunk`/`AsyncDoneThunk` pair handles all structured concurrency.

- **Remove per-collective Done thunks**: `CollectiveDoneCmd`, `NvshmemCollectiveDoneCmd`, and all collective-specific done command logic are removed. `AsyncDoneThunk` handles completion for all collectives.

- **Simplify command buffer commands**: `CollectiveCmd` and `NvshmemCollectiveCmd` now extend `Command` directly (not `TrackedCmd`). Removed `CollectiveAsyncEvents` from command constructors and execution.

- **Consolidate `Thunk::Kind` enum**: Removed all collective `*Start` and `*Done` kind variants (e.g., `kAllReduceStart`, `kAllReduceDone`), keeping only base kinds (`kAllReduce`). Added `kNvshmemAllReduce`. Updated `ThunkKindProto` with reserved entries for removed values.

- **Sync collective support**: When `IsGPUSyncCollective` is true, collective thunks are emitted directly without `AsyncStartThunk` wrapping, and `EmitCollectiveAsyncDone` returns an empty sequence.
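
The sync/async emission rule can be sketched in the same spirit. In this hypothetical model, thunks are plain strings naming their kind, `is_sync` stands in for the result of `IsGPUSyncCollective`, and `EmitCollectiveStartSketch`, `EmitCollectiveAsyncDoneSketch`, and `EmitScheduleSketch` are invented helpers, not XLA functions. The done-side helper mirrors the described behavior of `EmitCollectiveAsyncDone` by returning an empty sequence for sync collectives.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: thunks are modeled as strings naming their kind.
using ThunkSequence = std::vector<std::string>;

// Start side: sync collectives are emitted bare; async ones are wrapped so a
// later done thunk can wait on them.
ThunkSequence EmitCollectiveStartSketch(const std::string& collective,
                                        bool is_sync) {
  if (is_sync) return {collective};
  return {"AsyncStart(" + collective + ")"};
}

// Done side: for sync collectives there is nothing to wait on.
ThunkSequence EmitCollectiveAsyncDoneSketch(const std::string& collective,
                                            bool is_sync) {
  if (is_sync) return {};
  return {"AsyncDone(" + collective + ")"};
}

// Emits start, any overlapping work, then done, into one flat schedule.
ThunkSequence EmitScheduleSketch(const std::string& collective, bool is_sync,
                                 const ThunkSequence& overlapped) {
  ThunkSequence schedule = EmitCollectiveStartSketch(collective, is_sync);
  schedule.insert(schedule.end(), overlapped.begin(), overlapped.end());
  ThunkSequence done = EmitCollectiveAsyncDoneSketch(collective, is_sync);
  // A sync collective must never produce a dangling done thunk.
  assert(!is_sync || done.empty());
  schedule.insert(schedule.end(), done.begin(), done.end());
  return schedule;
}
```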

Copybara import of the project:

--
a6568a645bc3d32e4e313a9076cf3fc9439b6d9a by Eugene Zhulenev <ezhulenev@openxla.org>:

[xla:gpu] Migrate collective thunks to generic Async Start/Done

Merging this change closes #39382

PiperOrigin-RevId: 891024570
…zation since there seem to be corner cases that are not covered.

Reverts 09d30f3

PiperOrigin-RevId: 891080815
pull[bot] locked and limited the conversation to collaborators on Mar 29, 2026.
pull[bot] added the ⤵️ pull label on Mar 29, 2026.
pull[bot] merged commit 8a30f37 into GesuBackups:master on Mar 29, 2026.
