[pull] master from tensorflow:master by pull[bot] · Pull Request #1981 · GesuBackups/tensorflow

pull · 2026-06-29T19:29:15Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Imported from GitHub PR openxla/xla#43894

📝 Summary of Changes

Inserts `griddepcontrol.launch_dependents` instructions between Triton GEMM mainloops and epilogues. They enable further overlaps of kernels on top of the existing base implementation (openxla/xla#38544).

The insertion logic is based on the observation that most Triton GEMMs, even those without epilogue in HLO, do have epilogues at lower levels - moving / transposing output data through shared memory. This creates an opportunity to increase productive overlaps with subsequent kernels.

Using launch_dependents instructions requires disabling non-coherent / invariant loads for dependent data in successors being launched as these can not be guarded by their PDL waits.

🎯 Justification

Speeds up (typically inference) benchmarks by overlapping independent phases of kernels.

🚀 Kind of Contribution

⚡️ Performance Improvement

📊 Benchmark (for Performance Improvements)

Measured on H100 using CUDA events:

| Benchmark | Speedup | Error |
|-----------------------------------------------------|---------|-------|
| gemma2_2b_keras_jax | 1.01x | 0.00x |
| gemma3_1b_flax_call | 1.01x | 0.00x |
| gemma3_1b_flax_sample_loop | 1.00x | 0.00x |
| gemma3_4b_flax_call | 1.01x | 0.00x |
| gemma3_4b_flax_sample_loop | 1.01x | 0.00x |
| gemma3_12b_flax_call | 1.01x | 0.00x |
| gemma3_12b_flax_sample_loop | 1.00x | 0.00x |
| gpu_hlo | 1.00x | 0.01x |
| hlo_gemma4_2b_bf16 | 1.00x | 0.00x |
| hlo_llama31_8b_bf16_1x8 | 1.00x | 0.01x |
| hlo_llama31_8b_fp8_1x8 | 1.01x | 0.05x |
| hlo_mixtral_8x7b_bf16_1x8 | 1.00x | 0.08x |
| nv_maxtext_1n1g_jit_train_step_before_optimization | 1.00x | 0.00x |
| u4_all_gather_1x8 | 0.99x | 0.03x |

🧪 Unit Tests: yes

🧪 Execution Tests: no

Copybara import of the project:

--
711244dcbece02d56400c097adb5419f6856da52 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Add PDL launch instruction insertion.

Merging this change closes #43894

PiperOrigin-RevId: 939804756
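To make the overlap argument in the summary above concrete, here is a minimal timeline sketch in Python. It is not taken from the XLA change: the function names, the phase breakdown, and the numbers are hypothetical, and the model assumes a single producer/consumer pair with no contention for GPU resources. It compares a producer that only releases its dependent at kernel exit with one that issues `griddepcontrol.launch_dependents` right after its mainloop, so the dependent's launch latency and pre-wait preamble can run alongside the producer's epilogue.

```python
def finish_trigger_at_exit(p_main, p_epilogue, c_launch, c_preamble, c_body):
    """Consumer finish time when the dependent is released only at producer exit.

    All arguments are durations in the same (arbitrary) time unit.
    Assumption: nothing of the consumer overlaps with the producer.
    """
    return p_main + p_epilogue + c_launch + c_preamble + c_body


def finish_early_trigger(p_main, p_epilogue, c_launch, c_preamble, c_body):
    """Consumer finish time when the producer releases the dependent after its mainloop.

    The consumer's launch and preamble (work that does not read producer output)
    start once the mainloop ends. Its PDL wait still blocks until the producer,
    including the epilogue, has fully completed.
    """
    consumer_reaches_wait = p_main + c_launch + c_preamble
    producer_done = p_main + p_epilogue
    return max(consumer_reaches_wait, producer_done) + c_body


if __name__ == "__main__":
    # Hypothetical durations, chosen only for illustration.
    phases = dict(p_main=10, p_epilogue=4, c_launch=2, c_preamble=1, c_body=5)
    print("trigger at exit:", finish_trigger_at_exit(**phases))
    print("early trigger:  ", finish_early_trigger(**phases))
```

In this model the saving is at most the shorter of the producer's epilogue and the consumer's launch plus preamble, which is in line with the small speedups in the benchmark table. The model also shows why the preamble must not touch producer output: anything the consumer reads before its wait may run while the producer's epilogue is still writing.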

Imported from GitHub PR openxla/xla#44389

📝 Summary of Changes

Enables float and buffer xor checker thunks on ROCm

🎯 Justification

Enables more debugging tools on ROCm platform

🚀 Kind of Contribution

✨ New Feature

📊 Benchmark (for Performance Improvements)

N\A

🧪 Unit Tests: None

🧪 Execution Tests: Moved cuda specific tests into:

//xla/stream_executor/gpu:buffer_debug_float_check_kernel_test
//xla/stream_executor/gpu:buffer_debug_xor_checksum_kernel_test

Copybara import of the project:

--
275390bf545b29b177820195ec4a209d77310093 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>:

[ROCm] Enable float and buffer checker

Merging this change closes #44389

Manual patches:

- Rename `*_lib.cu.h` headers to `*_lib.cu.h.inc`. Those headers need to be included after defining a platform-specific definition of `kWarpSize` constant, and don't work without it. However, putting them in `srcs` makes some internal test builds attempt to compile the header by itself - which fails because it's not self-sufficient.
- Add missing `compatible_with = "//buildenv/target:non_prod"`.

PiperOrigin-RevId: 939902012
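For readers unfamiliar with these debugging checks, the following host-side Python sketch illustrates the general idea of an XOR checksum over a buffer and a scan for non-finite floats. It is an illustration only, not the XLA kernels: the function names, the 32-bit little-endian word size, and the zero padding are assumptions made for this example.

```python
import math
import struct


def xor_checksum_u32(buf: bytes) -> int:
    """XOR together all 32-bit little-endian words of buf.

    Assumption: a trailing partial word is zero-padded before folding.
    """
    padded = buf + b"\x00" * (-len(buf) % 4)
    checksum = 0
    for (word,) in struct.iter_unpack("<I", padded):
        checksum ^= word
    return checksum


def float_check(buf: bytes) -> dict:
    """Count NaN and infinite values in a buffer of little-endian float32."""
    if len(buf) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    counts = {"nan": 0, "inf": 0}
    for (value,) in struct.iter_unpack("<f", buf):
        if math.isnan(value):
            counts["nan"] += 1
        elif math.isinf(value):
            counts["inf"] += 1
    return counts


if __name__ == "__main__":
    data = struct.pack("<4f", 1.0, float("nan"), float("-inf"), 0.0)
    print(hex(xor_checksum_u32(data)), float_check(data))
```

Comparing the checksum of a buffer before and after a kernel that should not modify it is a cheap way to spot unexpected writes, while the float scan flags numerical blow-ups. An XOR checksum can miss changes that cancel each other out, so it is a debugging aid rather than a proof of equality.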

sergachev and others added 11 commits June 29, 2026 06:55

[XLA:GPU]: Remove reduction_kind from collective_kernel_thunk

4bb7de8

PiperOrigin-RevId: 939806406

[XLA:GPU] Extract embedded Fusion Explorer HTML template to a standalone file

cd657a6

Pure refactoring step with no behavior or content modifications. This will make it easier to review further changes to the HTML content. PiperOrigin-RevId: 939807152

Automated Code Change

856cdf0

PiperOrigin-RevId: 939821154

[XLA:GPU] PriorityFusion: register fusion ineligibility reasons

9451ebb

This will let Fusion Explorer expose more information about the choices made by PriorityFusion. PiperOrigin-RevId: 939822742

[XLA] Update documentation for async operations.

b8a24f5

Document the HLO parser support for desugaring suffix-based async operations (e.g., dot-start, dot-update, dot-done) and the variadic nature of async-update. PiperOrigin-RevId: 939897472

[XLA][HLO][Verifier] Refactor shape verification for async instructions

79e5c52

PiperOrigin-RevId: 939920098

Remove unused dependency on tsl/platform/errors from hlo_dataflow_analysis.

e622e92

The include and build dependency for tsl/platform/errors are not used in hlo_dataflow_analysis. PiperOrigin-RevId: 939932250

Fix various OOB access across TensorFlow.

85913ff

PiperOrigin-RevId: 939946290

Add support for quantize, dequantize, more activations to YNNPACK delegate

705879d

PiperOrigin-RevId: 939950090

pull Bot locked and limited conversation to collaborators Jun 29, 2026

pull Bot added the ⤵️ pull label Jun 29, 2026

pull Bot merged commit 705879d into GesuBackups:master Jun 29, 2026

pull[bot] merged 11 commits into GesuBackups:master from tensorflow:master

6 participants
