[pull] master from tensorflow:master by pull[bot] · Pull Request #1683 · GesuBackups/tensorflow

pull · 2026-04-01T13:29:22Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

This makes sure that instances of GpuTopology are always created with a target config attached in StreamExecutorGpuPjRtClient. It also makes sure that these fields are transferred through the C API layer via PjRt attributes. This change is a requirement for the unified compilation path of AOT and JIT compilation. PiperOrigin-RevId: 892773575

…L memfill/memset tests
Imported from GitHub PR openxla/xla#40180
This PR contains the following changes:
1. Redundant context activation code is removed since SYCL contexts do not require explicit activation.
2. SYCL memfill and memset functions (both sync and async versions) expect count (i.e. number of elements) instead of bytes when filling device memory. The corresponding tests have been updated.
Copybara import of the project:
-- 640db293e76b5b0af3d13e655282394605d42d08 by Bhavani Subramanian <bhavani1.subramanian@intel.com>: Use count instead of bytes in SyclMemsetDevice(Async) and SyclMemfillDevice(Async) tests
-- 4148115bf39d310722da5de4c2185af174c9d08d by Bhavani Subramanian <bhavani1.subramanian@intel.com>: Remove context activation from 'stream_executor/sycl' code since it is not used
Merging this change closes #40180
PiperOrigin-RevId: 892786832
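The count-versus-bytes distinction in item 2 can be illustrated with a small host-side sketch. This is not the SYCL code from the PR: `fill_count` and `fill_buffer` are hypothetical helpers, and a `bytearray` stands in for device memory. It only shows that a fill routine taking an element count must be passed `bytes / pattern_size`, not the byte size.

```python
def fill_count(num_bytes: int, pattern_size: int) -> int:
    """Hypothetical helper: number of pattern-sized elements in num_bytes."""
    if pattern_size <= 0:
        raise ValueError("pattern_size must be positive")
    if num_bytes % pattern_size != 0:
        raise ValueError("buffer size is not a multiple of the pattern size")
    return num_bytes // pattern_size


def fill_buffer(buffer: bytearray, pattern: bytes, count: int) -> None:
    """Host-side stand-in for a device fill that takes an element count."""
    size = len(pattern)
    if count * size > len(buffer):
        raise ValueError("count exceeds buffer capacity")
    for i in range(count):
        # Write one pattern-sized element at element index i.
        buffer[i * size:(i + 1) * size] = pattern


buf = bytearray(16)
pattern = (0xDEADBEEF).to_bytes(4, "little")
# Pass the element count (4), not the byte size (16).
fill_buffer(buf, pattern, fill_count(len(buf), len(pattern)))
```

Passing `len(buf)` as the count here would be rejected by the capacity check, which is the kind of mismatch the updated tests are described as avoiding.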

…yout calculation. The scan rewriter now correctly identifies zero initial values even when they are wrapped in a broadcast. Additionally, the logic for calculating `vector_length` and `column_length` based on the minor-to-major layout has been corrected. PiperOrigin-RevId: 892813374
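A minimal sketch of the "zero wrapped in a broadcast" check described above, under stated assumptions: the `Constant`, `Broadcast` and `Parameter` classes and the `is_zero_init` function are hypothetical stand-ins for HLO instructions, not the scan rewriter's actual code. The idea is simply to look through any broadcast wrappers before testing for a zero constant.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Constant:
    value: float


@dataclass(frozen=True)
class Broadcast:
    operand: "Expr"


@dataclass(frozen=True)
class Parameter:
    index: int


Expr = Union[Constant, Broadcast, Parameter]


def is_zero_init(expr: Expr) -> bool:
    """Return True if expr is a zero constant, possibly wrapped in broadcasts."""
    # Peel off any number of broadcast wrappers first.
    while isinstance(expr, Broadcast):
        expr = expr.operand
    return isinstance(expr, Constant) and expr.value == 0
```

A check that only tests `isinstance(expr, Constant)` would miss `Broadcast(Constant(0.0))`, which is the case the commit message says is now handled.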

…source.
Imported from GitHub PR openxla/xla#33269
📝 Summary of Changes: Adds a knob to control the limit on the async-compute resource. Raising the limit lets more asynchronous computations execute concurrently. In host-offloading experiments, increasing this value effectively overlaps device-to-host (D2H) transfers with other computations, resulting in improved performance.
🎯 Justification: This lets users control how many async computations are in flight in the LHS (latency hiding scheduler).
🚀 Kind of Contribution: ✨ New Feature
Copybara import of the project:
-- b84bbf195006686ee54b0e5ed791f8d8d024d1eb by Ming Huang <mingh@nvidia.com>: Add a flag to control the limitation of compute resource.
-- 308f3be1073dc00e9df54a16718b82bbc9361e45 by Ming Huang <mingh@nvidia.com>: Adding a comment to xla_gpu_experimental_parallel_async_compute_limit
Merging this change closes #33269
PiperOrigin-RevId: 892814843
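As a hedged illustration of how such a knob might be set: the sketch below assumes that `xla_gpu_experimental_parallel_async_compute_limit` is exposed as an XLA debug option settable through the `XLA_FLAGS` environment variable and that it takes an integer. Neither is confirmed by the commit message, and the value `4` is purely illustrative. `with_xla_flag` is a hypothetical helper.

```python
def with_xla_flag(env: dict, flag: str, value) -> dict:
    """Return a copy of env with `--flag=value` appended to XLA_FLAGS."""
    updated = dict(env)
    existing = updated.get("XLA_FLAGS", "")
    updated["XLA_FLAGS"] = f"{existing} --{flag}={value}".strip()
    return updated


# Assumption: the flag accepts an integer limit; 4 is an arbitrary example.
env = with_xla_flag(
    {}, "xla_gpu_experimental_parallel_async_compute_limit", 4
)
```

The resulting mapping could be passed as the environment of a subprocess that runs the XLA workload, so the flag is in place before XLA initializes.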

…args
Imported from GitHub PR openxla/xla#39601
Instead of hardcoding the NCCL device comm size when passing it to device kernels, add support for dynamically sized storage for packed arguments to the SE Kernel API. The NCCL device comm size keeps changing from release to release, so it should be dynamic rather than breaking XLA on every update.
Copybara import of the project:
-- 227faa25ba5b670f22fd2f6ed67041eba66d1e69 by Eugene Zhulenev <ezhulenev@openxla.org>: [xla:gpu] Add support for dynamically sized packed kernel args
Merging this change closes #39601
PiperOrigin-RevId: 892823919
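The idea of dynamically sized argument storage can be sketched as follows. This is not the SE Kernel API: `PackedArgs` and its methods are hypothetical, and the 40-byte blob merely stands in for an opaque handle whose size is only known at runtime. The point is that storage grows with each argument instead of relying on a hardcoded slot size.

```python
import struct


class PackedArgs:
    """Hypothetical argument storage that grows with its arguments."""

    def __init__(self) -> None:
        self._storage = bytearray()
        self._offsets = []

    def add_bytes(self, blob: bytes, alignment: int = 8) -> int:
        """Append an opaque blob at the next aligned offset and return it."""
        padding = (-len(self._storage)) % alignment
        self._storage.extend(b"\x00" * padding)
        offset = len(self._storage)
        self._storage.extend(blob)
        self._offsets.append(offset)
        return offset

    def add_int32(self, value: int) -> int:
        """Append a little-endian 32-bit integer argument."""
        return self.add_bytes(struct.pack("<i", value), alignment=4)

    @property
    def storage(self) -> bytes:
        return bytes(self._storage)

    @property
    def offsets(self) -> tuple:
        return tuple(self._offsets)


args = PackedArgs()
args.add_int32(7)
# Opaque handle whose size is decided at runtime (40 bytes is arbitrary).
args.add_bytes(b"\xab" * 40)
```

Because the blob's length is read from the value itself, a change in the handle's size between library releases would not require changing the packing code in this sketch.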

In this part: stop using the XLA utility in XLA Shardy and use the one from MLIR Shardy instead. Pure refactoring. This is part of preparing to move the in/outliner into Shardy, that is, from xla::sdy into mlir::sdy. Because xla::sdy and mlir::sdy live in separately managed repos, the move cannot be done in one go. PiperOrigin-RevId: 892829743

Imported from GitHub PR openxla/xla#39854
📝 Summary of Changes: Use a hermetic LLVM dependency to compile XLA under ROCm.
🎯 Justification: Remove the dependency on the installed LLVM; instead, use hermetic LLVM and the clang provided with it to compile XLA.
🚀 Kind of Contribution: ✨ New Feature
📊 Benchmark (for Performance Improvements): Not relevant
🧪 Unit Tests: CI checks
🧪 Execution Tests: CI checks
Copybara import of the project:
-- 70595dbf5a8ee623ac581bc10a4a57af36402336 by Alexandros Theodoridis <atheodor@amd.com>: Use hermetic clang for rocm
-- fc313ba69715587709493014a564fcc0f8f4975f by Alexandros Theodoridis <atheodor@amd.com>: Fix jax build
-- 5ab1657b439f92d226e2cbe491145f72fbf0b9a8 by Alexandros Theodoridis <atheodor@amd.com>: Trigger CI/CD pipeline
-- 188acde374789e7bfefd31951acbc3574fa9ec24 by Alexandros Theodoridis <atheodor@amd.com>: Switch to use ml toolchain repo for local_clang
Merging this change closes #39854
PiperOrigin-RevId: 892840096

This is needed to correctly emit ops that use runtime variables, e.g. dynamic slice. If we have dyn-slice(parameter) and want to emit the load from the parameter, the runtime (RT) variable must already have been emitted, because it is part of the offset computation for the load. PiperOrigin-RevId: 892860392

Imported from GitHub PR openxla/xla#39725
There is no need to create separate hang watchdogs for different parts of XLA; a single per-process instance is enough for most use cases.
Copybara import of the project:
-- 8109d40d86c137bb54566317c8c187e025be2ddc by Eugene Zhulenev <ezhulenev@openxla.org>: [xla] Add global per-process hang watchdog
Merging this change closes #39725
PiperOrigin-RevId: 892873909
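The "one per-process instance" design can be sketched as a lazily created singleton. This is an illustration only, not XLA's watchdog: `HangWatchdog`, `global_watchdog` and their methods are hypothetical names, and the clock is injectable so the behaviour can be checked without real waiting.

```python
import threading
import time


class HangWatchdog:
    """Hypothetical watchdog that tracks deadlines for named operations."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        self._deadlines = {}

    def watch(self, name: str, timeout_s: float) -> None:
        """Start tracking an operation that should finish within timeout_s."""
        with self._lock:
            self._deadlines[name] = self._clock() + timeout_s

    def done(self, name: str) -> None:
        """Stop tracking an operation that has completed."""
        with self._lock:
            self._deadlines.pop(name, None)

    def hung(self) -> list:
        """Return the sorted names of operations past their deadline."""
        now = self._clock()
        with self._lock:
            return sorted(n for n, d in self._deadlines.items() if now > d)


_instance = None
_instance_lock = threading.Lock()


def global_watchdog() -> HangWatchdog:
    """Return the single per-process watchdog, creating it on first use."""
    global _instance
    with _instance_lock:
        if _instance is None:
            _instance = HangWatchdog()
        return _instance
```

Different subsystems would all call `global_watchdog()` and share one set of tracked deadlines, rather than each constructing and polling its own watchdog.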

tensorflower-gardener and others added 21 commits April 1, 2026 00:43

Automated Code Change

c84dd45

PiperOrigin-RevId: 892765437

Automated Code Change

eb82606

PiperOrigin-RevId: 892776705

[XLA:GPU] Prevent collective memory from being destructed until the m…

528828c

…odule execution on devices is completed. PiperOrigin-RevId: 892786341

Move gpu_device_info_test to target_config directory.

4c8cdd0

gpu_device_info_test is mainly testing whether the embedded target configs are up to date, therefore the test should live with the target config logic. PiperOrigin-RevId: 892787083

Automated Code Change

614e49d

PiperOrigin-RevId: 892818671

Automated Code Change

a885b31

PiperOrigin-RevId: 892829493

[XLA:GPU]: Unify collective codegen checks in one place.

c81cd43

Codegen checks are currently spread across 3 files. This change unifies them in one place as much as possible. PiperOrigin-RevId: 892845059

[XLA:GPU] Actually use ConcurrentRegionsHloOrdering for command buf…

1b47788

…fer scheduling mode `CONCURRENT_REGIONS`. PiperOrigin-RevId: 892857796

Run shardy/google/integrate_latest.sh to integrate.

17a6cf8

PiperOrigin-RevId: 892863362

[XLA:GPU] Remove NCCL from dependencies if no_nccl definition is prov…

0fc4edc

…ided. PiperOrigin-RevId: 892867437

Use utility to walk on call graphs.

faf37bf

PiperOrigin-RevId: 892871265

[XLA:GPU] Mark NVSHMEM thunks as deprecated.

e65ba7e

PiperOrigin-RevId: 892892516

pull Bot locked and limited conversation to collaborators Apr 1, 2026

pull Bot added the ⤵️ pull label Apr 1, 2026

pull Bot merged commit e65ba7e into GesuBackups:master Apr 1, 2026

pull[bot] merged 21 commits into GesuBackups:master from tensorflow:master
