[pull] master from tensorflow:master#1687

Merged
pull[bot] merged 23 commits into
GesuBackups:masterfrom
tensorflow:master
Apr 2, 2026

@pull pull Bot commented Apr 2, 2026


See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

vamshikiran065-jpg and others added 23 commits March 13, 2026 13:27
Corrected minor grammatical issues and formatting in installation instructions.
In CustomCallThunk, when `cpu_target_machine_options` is not provided, pass a pointer to a default-constructed `xla::cpu::TargetMachineOptions` instead of `nullptr`. It autodetects the current host triple and cpu name.
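
The fallback described above can be sketched roughly as follows. This is a toy model, not the actual XLA code: `TargetMachineOptionsSketch`, `DescribeTarget`, and `Instantiate` are hypothetical names, and the placeholder strings stand in for whatever the real `xla::cpu::TargetMachineOptions` autodetects.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for xla::cpu::TargetMachineOptions. The real class is
// said to autodetect the host triple and CPU name; here the defaults are
// fixed placeholder strings.
struct TargetMachineOptionsSketch {
  std::string triple = "host-triple";
  std::string cpu = "host-cpu";
};

// Hypothetical callee. Because the caller never passes nullptr any more, it
// can treat a null pointer as a programming error.
std::string DescribeTarget(const TargetMachineOptionsSketch* options) {
  assert(options != nullptr);
  return options->triple + "/" + options->cpu;
}

// Hypothetical caller: fall back to a default-constructed object instead of
// forwarding nullptr when no options were provided.
std::string Instantiate(
    const std::optional<TargetMachineOptionsSketch>& provided) {
  TargetMachineOptionsSketch defaults;
  const TargetMachineOptionsSketch* options =
      provided.has_value() ? &*provided : &defaults;
  return DescribeTarget(options);
}
```

Calling `Instantiate(std::nullopt)` takes the default path, while passing an explicit options object forwards it unchanged.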

PiperOrigin-RevId: 893340005
…de-registration until the module execution is completed.

Also added delayed test-cases for both symmetric memory and peer parameters cases.

PiperOrigin-RevId: 893340921
… MIOpen autotuning

Imported from GitHub PR openxla/xla#39622

📝 Summary of Changes
Pass down device_allocator from gpu_compiler into the MIOpen backend instead of constructing one when getting algorithms.

🎯 Justification
Allows the MIOpen backend to allocate larger scratch buffers, because it is no longer
limited to the free memory that is not reserved by BFCAllocator.

🚀 Kind of Contribution
🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
N/A

🧪 Unit Tests:
N/A

🧪 Execution Tests:
N/A

Copybara import of the project:

--
0113da6439b85f745d8114d9377c25ade4cea2e1 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>:

[ROCm] Use BFCAllocator for scratch allocations needed for MIOpen autotuning

Merging this change closes #39622

PiperOrigin-RevId: 893361718
Imported from GitHub PR openxla/xla#39843

Bumps [jwalton/gh-find-current-pr](https://github.com/jwalton/gh-find-current-pr) from 1.3.3 to 1.3.5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/jwalton/gh-find-current-pr/releases">jwalton/gh-find-current-pr's releases</a>.</em></p>
<blockquote>
<h2>v1.3.5</h2>
<h2><a href="https://github.com/jwalton/gh-find-current-pr/compare/v1.3.4...v1.3.5">1.3.5</a> (2026-03-15)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Bump to node24. (<a href="https://github.com/jwalton/gh-find-current-pr/commit/db0c647679ec9fd2ff8950ac8b66be6d579d17d1">db0c647</a>)</li>
</ul>
<h2>v1.3.4</h2>
<h2>What's Changed</h2>
<ul>
<li>Update action to use Node.js 22 by <a href="https://github.com/larshp"><code>@​larshp</code></a> in <a href="https://redirect.github.com/jwalton/gh-find-current-pr/pull/120">jwalton/gh-find-current-pr#120</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/larshp"><code>@​larshp</code></a> made their first contribution in <a href="https://redirect.github.com/jwalton/gh-find-current-pr/pull/120">jwalton/gh-find-current-pr#120</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/jwalton/gh-find-current-pr/compare/v1...v1.3.4">https://github.com/jwalton/gh-find-current-pr/compare/v1...v1.3.4</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/jwalton/gh-find-current-pr/commit/f3d61b485d2801773f7a07b2aaa3306bd8f8e653"><code>f3d61b4</code></a> chore(release): 1.3.5 [skip ci]</li>
<li><a href="https://github.com/jwalton/gh-find-current-pr/commit/db0c647679ec9fd2ff8950ac8b66be6d579d17d1"><code>db0c647</code></a> fix: Bump to node24.</li>
<li><a href="https://github.com/jwalton/gh-find-current-pr/commit/6aa931781d174b648d6cc7070eb46d1fa4d29927"><code>6aa9317</code></a> Merge pull request <a href="https://redirect.github.com/jwalton/gh-find-current-pr/issues/120">#120</a> from larshp/patch-1</li>
<li><a href="https://github.com/jwalton/gh-find-current-pr/commit/e2e9ed4a7ba06c4f27e59f1ff7ff1418a550f198"><code>e2e9ed4</code></a> Update action to use Node.js 22</li>
<li>See full diff in <a href="https://github.com/jwalton/gh-find-current-pr/compare/89ee5799558265a1e0e31fab792ebb4ee91c016b...f3d61b485d2801773f7a07b2aaa3306bd8f8e653">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=jwalton/gh-find-current-pr&package-manager=github_actions&previous-version=1.3.3&new-version=1.3.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

</details>
Copybara import of the project:

--
c730fd8205aa0ffb3c2d8b16daf71090d1e1c685 by dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>:

Bump jwalton/gh-find-current-pr from 1.3.3 to 1.3.5

Bumps [jwalton/gh-find-current-pr](https://github.com/jwalton/gh-find-current-pr) from 1.3.3 to 1.3.5.
- [Release notes](https://github.com/jwalton/gh-find-current-pr/releases)
- [Commits](jwalton/gh-find-current-pr@89ee579...f3d61b4)

---
updated-dependencies:
- dependency-name: jwalton/gh-find-current-pr
  dependency-version: 1.3.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Merging this change closes #39843

PiperOrigin-RevId: 893377229
Here we are trying to make it as similar as possible to AffineExpr to minimize the number of tests that we need to modify in the migration.

PiperOrigin-RevId: 893379056
Imported from GitHub PR openxla/xla#39951

Add missing collective op.
Copybara import of the project:

--
8a92831cebf93b08e2cec11bb5dcbe6bb5e6a755 by Eugene Zhulenev <ezhulenev@openxla.org>:

[xla:gpu] Add missing scheduling for all-gather-start

Merging this change closes #39951

PiperOrigin-RevId: 893383177
Failing tests were fixed in (cl/892275092). Tests are passing now.

Original message:
Partial migration of IndexingMap::GetAffineMap. Some users still remain to be migrated, but I do not want to make a giant CL.

The changes include:
- Updating call sites in XLA GPU backend emitters (reduction, scatter, transpose)
- Updating HWIR's hlo_expansion.cc to use SymbolicMap.
- Updating various files in /emitters...

Reverts 00af54d

PiperOrigin-RevId: 893384506
…t of splitk) into one when rewriting dot to cuBLAS.

The CanCublasHandleGemm() already returned true for such dots even though they are not supported.

Also remove normalization logic from the SplitK rewriter as Triton emitter handles it fine.

PiperOrigin-RevId: 893406725
Partial migration of the deprecated IndexingMap::GetAffineMap and IndexingMap constructors. This CL replaces the MLIR AffineMap and AffineExpr types with the custom XLA types SymbolicMap and SymbolicExpr within the XLA codegen attrs, ops and transformation passes.

PiperOrigin-RevId: 893411269
When a user calls into the compiler, there are 2 ways to provide the CPU target architecture - either through a field in `Compiler::CompileOptions` or through a field in `GpuTopology` (which is the new way).

The current logic has only been looking into `Compiler::CompileOptions`, so this change unifies all usages under `GpuTopology` - the new way.

1. Adjusts InferGpuTopology to also take the CPU target config from CompileOptions into account.
2. Makes the legacy AOT flow use InferGpuTopology.
3. Hands GpuTopology down into CompileBackendResult so that the target config can be accessed from there.
4. Also gives the target config to the host offloading XLA:CPU compilation call.
5. Adds an integration test that ensures the target compile options reach the CustomCall FFI Instantiate handler.

PiperOrigin-RevId: 893417109
These are transitional/forwarding headers that should be automatically removed when code is refactored.

PiperOrigin-RevId: 893421594
Imported from GitHub PR openxla/xla#40252

Reduce the number of global thread pools in JAX/XLA by using a global hang watchdog.
Copybara import of the project:

--
93244f9fad533f5f97f484f88730580a8f04c440 by Eugene Zhulenev <ezhulenev@openxla.org>:

[xla:gpu] Use Global HangWatchdog in se_gpu_pjrt_client

Merging this change closes #40252

PiperOrigin-RevId: 893431466
HloVerifier cannot assume anything about the provided HLO and should validate
it instead of crashing.
Fix the same kind of bug for ReduceScatter.

PiperOrigin-RevId: 893432458
…uring tiling.

During emission we need to map the affine symbols that correspond to the RT vars to the TiledHloInstructions and then to the emitted TensorValues for them. We can accumulate the tiled HLOs for the RT vars when assembling the TiledHloComputation.

PiperOrigin-RevId: 893441229
It is not needed; the test passes well within 5 minutes on all backends.

PiperOrigin-RevId: 893442970
The `CompileOptions` parameter was not used in the implementations of `CompileTargetBinary` in `AMDGPUCompiler` and `NVPTXCompiler`. This change removes the parameter from the method signature in `GpuCompiler` and its subclasses.

PiperOrigin-RevId: 893446781
This change refactors indexing_analysis and related emitters to use xla::SymbolicExpr and xla::SymbolicMap instead of mlir::AffineExpr and mlir::AffineMap

Migrating indexing_analysis required cascading updates across multiple files, representing the minimal change required to completely remove all Affine references from indexing_analysis.

Key changes:
- Symbolic Representation: Replaced AffineMap and AffineExpr with SymbolicMap and SymbolicExpr across CPU and GPU emitters (including fusion, scatter, transpose, and reduction), tiling schedules, and stablehlo indexing analysis.
- API Refactoring: Migrated MLIR factory calls (e.g., getAffineDimExpr, getAffineConstantExpr) to XLA's symbolic factory functions (e.g., CreateDimExpr, CreateSymbolicConstant, CreateSymbolExpr, SymbolicMap::Get). I had many headaches migrating CreateSymbolExpr.
- Operation Syntax: Replaced explicit .floorDiv() calls with the overloaded / operator for SymbolicExpr evaluation, and updated dimension replacements to use ReplaceDims (more headaches).
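
As a rough illustration of the operator change, the sketch below shows a `/` overload that performs floor division. It assumes the overloaded operator keeps the semantics of the `.floorDiv()` calls it replaces; `ConstExprSketch` and `FloorDiv` are hypothetical names for this toy model and are not the `xla::SymbolicExpr` API.

```cpp
#include <cassert>
#include <cstdint>

// Floor division rounds toward negative infinity, whereas the built-in C++
// integer '/' truncates toward zero. The two differ when the operands have
// opposite signs and the division is inexact.
int64_t FloorDiv(int64_t lhs, int64_t rhs) {
  assert(rhs != 0);
  int64_t quotient = lhs / rhs;
  if ((lhs % rhs != 0) && ((lhs < 0) != (rhs < 0))) {
    --quotient;
  }
  return quotient;
}

// Hypothetical constant-only expression type, used only to show the
// operator overload.
struct ConstExprSketch {
  int64_t value;
};

// With this overload, 'a / b' reads like ordinary arithmetic but evaluates
// as floor division.
ConstExprSketch operator/(ConstExprSketch lhs, ConstExprSketch rhs) {
  return ConstExprSketch{FloorDiv(lhs.value, rhs.value)};
}
```

For example, `ConstExprSketch{-7} / ConstExprSketch{2}` holds -4 here, whereas the built-in `-7 / 2` yields -3.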

Note: This change causes several tests to fail, but they were already addressed in previously reviewed CLs (cl/884921505, cl/884981099, cl/885025521 and many more).

Benchmark: Everything looks flat to slightly positive (gpaste/5804159988269056):
Device Time (denoised)                           1.00x (very close to 1.01x - 1.004x)
Device Time for XLA Codegened / Library Kernels (denoised)  1.01x
Total Compile Time                               1.00x (mostly over 1.00 - 1.002x)
PiperOrigin-RevId: 893452848
… HLO verifier

Imported from GitHub PR openxla/xla#40232

## Summary
Fixes #40191

`VerifyAsynchronousInstructionPairs` checks that async Start/Done instructions
are properly connected for all async op types except `kAllGatherStart/kAllGatherDone`.
This adds the missing verification, following the same pattern used for `kAllReduceStart/kAllReduceDone`.

## Changes
- `hlo_verifier.cc`: Add `kAllGatherStart` and `kAllGatherDone` cases to the async pair verification switch
- `hlo_verifier_test.cc`: Add tests for valid AllGather Start/Done pair and invalid multiple-Done case
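
The kind of check described here can be sketched with a toy model. The code below is not the logic in `hlo_verifier.cc`; `Instr`, `Opcode`, and `VerifyAllGatherPairs` are hypothetical, and the rule it enforces (every Done consumes a Start, every Start has exactly one Done) is an assumption based on the valid-pair and multiple-Done test cases listed above.

```cpp
#include <cassert>  // only needed when checking results with assert()
#include <cstddef>
#include <string>
#include <vector>

enum class Opcode { kAllGatherStart, kAllGatherDone, kOther };

// Hypothetical, heavily simplified instruction: an opcode plus the index of
// its single operand in the instruction list (-1 if it has none).
struct Instr {
  Opcode opcode;
  int operand;
};

// Returns an empty string if every all-gather-done consumes an
// all-gather-start and every all-gather-start has exactly one done;
// otherwise returns an error message instead of crashing.
std::string VerifyAllGatherPairs(const std::vector<Instr>& instrs) {
  std::vector<int> done_count(instrs.size(), 0);
  for (const Instr& instr : instrs) {
    if (instr.opcode != Opcode::kAllGatherDone) continue;
    if (instr.operand < 0 ||
        instr.operand >= static_cast<int>(instrs.size()) ||
        instrs[instr.operand].opcode != Opcode::kAllGatherStart) {
      return "all-gather-done must consume an all-gather-start";
    }
    ++done_count[instr.operand];
  }
  for (std::size_t i = 0; i < instrs.size(); ++i) {
    if (instrs[i].opcode == Opcode::kAllGatherStart && done_count[i] != 1) {
      return "all-gather-start must have exactly one all-gather-done";
    }
  }
  return "";
}
```

A single Start followed by one Done passes, while a Start consumed by two Done instructions is reported as an error.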

## Test plan
- [ ] CI verification
- [x] Tests follow existing AllReduce verifier test patterns
- [x] Follow Google C++ style
Copybara import of the project:

--
30b3bd3e40e9fbd25f5336febefe6f06fa6e0346 by Manish Reddy <kreddy.manish@gmail.com>:

Add missing kAllGatherStart/kAllGatherDone verification to HLO verifier.

Fixes #40191

--
8a46e9c7312dad8f8f47e52c9b1548456cee934f by Manish Reddy <kreddy.manish@gmail.com>:

Use ParseAndReturnVerifiedModule for valid AllGather test case.

Merging this change closes #40232

PiperOrigin-RevId: 893456297
PiperOrigin-RevId: 893465860
…ipblaslt

Imported from GitHub PR openxla/xla#39373

📝 Summary of Changes
Prevent hipblaslt from being used for GEMMs that combine bf16 or f16 with f32.

🎯 Justification
The hipblaslt custom call was generated although it is not supported on MI200.

🚀 Kind of Contribution
Please remove what does not apply: 🐛 Bug Fix, 🧪 Tests

📊 Benchmark (for Performance Improvements)
Please measure and include speedups for one of the public HLOs in
`compiler/xla/tools/benchmarks/hlo/`.

🧪 Unit Tests:
Added CheckCustomCallHipblasLtBF16 test into gemm_rewriter_test

🧪 Execution Tests:
What execution tests were added? For example, a new optimization should be
tested with an end-to-end execution test triggering the optimization and
asserting correctness. Please provide test cases running with at most 2 GPUs.

Copybara import of the project:

--
7884a994ece5c761e62b1363a8be73680502a29a by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Fix issue with unsupported types combinations for hipblaslt

--
8c6ec922e141ac1ee0e46b6cf65ecdcb06c97011 by Zoran Jovanovic <zjovanov@amd.com>:

Skip CheckCustomCallHipblasLtBF16 test for non ROCm archs.

Merging this change closes #39373

PiperOrigin-RevId: 893469347
@pull pull Bot locked and limited conversation to collaborators Apr 2, 2026
@pull pull Bot added the ⤵️ pull label Apr 2, 2026
@pull pull Bot merged commit 45f557d into GesuBackups:master Apr 2, 2026