[pull] master from tensorflow:master#1679

Merged
pull[bot] merged 23 commits into GesuBackups:master from tensorflow:master on Mar 31, 2026

pull[bot] commented Mar 31, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

vwbaker and others added 23 commits March 31, 2026 00:48
…w emitter can handle this, so there's no reason to restrict this

PiperOrigin-RevId: 892161564
PiperOrigin-RevId: 892161604
Support for the RTX6000PRO GPU (SM architecture 12.0) was added to the xla_test macro, but the CI build script was not updated, so all tests ran twice.

This change makes the CI build script aware of the architecture so that it automatically generates the right tag filters.
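
A minimal sketch of why an unknown architecture can make tests run twice, under stated assumptions: the tag format `gpu_sm{major}{minor}` and both helpers are hypothetical, not the real `xla_test` tags or the CI script. It assumes the script excludes the per-architecture variants of every known GPU except the target; `is_selected` mimics Bazel's `--test_tag_filters` rule (a test runs if it has none of the excluded tags and, when positive tags are given, at least one of them).

```python
# Hypothetical model of per-architecture tag filtering; the tag names are
# assumptions, not the tags emitted by XLA's xla_test macro.


def arch_tag(major: int, minor: int) -> str:
    return f"gpu_sm{major}{minor}"


def tag_filters(target: tuple[int, int], known_archs: list[tuple[int, int]]) -> str:
    """Exclude the test variants of every known architecture except the target."""
    return ",".join(
        "-" + arch_tag(*arch) for arch in sorted(set(known_archs)) if arch != target
    )


def is_selected(test_tags: set[str], filters: str) -> bool:
    """Mimic Bazel's --test_tag_filters semantics for a single test target."""
    items = [f for f in filters.split(",") if f]
    positive = {f for f in items if not f.startswith("-")}
    negative = {f[1:] for f in items if f.startswith("-")}
    if negative & test_tags:
        return False
    return not positive or bool(positive & test_tags)


variants = [{"gpu_sm100"}, {"gpu_sm120"}]  # two variants of the same test

# Script unaware of SM 12.0: the sm120 variant is not excluded, so both run.
stale = tag_filters((10, 0), [(9, 0), (10, 0)])
# Script aware of SM 12.0: only the target's variant runs.
fixed = tag_filters((10, 0), [(9, 0), (10, 0), (12, 0)])

print(sum(is_selected(v, stale) for v in variants))  # 2
print(sum(is_selected(v, fixed) for v in variants))  # 1
```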

PiperOrigin-RevId: 892172240
…latform dep

Imported from GitHub PR openxla/xla#39693

📝 Summary of Changes
- Add `//xla/stream_executor/cuda:cuda_platform` dependency to `tracked_gpu_device_buffer_test`
- The test was failing with `RET_CHECK failure: !platforms.empty() No platforms found` because no GPU platform was registered

🎯 Justification
We successfully run many `no_oss`-tagged GPU tests and are working to separate the ones that genuinely cannot work in OSS from those that simply need minor fixes. The `no_oss` tag means "do not execute in public openxla/xla CI" but does not imply the test *cannot* work outside of it. This test only needed a missing platform dependency.

Copybara import of the project:

--
bda304cbb7cb8651b5778b818ff8458aefca87d6 by Andrei Ivanov <anivanov@nvidia.com>:

Skip GpuDeviceMemoryTest:AsShapeBuffer in OSS

--
5a309f2891a5494e689aa4a42a979f16eb4960c0 by Andrei Ivanov <anivanov@nvidia.com>:

Remove local_client dependency from tracked_gpu_device_buffer_test

--
3a203b572b2cf87c8fa372a1f396852e4c62d630 by Andrei Ivanov <anivanov@nvidia.com>:

Bring GPU dependency back to tracked_gpu_device_buffer_test

--
af75602cee892bec8b94ef1a57001d40c6fd36a4 by Andrei Ivanov <anivanov@nvidia.com>:

Remove no_oss and add cuda-only tags to tracked_gpu_device_buffer_test

--
c451e15ee710365130b2a020d5575a69c1661a6f by Andrei Ivanov <anivanov@nvidia.com>:

Bring no_oss tag back

Merging this change closes #39693

PiperOrigin-RevId: 892172401
Imported from GitHub PR openxla/xla#39393

  Introduces VA (virtual address) remapping in GpuExecutable to allow command buffer thunks to use stable virtual addresses across executions, enabling command buffers to be recorded once and replayed without per-run updates when buffer allocations change physical addresses.

  Key components:

  - xla_gpu_enable_command_buffer_va_remapping flag (field 462) to opt in to the VA remapping path.
  - GpuExecutable::ExecuteThunksWithVaRemapping: reserves a single contiguous VA range covering all command-buffer allocations on first use, maps physical memory into it via MemoryReservation::MapTo before each execution, and unmaps asynchronously via a GPU event after the kernels complete.
  - CommandBufferThunk gains an allocs_indices() accessor and an enable_command_buffer_va_remapping_ flag. When the flag is set, Initialize and ExecuteOnStream skip re-recording the command buffer after the first record (addresses are stable from the VA range's perspective).
  - DeviceAddressVmmAllocator gets two new public methods — GetAllocationGranularity(StreamExecutor*) and CreateReservation(StreamExecutor*, uint64_t) (moved from protected) — so GpuExecutable can perform VA operations without casting to a CUDA-specific type.

  Design: one VaRanges entry per executor (stored in module_va_ranges_); btree_set is used for deterministic offset assignment across allocations.
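
The offset assignment in the design note can be sketched as follows, assuming each command-buffer allocation is padded up to the VMM allocation granularity and placed in ascending allocation-index order (the ordered iteration a `btree_set` provides). `align_up` and `assign_va_offsets` are hypothetical helpers and the sizes are made up; this is not the `GpuExecutable` code.

```python
def align_up(value: int, granularity: int) -> int:
    """Round value up to a multiple of granularity."""
    return (value + granularity - 1) // granularity * granularity


def assign_va_offsets(
    alloc_sizes: dict[int, int], granularity: int
) -> tuple[dict[int, int], int]:
    """Lay out allocations back to back in one contiguous VA range.

    Iterating indices in sorted order makes the layout deterministic, so every
    execution maps each allocation to the same offset from the range's base.
    Returns (offset per allocation index, total reservation size).
    """
    offsets: dict[int, int] = {}
    cursor = 0
    for index in sorted(alloc_sizes):
        offsets[index] = cursor
        cursor += align_up(alloc_sizes[index], granularity)
    return offsets, cursor


# Hypothetical sizes in bytes, keyed by allocation index, with 2 MiB granularity.
offsets, total = assign_va_offsets({3: 100, 1: 5 << 20, 7: 2 << 20}, 2 << 20)
print(offsets)  # {1: 0, 3: 6291456, 7: 8388608}
print(total)    # 10485760
```

Each allocation's address is then the range base plus its offset, which stays fixed even when different physical memory is mapped behind it on each run.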

  🎯 Justification

  Command buffers currently must be re-recorded every time buffer addresses change, which adds CPU overhead on each execution. With VA remapping, command buffers see the same fixed virtual addresses on every run — the underlying physical pages are swapped in/out around each execution instead. This makes record-once / replay-many command buffers practical without requiring the allocator to guarantee stable physical addresses.

  The feature is gated behind a flag and has no effect when the allocator is not a DeviceAddressVmmAllocator, so it is fully backward compatible.
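
The record-once behavior can be modelled in a few lines. This is a hypothetical Python model of the decision described for `CommandBufferThunk`, not its code: without VA remapping, a change in buffer addresses forces a re-record; with the flag set, the first recording is replayed, because the command buffer only sees stable virtual addresses. `VA_BASE` and the addresses are made-up values.

```python
class CommandBufferModel:
    """Hypothetical model of the record/replay decision."""

    def __init__(self, va_remapping: bool) -> None:
        self.va_remapping = va_remapping
        self.recorded_addresses: tuple[int, ...] | None = None
        self.record_count = 0

    def execute(self, addresses: tuple[int, ...]) -> None:
        if self.recorded_addresses is None:
            self._record(addresses)  # the first execution always records
        elif not self.va_remapping and addresses != self.recorded_addresses:
            self._record(addresses)  # legacy path: new addresses, re-record
        # Otherwise replay the existing recording unchanged.

    def _record(self, addresses: tuple[int, ...]) -> None:
        self.recorded_addresses = addresses
        self.record_count += 1


VA_BASE = 0x7F0000000000  # hypothetical start of the reserved VA range
physical_runs = [(0x1000,), (0x2000,), (0x2000,)]  # physical addresses per run

legacy = CommandBufferModel(va_remapping=False)
remapped = CommandBufferModel(va_remapping=True)
for physical in physical_runs:
    legacy.execute(physical)      # the command buffer sees physical addresses
    remapped.execute((VA_BASE,))  # physical pages are mapped behind a fixed VA

print(legacy.record_count)    # 2
print(remapped.record_count)  # 1
```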

  🚀 Kind of Contribution

  ⚡️ Performance Improvement, ✨ New Feature

  📊 Benchmark (for Performance Improvements)

  Benchmark results pending — to be provided by workload owners who enable the flag in their pipelines.

  🧪 Unit Tests

  No new unit tests in this change. The existing CommandBufferThunk unit tests cover the standard (non-VA-remapping) path. Unit tests for the VA remapping path (verifying single-record behavior and correct VA-offset assignment) should be added as a follow-up.

  🧪 Execution Tests

  New end-to-end PjRt client tests in se_gpu_pjrt_client_test.cc cover the VA remapping path across representative operation types:

  - CommandBufferVaRemappingCudnnConv — cuDNN convolution
  - CommandBufferVaRemappingDynamicSliceFusion — dynamic-slice fusion
  - CommandBufferVaRemappingWhileLoop — while-loop with command buffers
  - CommandBufferVaRemappingMultiplexing — multiple command buffer thunks sharing the same VA range
  - CommandBufferVaRemappingGemmOps — GEMM operations
  - CommandBufferVaRemappingConditional — conditional (if/else) command buffers
  - CommandBufferVaRemappingFusionOps — element-wise fusion ops

Copybara import of the project:

--
b6df090d193221c61272ff96e988e7c1eafabe00 by Shawn Wang <shawnw@nvidia.com>:

[xla:gpu] Add VA remapping for command buffer thunks

Adds support for virtual address (VA) remapping in GpuExecutable to
allow command buffer thunks to use fixed virtual addresses across
executions, enabling command buffers to be recorded once and replayed
without updates when buffer allocations change physical addresses.

Key components:
- xla_gpu_enable_command_buffer_va_remapping flag (field 462) to
  opt in to the VA remapping path.
- GpuExecutable::ExecuteThunksWithVaRemapping: reserves a single
  contiguous VA range for all command-buffer allocations, maps
  physical memory into it before each execution via
  CudaMemoryReservation::MapTo, and unmaps asynchronously via an
  event after the GPU finishes.
- CommandBufferThunk gains allocs_indices() accessor and
  enable_command_buffer_va_remapping_ flag; Initialize/ExecuteOnStream
  skip re-recording when VA remapping is active (addresses are stable
  from the VA range's perspective).

Design: one VaRange per executor (no per-index interleaving);
uses existing CudaExecutor::GetVmmGranularity() and
DeviceAddressVmmAllocator from vmm_device_address_allocator.h.

--
2dbfb9df6a92518b0b3a45aff26a75de081b6973 by Shawn Wang <shawnw@nvidia.com>:

clang-format fixes for VA remapping changes

--
c6af9a908ca19ef4763bad4a6c4d2e7543b4538f by Shawn Wang <shawnw@nvidia.com>:

add BUILD dependency

--
72fc5fc7f654abd5eb567c540dc70c690b4bc7d6 by Shawn Wang <shawnw@nvidia.com>:

fix btree_set dep name

--
922bb8686cda4a720d110146a2b9e3bb5094d5cf by Shawn Wang <shawnw@nvidia.com>:

increase xla.proto id

--
32291909cd8329819d9b0267c96274bc403fbcbe by Shawn Wang <shawnw@nvidia.com>:

fix the tests

Merging this change closes #39393

PiperOrigin-RevId: 892174660
Imported from GitHub PR openxla/xla#39695

📝 Summary of Changes
  - Add missing `cuda_platform` dep so the GPU platform is registered
  - Disambiguate `LocalExecutable::Run()` calls with explicit `absl::Span<const ShapedBuffer* const>` (fixes build: overload resolution was ambiguous in OSS)
  - Gracefully skip tests when the GPU's compute capability doesn't match the AOT-compiled binary instead of failing with an opaque error

🎯 Justification
We successfully run many `no_oss`-tagged GPU tests and are working to separate the ones that genuinely cannot work in OSS from those that simply need minor fixes. The `no_oss` tag means "do not execute in public openxla/xla CI" but does not imply the test *cannot* work outside of it. This test needed a build fix, a missing dependency, and a skip for compute-capability mismatches (the AOT artifacts are compiled for a specific GPU architecture).
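
A minimal sketch of the skip-instead-of-fail pattern, written with Python's `unittest` for brevity (GoogleTest's C++ counterpart is `GTEST_SKIP()`). The compute capabilities are hypothetical constants and exact equality is an assumed compatibility rule; a real test would query the device and the artifact's target.

```python
import unittest

# Hypothetical values: a real test would query the device and read the target
# the AOT artifact was compiled for.
DEVICE_CC = (8, 0)
AOT_CC = (9, 0)


def require_matching_cc(device_cc: tuple[int, int], aot_cc: tuple[int, int]) -> None:
    """Skip (rather than fail) when the AOT binary targets another architecture."""
    if device_cc != aot_cc:
        raise unittest.SkipTest(
            f"AOT artifact built for sm_{aot_cc[0]}{aot_cc[1]}, "
            f"device is sm_{device_cc[0]}{device_cc[1]}"
        )


class AotCompileTestSketch(unittest.TestCase):
    def test_run_aot_executable(self) -> None:
        require_matching_cc(DEVICE_CC, AOT_CC)
        # Loading and running the AOT-compiled executable would go here.
        self.assertEqual(DEVICE_CC, AOT_CC)  # only reached on a matching device
```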

Copybara import of the project:

--
334bc4eab96f5c124c48fb6b4e74e4c1a58553fe by Andrei Ivanov <anivanov@nvidia.com>:

Skip XlaAotCompileTest in OSS

--
f14420164537c015c3c84ce8bfd13ec62aa851b0 by Andrei Ivanov <anivanov@nvidia.com>:

Skip xla_aot_compile_gpu_test on incompatible CC

--
5b421e4b66a67afd04449869fcd886ce7b4e558e by Andrei Ivanov <anivanov@nvidia.com>:

Cleaner span initialization

--
ead326ce6ea2615198e1b4e3b66c38f6bfed2c59 by Andrei Ivanov <anivanov@nvidia.com>:

Skip xla_aot_compile_gpu_test on unmatched compute capability

--
c958a6705e8a3bf267798b1727ec6e01961ecc85 by Andrei Ivanov <anivanov@nvidia.com>:

format xla_aot_compile_gpu_test.cc

--
ed4b7f2cb823f39cca6a9659ccf4252b39ae1c95 by Andrei Ivanov <anivanov@nvidia.com>:

Change xla_aot_compile_gpu_test from xla_cc_test to xla_test, add backend specification, remove cuda_platform dependency

Merging this change closes #39695

PiperOrigin-RevId: 892183319
…threshold as cpu_compiler.

PiperOrigin-RevId: 892197032
…rmute

Imported from GitHub PR openxla/xla#40141

`hlo_async_executions_` is updated below for both sync and async execution. Adding a nullptr entry to the map is incorrect, because degenerate copies become async copies with their own async execution tracking.

This fixes the error: `Async execution already exists for instruction collective-permute`
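
A hypothetical model of the failure, assuming (as the error text suggests) that the async-copy path registers its own execution under the same instruction, so a pre-inserted placeholder entry makes that later registration collide. `AsyncExecutions` and `emit_degenerate_permute` are illustrative stand-ins, not the XLA GPU emitter.

```python
class AsyncExecutions:
    """Hypothetical stand-in for the emitter's hlo_async_executions_ map."""

    def __init__(self) -> None:
        self._by_instruction: dict[str, object] = {}

    def add(self, instruction: str, execution: object) -> None:
        if instruction in self._by_instruction:
            raise ValueError(
                f"Async execution already exists for instruction {instruction}"
            )
        self._by_instruction[instruction] = execution

    def get(self, instruction: str) -> object:
        return self._by_instruction.get(instruction)


def emit_degenerate_permute(
    executions: AsyncExecutions, name: str, add_placeholder: bool
) -> None:
    if add_placeholder:
        executions.add(name, None)  # models the incorrect nullptr entry
    # Degenerate permutes lower to async copies that track their own execution.
    executions.add(name, f"async-copy({name})")


try:
    emit_degenerate_permute(AsyncExecutions(), "collective-permute", add_placeholder=True)
except ValueError as err:
    print(err)  # Async execution already exists for instruction collective-permute

fixed = AsyncExecutions()
emit_degenerate_permute(fixed, "collective-permute", add_placeholder=False)
```
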
Copybara import of the project:

--
a22f0b7766275dd235b2cfea784ff86ea355998c by Eugene Zhulenev <ezhulenev@openxla.org>:

[xla:gpu] Fix a bug when emitting degenerate collective permute

Merging this change closes #40141

PiperOrigin-RevId: 892197975
Imported from GitHub PR openxla/xla#40108

Make the top-level protobuf version consistent with the `single_version_override` at line 72.
Copybara import of the project:

--
bd22d832c9cd9966744b0dee5e0927bd6413b2e9 by Eugene Zhulenev <ezhulenev@openxla.org>:

Update protobuf dependency version to 32.1

Make the top-level protobuf version consistent with the `single_version_override` at line 72.

Merging this change closes #40108

PiperOrigin-RevId: 892206231
Imported from GitHub PR openxla/xla#39871

📝 Summary of Changes
Arguments are always upcast and downcast for bf16, because there are no native libdevice functions for bf16.

🎯 Justification
With the current implementation, the upcast/downcast is omitted for bf16, which is incorrect.
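
The upcast, call, downcast pattern can be illustrated numerically in plain Python. This is a conceptual sketch, not the MLIR pass: bf16 values are carried as 16-bit integers, `math` functions stand in for f32 libdevice calls, the helper names are hypothetical, and the downcast assumes round-to-nearest-even (NaN handling omitted).

```python
import math
import struct


def f32_bits(x: float) -> int:
    """Reinterpret a float32 value as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bf16_to_f32(b: int) -> float:
    """Upcast: a bfloat16 is the high 16 bits of an IEEE float32."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def f32_to_bf16(x: float) -> int:
    """Downcast with round-to-nearest-even on the dropped low 16 bits."""
    bits = f32_bits(x)
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def call_f32_only(fn, b: int) -> int:
    """Apply an f32-only math function to a bf16 value: upcast, call, downcast."""
    return f32_to_bf16(fn(bf16_to_f32(b)))


print(hex(call_f32_only(math.sqrt, 0x4080)))  # sqrt(4.0) == 2.0 -> 0x4000
print(hex(call_f32_only(math.exp, 0x0000)))   # exp(0.0) == 1.0 -> 0x3f80
```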

🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests

🧪 Unit Tests:
Added new test:
//xla/backends/gpu/codegen/triton/transforms/tests:triton_xla_math_to_libdevice_rocm.mlir

Copybara import of the project:

--
b197577ada00832c7c9407166e657ff9b0a742be by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Fix bf16 upcast handling for libdevice calls.

--
e53701897989178f86eb173eb08ad943414e139a by Zoran Jovanovic <zjovanov@amd.com>:

Review comments.

--
76e617e14d59007069f12ab26a4c6bd703ebaa6a by Zoran Jovanovic <zjovanov@amd.com>:

Removed unnecessary file inclusion.

Merging this change closes #39871

PiperOrigin-RevId: 892211779
Imported from GitHub PR openxla/xla#39309

📝 Summary of Changes
- Capture scope_range_id from AnnotationStack during HIP API correlation callbacks and propagate it through AnnotationMap to GPU kernel events in the profiler trace
- Build the scope_range_id tree (child-to-parent mapping) from the full range ID sequence in AnnotationMap::Add, matching the CUPTI AddScopeRangeIdSequence pattern
- Export the tree as a separate XPlane so xprof can reconstruct the scope hierarchy for derived timeline grouping
- Set scope_range_id on all event types (kernel, HIP API, memcpy) for uniform coverage, matching the CUPTI path
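
Building the child-to-parent tree from a range ID sequence can be sketched as below, assuming each sequence lists the active scope range IDs from outermost to innermost. `add_scope_range_id_sequence` and `ancestors` are hypothetical Python stand-ins that only echo the name of the CUPTI-side pattern; they are not the profiler's API.

```python
def add_scope_range_id_sequence(tree: dict[int, int], sequence: list[int]) -> None:
    """Record a child -> parent edge for each adjacent pair in the sequence."""
    for parent, child in zip(sequence, sequence[1:]):
        tree[child] = parent


def ancestors(tree: dict[int, int], range_id: int) -> list[int]:
    """Walk parent links from range_id up to the root scope."""
    chain = []
    while range_id in tree:
        range_id = tree[range_id]
        chain.append(range_id)
    return chain


tree: dict[int, int] = {}
add_scope_range_id_sequence(tree, [1, 2, 3])  # kernel launched inside scopes 1 > 2 > 3
add_scope_range_id_sequence(tree, [1, 2, 4])
add_scope_range_id_sequence(tree, [5])        # a top-level scope has no parent
print(tree)                # {2: 1, 3: 2, 4: 2}
print(ancestors(tree, 4))  # [2, 1]
```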

🚀 Kind of Contribution
✨ New Feature
Copybara import of the project:

--
57413350c08338efcf90c9eb9293a184890bf440 by magaonka <magaonka@amd.com>:

[ROCm] Add scope_range_id support to ROCm profiler

- Capture scope_range_id from AnnotationStack during HIP API
  correlation callbacks and propagate it through AnnotationMap
  to GPU kernel events in the profiler trace
- Build the scope_range_id tree (child-to-parent mapping) from
  the full range ID sequence in AnnotationMap::Add, matching the
  CUPTI AddScopeRangeIdSequence pattern
- Export the tree as a separate XPlane so xprof can reconstruct
  the scope hierarchy for derived timeline grouping
- Set scope_range_id on all event types (kernel, HIP API, memcpy)
  for uniform coverage, matching the CUPTI path

--
2b357975756411b96f6f390d91df893455097ae6 by Manjunath Gaonkar <magaonka@amd.com>:

[ROCm] Fix build dependency issues

- Add @com_google_absl//absl/types:span to rocm_tracer deps
- Fix DWYU violations across profiler/gpu BUILD targets

--
ed62530499e1c77d9be95054bbb0b3193798615a by magaonka <magaonka@amd.com>:

[ROCm] Fix build_cleaner findings in profiler GPU BUILD

- Add absl/types:span to rocm_tracer_utils (used in header)
- Remove unused absl/memory and //xla/tsl/platform:logging from cupti_utils
- Remove unused //xla/tsl/platform:test from cupti_buffer_events_test

Merging this change closes #39309

PiperOrigin-RevId: 892216477
Imported from GitHub PR openxla/xla#39991

## Summary
Fixes #39930

- Add test for `BatchNormInference` expansion (previously had zero test coverage)
- Add test for `BatchNormInference` with sharding propagation
- Add test for `BatchNormGrad` with sharding propagation (the training variant already had one; the grad variant did not)
- Add test for disabled rewrite flags (verify pass is a no-op when all flags are false)
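
For reference, `BatchNormInference` normalizes each feature with precomputed statistics, y = (x - mean) * scale / sqrt(variance + epsilon) + offset, which the expansion has to reproduce with elementwise ops. The plain-Python sketch below assumes a 2-D operand with the feature dimension last; it is an illustrative reference, not the expander's code or a helper from the new tests.

```python
import math


def batch_norm_inference(x, scale, offset, mean, variance, epsilon):
    """x is a list of rows; column c holds feature c (feature dimension last)."""
    return [
        [
            (v - mean[c]) * scale[c] / math.sqrt(variance[c] + epsilon) + offset[c]
            for c, v in enumerate(row)
        ]
        for row in x
    ]


y = batch_norm_inference(
    x=[[1.0, 2.0], [3.0, 6.0]],
    scale=[1.0, 2.0],
    offset=[0.0, 1.0],
    mean=[2.0, 4.0],
    variance=[1.0, 4.0],
    epsilon=0.0,
)
print(y)  # [[-1.0, -1.0], [1.0, 3.0]]
```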

## Test plan
- [ ] CI verification (unable to build full CPU backend locally — ARM Mac under x86 emulation)
- [x] Tests follow existing patterns in the file (`HloPjRtTestBase`)
- [x] Follow Google C++ style
Copybara import of the project:

--
9002d51b7896293eb4e3ebcca80665c55384441d by Manish Reddy <kreddy.manish@gmail.com>:

Add test coverage for BatchNormExpander untested code paths.

Fixes #39930

Merging this change closes #39991

PiperOrigin-RevId: 892231772
…enchmarks/e2e/gemma2/flax_2b

Imported from GitHub PR openxla/xla#40111

Bumps [pygments](https://github.com/pygments/pygments) from 2.18.0 to 2.20.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/pygments/pygments/releases">pygments's releases</a>.</em></p>
<blockquote>
<h2>2.20.0</h2>
<ul>
<li>
<p>New lexers:</p>
<ul>
<li>Rell (<a href="https://redirect.github.com/pygments/pygments/issues/2914">#2914</a>)</li>
</ul>
</li>
<li>
<p>Updated lexers:</p>
<ul>
<li>archetype: Fix catastrophic backtracking in GUID and ID patterns (<a href="https://redirect.github.com/pygments/pygments/issues/3064">#3064</a>)</li>
<li>ASN.1: Recognize minus sign and fix range operator (<a href="https://redirect.github.com/pygments/pygments/issues/3014">#3014</a>, <a href="https://redirect.github.com/pygments/pygments/issues/3060">#3060</a>)</li>
<li>C++: Add C++26 keywords (<a href="https://redirect.github.com/pygments/pygments/issues/2955">#2955</a>), add integer literal suffixes (<a href="https://redirect.github.com/pygments/pygments/issues/2966">#2966</a>)</li>
<li>ComponentPascal: Fix <code>analyse_text</code> (<a href="https://redirect.github.com/pygments/pygments/issues/3028">#3028</a>, <a href="https://redirect.github.com/pygments/pygments/issues/3032">#3032</a>)</li>
<li>Coq renamed to Rocq (<a href="https://redirect.github.com/pygments/pygments/issues/2883">#2883</a>, <a href="https://redirect.github.com/pygments/pygments/issues/2908">#2908</a>)</li>
<li>Cython: Various improvements (<a href="https://redirect.github.com/pygments/pygments/issues/2932">#2932</a>, <a href="https://redirect.github.com/pygments/pygments/issues/2933">#2933</a>)</li>
<li>Debian control: Improve architecture parsing (<a href="https://redirect.github.com/pygments/pygments/issues/3052">#3052</a>)</li>
<li>Devicetree: Add support for overlay/fragments (<a href="https://redirect.github.com/pygments/pygments/issues/3021">#3021</a>), add bytestring support (<a href="https://redirect.github.com/pygments/pygments/issues/3022">#3022</a>), fix catastrophic backtracking (<a href="https://redirect.github.com/pygments/pygments/issues/3057">#3057</a>)</li>
<li>Fennel: Various improvements (<a href="https://redirect.github.com/pygments/pygments/issues/2911">#2911</a>)</li>
<li>Haskell: Handle escape sequences in character literals (<a href="https://redirect.github.com/pygments/pygments/issues/3069">#3069</a>, <a href="https://redirect.github.com/pygments/pygments/issues/1795">#1795</a>)</li>
<li>Java: Add module keywords (<a href="https://redirect.github.com/pygments/pygments/issues/2955">#2955</a>)</li>
<li>Lean4: Add operators <code>]'</code>, <code>]?</code>, <code>]!</code>  (<a href="https://redirect.github.com/pygments/pygments/issues/2946">#2946</a>)</li>
<li>LESS: Support single-line comments (<a href="https://redirect.github.com/pygments/pygments/issues/3005">#3005</a>)</li>
<li>LilyPond: Update to 2.25.29 (<a href="https://redirect.github.com/pygments/pygments/issues/2974">#2974</a>)</li>
<li>LLVM: Support C-style comments (<a href="https://redirect.github.com/pygments/pygments/issues/3023">#3023</a>, <a href="https://redirect.github.com/pygments/pygments/issues/2978">#2978</a>)</li>
<li>Lua(u): Fix catastrophic backtracking (<a href="https://redirect.github.com/pygments/pygments/issues/3047">#3047</a>)</li>
<li>Macaulay2: Update to 1.25.05 (<a href="https://redirect.github.com/pygments/pygments/issues/2893">#2893</a>), 1.25.11 (<a href="https://redirect.github.com/pygments/pygments/issues/2988">#2988</a>)</li>
<li>Mathematica: Various improvements (<a href="https://redirect.github.com/pygments/pygments/issues/2957">#2957</a>)</li>
<li>meson: Add additional operators (<a href="https://redirect.github.com/pygments/pygments/issues/2919">#2919</a>)</li>
<li>MySQL: Update keywords (<a href="https://redirect.github.com/pygments/pygments/issues/2970">#2970</a>)</li>
<li>org-Mode: Support both schedule and deadline (<a href="https://redirect.github.com/pygments/pygments/issues/2899">#2899</a>)</li>
<li>PHP: Add <code>__PROPERTY__</code> magic constant (<a href="https://redirect.github.com/pygments/pygments/issues/2924">#2924</a>), add reserved keywords (<a href="https://redirect.github.com/pygments/pygments/issues/3002">#3002</a>)</li>
<li>PostgreSQL: Add more keywords (<a href="https://redirect.github.com/pygments/pygments/issues/2985">#2985</a>)</li>
<li>protobuf: Fix namespace tokenization (<a href="https://redirect.github.com/pygments/pygments/issues/2929">#2929</a>)</li>
<li>Python: Add <code>t</code>-string support (<a href="https://redirect.github.com/pygments/pygments/issues/2973">#2973</a>, <a href="https://redirect.github.com/pygments/pygments/issues/3009">#3009</a>, <a href="https://redirect.github.com/pygments/pygments/issues/3010">#3010</a>)</li>
<li>Tablegen: Fix infinite loop (<a href="https://redirect.github.com/pygments/pygments/issues/2972">#2972</a>, <a href="https://redirect.github.com/pygments/pygments/issues/2940">#2940</a>)</li>
<li>Tera Term macro: Add commands introduced in v5.3 through v5.6 (<a href="https://redirect.github.com/pygments/pygments/issues/2951">#2951</a>)</li>
<li>TOML: Support TOML 1.1.0 (<a href="https://redirect.github.com/pygments/pygments/issues/3026">#3026</a>, <a href="https://redirect.github.com/pygments/pygments/issues/3027">#3027</a>)</li>
<li>Turtle: Allow empty comment lines (<a href="https://redirect.github.com/pygments/pygments/issues/2980">#2980</a>)</li>
<li>XML: Added <code>.xbrl</code> as file ending (<a href="https://redirect.github.com/pygments/pygments/issues/2890">#2890</a>, <a href="https://redirect.github.com/pygments/pygments/issues/2891">#2891</a>)</li>
</ul>
</li>
<li>
<p>Drop Python 3.8, and add Python 3.14 as a supported version (<a href="https://redirect.github.com/pygments/pygments/issues/2987">#2987</a>, <a href="https://redirect.github.com/pygments/pygments/issues/3012">#3012</a>)</p>
</li>
<li>
<p>Various improvements to <code>autopygmentize</code> (<a href="https://redirect.github.com/pygments/pygments/issues/2894">#2894</a>)</p>
</li>
<li>
<p>Update <code>onedark</code> style to support more token types (<a href="https://redirect.github.com/pygments/pygments/issues/2977">#2977</a>)</p>
</li>
<li>
<p>Update <code>rtt</code> style to support more token types (<a href="https://redirect.github.com/pygments/pygments/issues/2895">#2895</a>)</p>
</li>
<li>
<p>Cache entry points to improve performance (<a href="https://redirect.github.com/pygments/pygments/issues/2979">#2979</a>)</p>
</li>
<li>
<p>Fix <code>xterm-256</code> color table (<a href="https://redirect.github.com/pygments/pygments/issues/3043">#3043</a>)</p>
</li>
<li>
<p>Fix <code>kwargs</code> dictionary getting mutated on each call (<a href="https://redirect.github.com/pygments/pygments/issues/3044">#3044</a>)</p>
</li>
</ul>
<h2>2.19.2</h2>
<ul>
<li>Lua: Fix regression introduced in 2.19.0 (<a href="https://redirect.github.com/pygments/pygments/issues/2882">#2882</a>, <a href="https://redirect.github.com/pygments/pygments/issues/2839">#2839</a>)</li>
</ul>
<h2>2.19.1</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/pygments/pygments/blob/master/CHANGES">pygments's changelog</a>.</em></p>
<blockquote>
<h2>Version 2.20.0</h2>
<p>(released March 29th, 2026)</p>
<p>Version 2.19.2</p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/pygments/pygments/commit/708197d82827ba2d5ca78bcbb653c7102ce86dcd"><code>708197d</code></a> Fix underline length.</li>
<li><a href="https://github.com/pygments/pygments/commit/1d4538ae8621d766ecc91ff59caf76ab75983abc"><code>1d4538a</code></a> Prepare 2.20 release.</li>
<li><a href="https://github.com/pygments/pygments/commit/2ceaee4e634eebae2d10a47fd05406871f6bac8f"><code>2ceaee4</code></a> Update CHANGES.</li>
<li><a href="https://github.com/pygments/pygments/commit/e3a3c54b58c7f80bc4db887e471d4f91c77844ed"><code>e3a3c54</code></a> Fix Haskell lexer: handle escape sequences in character literals (<a href="https://redirect.github.com/pygments/pygments/issues/3069">#3069</a>)</li>
<li><a href="https://github.com/pygments/pygments/commit/d7c3453e342dac319f58e4091f4ef183cc49d802"><code>d7c3453</code></a> Merge pull request <a href="https://redirect.github.com/pygments/pygments/issues/3071">#3071</a> from pygments/harden-html-formatter</li>
<li><a href="https://github.com/pygments/pygments/commit/0f97e7c37d44abfa4ddfddf44a3290fdad586034"><code>0f97e7c</code></a> Harden the HTML formatter against CSS.</li>
<li><a href="https://github.com/pygments/pygments/commit/9f981b2ba42b88ca5bdcebf12cd01efd7cd80aec"><code>9f981b2</code></a> Update CHANGES.</li>
<li><a href="https://github.com/pygments/pygments/commit/1d889151024e9a53f3702a60558b29b070306e9e"><code>1d88915</code></a> Update CHANGES.</li>
<li><a href="https://github.com/pygments/pygments/commit/c3d93adb9827fc054c3c12b47bde31c781a36a93"><code>c3d93ad</code></a> Fix ASN.1 lexer: recognize minus sign and fix range operator (<a href="https://redirect.github.com/pygments/pygments/issues/3060">#3060</a>)</li>
<li><a href="https://github.com/pygments/pygments/commit/4f06bcf8a5ba3f2b5bda24a26ccf041a1a65d91e"><code>4f06bcf</code></a> fix bad behaving backtracking regex in CommonLispLexer</li>
<li>Additional commits viewable in <a href="https://github.com/pygments/pygments/compare/2.18.0...2.20.0">compare view</a></li>
</ul>
</details>
<br />

Copybara import of the project:

--
e92d35df7c0db139d0bac338500dac48aed598ff by dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>:

Bump pygments in /xla/backends/cpu/benchmarks/e2e/gemma2/flax_2b

Bumps [pygments](https://github.com/pygments/pygments) from 2.18.0 to 2.20.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](pygments/pygments@2.18.0...2.20.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-version: 2.20.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Merging this change closes #40111

PiperOrigin-RevId: 892233721
…tead of IsBlackwell()

Imported from GitHub PR openxla/xla#40118

📝 Summary of Changes
- Add `HasTcgen05()` to `CudaComputeCapability` that returns true for SM 10.x
    and SM 11.x (architectures with tcgen05 support) and false for SM 12.x
- Replace the `IsBlackwell()` guard in `xtile_compiler.cc` with `HasTcgen05()`
- Add unit tests for `HasTcgen05()` and parameterized PTX compiler tests that
    verify tcgen05 instructions compile only on supported architectures
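The two guards differ in one case. Below is a minimal sketch of the predicate as the summary describes it. `ComputeCapabilitySketch` is a hypothetical stand-in, not XLA's `CudaComputeCapability`, and it models only the major version.

```cpp
#include <cassert>

// Hypothetical stand-in for a CUDA compute capability (major.minor).
// This is NOT XLA's CudaComputeCapability; it only mirrors the predicates
// described in the PR summary.
struct ComputeCapabilitySketch {
  int major = 0;
  int minor = 0;

  // Old guard: matches SM 10.x only.
  bool IsBlackwell() const { return major == 10; }

  // New guard as described: SM 10.x and SM 11.x support tcgen05,
  // SM 12.x (and anything older than SM 10.x) does not.
  bool HasTcgen05() const { return major == 10 || major == 11; }
};
```

SM 11.0 (Thor) is the case where the two guards disagree: `IsBlackwell()` is false while `HasTcgen05()` is true.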

🎯 Justification
The Triton compiler's tensor memory size check used `IsBlackwell()`, which only matches SM 10.x. This breaks on SM 11.0 (Thor), which also has tcgen05 tensor memory. When the check is skipped, Triton's `TritonTensorMemoryAllocationPass` may assign TMEM column offsets > 512. These out-of-bounds offsets cannot be allocated, so `ttng.tmem_alloc` [lowers](https://github.com/triton-lang/triton/blob/3e15c1a3e3a931578d2bfc9cb06ad12628adae2c/third_party/nvidia/lib/NVGPUToLLVM/NVGPUToLLVMPass.cpp#L595-L598) to an LLVM `undef`/`trap` instruction. The kernel launches, immediately hits the trap, and returns `CUDA_ERROR_LAUNCH_FAILED`. Since all autotuning configs are profiled on a shared CUDA stream, this fatal error poisons the GPU context and causes **all** subsequent kernel launches to fail.
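The failure mode can be illustrated with a bounds check on TMEM column ranges. This is a hypothetical sketch, not Triton's allocator or the actual XLA check: it assumes a 512-column tensor memory and represents each allocation by a start column and a width (`TmemAllocationSketch` and `AllFitInTensorMemory` are illustrative names).

```cpp
#include <cassert>
#include <vector>

// Assumption: tensor memory (TMEM) on tcgen05-capable SMs exposes 512
// columns. This sketch models only the column axis.
constexpr int kTmemColumns = 512;

// Hypothetical description of one TMEM allocation.
struct TmemAllocationSketch {
  int column_offset;  // first column assigned by the allocator
  int num_columns;    // width of the allocation in columns
};

// Returns true only if every allocation lies entirely inside TMEM.
// Rejecting a config here is preferable to letting it lower to an
// instruction that traps at runtime and poisons the shared CUDA context.
bool AllFitInTensorMemory(const std::vector<TmemAllocationSketch>& allocs) {
  for (const TmemAllocationSketch& a : allocs) {
    if (a.column_offset < 0 || a.num_columns <= 0) return false;
    if (a.column_offset + a.num_columns > kTmemColumns) return false;
  }
  return true;
}
```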
Copybara import of the project:

--
a652a39cee668e16151e6df32080e6d79d8d0986 by Andrei Ivanov <anivanov@nvidia.com>:

Fix tensor memory size check to apply on all Blackwell+ architectures.

--
68c3a0e0943124564788b407937d9d1674f6b1c8 by Andrei Ivanov <anivanov@nvidia.com>:

Restrict the check only to compute capabilities with tcgen05 support

--
e6cd06405a90ed00f1d779cb501013c274c77da5 by Andrei Ivanov <anivanov@nvidia.com>:

Remove statement about future architectures

--
4b441791f31cf2624ad3e242c45e3286b5ed3701 by Andrei Ivanov <anivanov@nvidia.com>:

Test tcgen05 availability

Merging this change closes #40118

PiperOrigin-RevId: 892233885
There is no need to go directly through the LocalClient. The only downside is that
we can no longer check whether the algorithm from the autotuning database was
used, but that should be checked elsewhere. If loading from the autotuning
database fails, the test still fails (verified by removing the entry for
algorithm 28).

PiperOrigin-RevId: 892250383
This change ensures that the hardcoded GPU target configs are identical to the dynamically determined target configs. This is important for AOT compilation,
where we rely on these configs being accurate.

So, this change expands GpuDeviceInfoProto to include all fields of stream_executor::DeviceDescription, enabling full serialization and deserialization, and fills in the missing fields in the target configs. Some fields were missing completely; others were missing only in ToProto or FromProto. Some fields were serialized correctly but missing from operator==. All of this should now be consistent.

It also introduces a new EqualsTo method with CompareOptions to allow comparisons that ignore version numbers or non-portable fields like PCI bus ID and interconnect UUIDs.
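A reduced sketch of how an `EqualsTo`-style comparison with options can sit alongside a full `operator==`. All type and field names here are hypothetical; the real `stream_executor::DeviceDescription` and its `CompareOptions` have many more fields and may be structured differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical comparison options, mirroring the two categories named above.
struct CompareOptionsSketch {
  bool ignore_versions = false;      // skip driver/runtime version fields
  bool ignore_non_portable = false;  // skip PCI bus ID, interconnect UUID
};

// Hypothetical, heavily reduced device description.
struct DeviceDescriptionSketch {
  std::string name;
  int core_count = 0;
  std::string driver_version;
  std::string pci_bus_id;
  std::string interconnect_uuid;

  bool EqualsTo(const DeviceDescriptionSketch& other,
                const CompareOptionsSketch& options) const {
    // Portable, version-independent fields are always compared.
    if (name != other.name || core_count != other.core_count) return false;
    if (!options.ignore_versions && driver_version != other.driver_version) {
      return false;
    }
    if (!options.ignore_non_portable &&
        (pci_bus_id != other.pci_bus_id ||
         interconnect_uuid != other.interconnect_uuid)) {
      return false;
    }
    return true;
  }

  // operator== covers every field: EqualsTo with nothing ignored.
  bool operator==(const DeviceDescriptionSketch& other) const {
    return EqualsTo(other, CompareOptionsSketch{});
  }
};
```

With both options set, a description queried from a live device can be compared against a hardcoded target config that has no PCI bus ID and a different driver version.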

This makes DeviceDescription a regular type in the sense that operator== covers all fields, and the conversion to proto and back recovers the same logical object (operator== returns true).

In particular, symmetric proto conversion is necessary to work with PjRt, since the C compatibility layer converts DeviceDescriptions to proto and back within the same process.
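The round-trip property can be written as a one-line invariant. The types below are hypothetical plain structs standing in for `DeviceDescription` and `GpuDeviceInfoProto`; no real protobuf API is used.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a device description.
struct DescriptionSketch {
  std::string name;
  int shared_memory_per_block = 0;
  int l2_cache_size = 0;

  // operator== covers every field, so equality means "same logical object".
  bool operator==(const DescriptionSketch& o) const {
    return name == o.name &&
           shared_memory_per_block == o.shared_memory_per_block &&
           l2_cache_size == o.l2_cache_size;
  }
};

// Hypothetical stand-in for the serialized (proto) form.
struct DescriptionProtoSketch {
  std::string name;
  int shared_memory_per_block = 0;
  int l2_cache_size = 0;
};

DescriptionProtoSketch ToProto(const DescriptionSketch& d) {
  return {d.name, d.shared_memory_per_block, d.l2_cache_size};
}

DescriptionSketch FromProto(const DescriptionProtoSketch& p) {
  return {p.name, p.shared_memory_per_block, p.l2_cache_size};
}

// Invariant: serializing and deserializing in the same process yields an
// object that compares equal to the original.
bool RoundTrips(const DescriptionSketch& d) {
  return FromProto(ToProto(d)) == d;
}
```

If `FromProto` forgot to copy `l2_cache_size`, `RoundTrips` would return false for any description with a nonzero L2 size, which is the kind of asymmetry this change removes.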

This change also:

1. Re-enables gpu_device_info_test. This test had been skipped ever since MIGs were enabled.
2. Explicitly runs gpu_device_info_test on all GPU types that we support.
3. Unifies the handling of version numbers in GpuTargetConfig and DeviceDescription (yes, we store some version numbers in multiple locations). This duplication needs to be fixed in a later change (b/497743152).

PiperOrigin-RevId: 892262208
Broke internal tests.

Reverts 78346d7

PiperOrigin-RevId: 892273753
It is unused in open source.

PiperOrigin-RevId: 892285006
Imported from GitHub PR openxla/xla#39744

📝 Summary of Changes
The CUB sort FFI handler was recently refactored and consolidated in two commits:
- openxla/xla@878c525
- openxla/xla@ce4b95a

In order to allow ROCm devices to continue using the CUB sort function, these same changes were ported to the relevant ROCm files.

Changes to files in `xla/stream_executor/rocm/`
- The build file is restructured so that it no longer compiles `cub_sort_kernel_rocm.cc` for each type.
- The various helper functions from commit openxla/xla@878c525 have been ported here.
- The missing S32_B16 and S32_B64 template instantiations have been added.
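The pattern of defining a sort entry point once and stamping out each supported type combination with explicit instantiations can be sketched as follows. This is a CPU-only illustration with a hypothetical `SortPairsSketch`; it does not call CUB or hipCUB, and reading `S32_B16`/`S32_B64` as 32-bit signed keys with 16-bit/64-bit payloads is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical CPU stand-in for a key/value sort entry point, defined once.
// Precondition: keys.size() == values.size().
template <typename KeyT, typename ValueT>
void SortPairsSketch(std::vector<KeyT>& keys, std::vector<ValueT>& values) {
  // Stable-sort a permutation by key (radix sort is also stable), then apply.
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
  std::vector<KeyT> sorted_keys;
  std::vector<ValueT> sorted_values;
  sorted_keys.reserve(keys.size());
  sorted_values.reserve(values.size());
  for (std::size_t i : order) {
    sorted_keys.push_back(keys[i]);
    sorted_values.push_back(values[i]);
  }
  keys = std::move(sorted_keys);
  values = std::move(sorted_values);
}

// Explicit instantiations, one per supported key/value combination
// (assumed meaning: S32 keys with 16-bit and 64-bit payloads).
template void SortPairsSketch<std::int32_t, std::uint16_t>(
    std::vector<std::int32_t>&, std::vector<std::uint16_t>&);
template void SortPairsSketch<std::int32_t, std::uint64_t>(
    std::vector<std::int32_t>&, std::vector<std::uint64_t>&);
```

When the template is declared in a header and defined in a single `.cc` file, a missing explicit instantiation leaves that key/value combination undefined at link time, which is one way such a gap can surface.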

🎯 Justification
The upstream refactoring broke the following unit tests in XLA and JAX on ROCm:
- XLA:
  - xla/service/gpu/tests/sorting_test.cc

- JAX:
  - tests/ann_test.py::AnnTest::test_approx_max_k0
  - tests/ann_test.py::AnnTest::test_approx_max_k5
  - tests/ann_test.py::AnnTest::test_approx_max_k8
  - tests/ann_test.py::AnnTest::test_approx_min_k0
  - tests/ann_test.py::AnnTest::test_approx_min_k5
  - tests/ann_test.py::AnnTest::test_approx_min_k8
  - tests/scipy_signal_test.py::LaxBackedScipySignalTests::testWelchAgainstNumpy8

🚀 Kind of Contribution
🐛 Bug Fix

🧪 Unit Tests:
The affected unit tests now pass with the changes in this PR.
- XLA (`sorting_test.cc`):
```
================================================================================
INFO: Found 1 test target...
Target //xla/service/gpu/tests:sorting_test_amdgpu_any up-to-date:
  bazel-bin/xla/service/gpu/tests/sorting_test_amdgpu_any
INFO: Elapsed time: 192.140s, Critical Path: 153.72s
INFO: 5136 processes: 650 internal, 4486 local.
INFO: Build completed successfully, 5136 total actions
//xla/service/gpu/tests:sorting_test_amdgpu_any                          PASSED in 31.8s
  Stats over 15 runs: max = 31.8s, min = 22.3s, avg = 29.9s, dev = 2.7s

Executed 1 out of 1 test: 1 test passes.
```
- JAX (`tests/ann_test.py::AnnTest::test_approx_*`):
```
==================================================================== test session starts ====================================================================
platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /workspace/rocm-jax/jax
configfile: pyproject.toml
plugins: xdist-3.8.0, hypothesis-6.142.1
collected 42 items / 32 deselected / 10 selected

tests/ann_test.py::AnnTest::test_approx_max_k0 PASSED
tests/ann_test.py::AnnTest::test_approx_max_k1 PASSED
tests/ann_test.py::AnnTest::test_approx_max_k2 PASSED
tests/ann_test.py::AnnTest::test_approx_max_k3 PASSED
tests/ann_test.py::AnnTest::test_approx_max_k4 PASSED
tests/ann_test.py::AnnTest::test_approx_max_k5 PASSED
tests/ann_test.py::AnnTest::test_approx_max_k6 PASSED
tests/ann_test.py::AnnTest::test_approx_max_k7 PASSED
tests/ann_test.py::AnnTest::test_approx_max_k8 PASSED
tests/ann_test.py::AnnTest::test_approx_max_k9 PASSED

============================================================= 10 passed, 32 deselected in 9.25s =============================================================
```
- JAX (`tests/scipy_signal_test.py::LaxBackedScipySignalTests::testWelchAgainstNumpy8`):
```
==================================================================== test session starts ====================================================================
platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /workspace/rocm-jax/jax
configfile: pyproject.toml
plugins: xdist-3.8.0, hypothesis-6.142.1
collected 100 items / 99 deselected / 1 selected

tests/scipy_signal_test.py::LaxBackedScipySignalTests::testWelchAgainstNumpy8 PASSED

============================================================= 1 passed, 99 deselected in 8.16s ==============================================================
```
Copybara import of the project:

--
6b432c264245514680969c508d9a0c510b15aed5 by tsrw2048 <239799652+tsrw2048@users.noreply.github.com>:

[ROCm] Porting CUB sort FFI handler consolidation to ROCm.

The CUB sort FFI handler was refactored and consolidated
in two commits:
 - 878c52525f1a72856e5c5fd42775211f19266568
 - ce4b95a3668977155ec4914b1b28f2d5bf040ef2
In order to allow ROCm devices to continue using the CUB sort
function, these same changes were ported to the relevant ROCm
files.

Changes to files in xla/stream_executor/rocm/
- The build file is restructured so that it no longer compiles
cub_sort_kernel_rocm.cc for each type.
- The various helper functions from commit 878c525 have
been ported here.
- The missing S32_B16 and S32_B64 template instantiations have
been added.

Merging this change closes #39744

PiperOrigin-RevId: 892285071
PiperOrigin-RevId: 892285087
pull[bot] locked and limited conversation to collaborators Mar 31, 2026
pull[bot] added the ⤵️ pull label Mar 31, 2026
pull[bot] merged commit aacef90 into GesuBackups:master Mar 31, 2026