[pull] master from tensorflow:master#1678

Merged
pull[bot] merged 12 commits into GesuBackups:master from tensorflow:master
Mar 31, 2026

@pull pull Bot commented Mar 31, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

hyeontaek and others added 12 commits March 30, 2026 18:32
…for various PjRt clients

This change implements `PjRt(Loaded)Executable::GetParameterMemoryKinds()` for the PjRt CPU, GPU, and other clients. This fixes infinite recursion of `GetParameterMemoryKinds()` that happens when `PjRtLoadedExecutable::GetParameterMemoryKinds()` uses the default implementation that forwards it to `PjRtExecutable::GetParameterMemoryKinds()` and comes back to `PjRtLoadedExecutable::GetParameterMemoryKinds()` via `PjRtExecutableForwarder`.
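The forwarding cycle described above can be illustrated with a minimal Python sketch. The class and method names here are hypothetical stand-ins, not the real PjRt C++ API; the sketch only assumes that the default implementation delegates to a forwarder that delegates straight back.

```python
class ExecutableForwarder:
    """Hypothetical forwarder: exposes a loaded executable as a plain executable."""

    def __init__(self, loaded):
        self._loaded = loaded

    def get_parameter_memory_kinds(self):
        # Forwards back to the loaded executable it wraps.
        return self._loaded.get_parameter_memory_kinds()


class LoadedExecutable:
    """Hypothetical base class whose default implementation delegates."""

    def get_executable(self):
        return ExecutableForwarder(self)

    def get_parameter_memory_kinds(self):
        # Default: ask the executable view, which forwards back here.
        return self.get_executable().get_parameter_memory_kinds()


class CpuLoadedExecutable(LoadedExecutable):
    """Hypothetical client class that breaks the cycle with a real answer."""

    def __init__(self, memory_kinds):
        self._memory_kinds = memory_kinds

    def get_parameter_memory_kinds(self):
        return self._memory_kinds


def default_recurses():
    """Return True if the un-overridden default never terminates."""
    try:
        LoadedExecutable().get_parameter_memory_kinds()
    except RecursionError:
        return True
    return False
```

In this sketch the cycle disappears once a concrete class overrides the method, which mirrors the approach the commit message describes for the CPU, GPU, and other clients.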

PiperOrigin-RevId: 892019571
This fixes a stack overflow (SIGSEGV) seen during compilation of complex Torax simulation steps. The default stack size of 512 KiB was insufficient for deep call stacks in XLA. 2 MiB matches the TSAN default in this file and is sufficient for our workloads, while still being more conservative about memory compared to the 4 MiB used by the standard XLA CPU compiler.

More details:
The crash occurs deep inside LLVM's ORC JIT infrastructure during symbol resolution and object finalization (RuntimeDyldImpl::finalizeAsync and ExecutionSession::lookup). XLA configures LLVM ORC to use an InPlaceTaskDispatcher. When one object is being finalized, it looks up its dependencies. The InPlaceTaskDispatcher runs these lookup and finalization tasks synchronously on the same thread. If you have a long chain of dependent modules (typical for complex Torax/JAX graphs), this synchronous "dispatch" builds up stack frames for each dependency. Eventually, it blows past the default thread stack limits.

In cpu_compiler.cc we're already using 4 MiB, but the CPU backend uses the caller's thread by default unless it decides to parallelize.
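As a rough illustration of the idea (not the XLA change itself, which is in C++), the following Python sketch runs a synchronous, recursive dependency walk on a thread created with an explicit 2 MiB stack. The `resolve` function and the `run_with_stack` helper are hypothetical; only `threading.stack_size` is a real standard-library call.

```python
import threading

STACK_BYTES = 2 * 1024 * 1024  # 2 MiB, the size named in the commit message


def resolve(depth):
    """Hypothetical synchronous dependency chain: one stack frame per dependency."""
    if depth == 0:
        return 0
    return 1 + resolve(depth - 1)


def run_with_stack(fn, arg, stack_bytes=STACK_BYTES):
    """Run fn(arg) on a fresh thread created with the requested stack size."""
    result = []
    # stack_size() affects threads created after the call and returns the old value.
    previous = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=lambda: result.append(fn(arg)))
        worker.start()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    worker.join()
    return result[0]
```

The stack size has to be chosen before the thread is created, which is why work that runs on the caller's thread cannot benefit from a larger stack configured elsewhere.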

PiperOrigin-RevId: 892021510
PiperOrigin-RevId: 892086956
PiperOrigin-RevId: 892111013
PiperOrigin-RevId: 892111881
PiperOrigin-RevId: 892116530
…trics (Memory Profiles).

2. Fixed the 'Unable to disable' problem: an explicitly empty list such as `activities=[]` was treated as falsy and incorrectly replaced with the defaults.
3. Removed hardcoded options in the legacy path.
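The falsy-default pitfall in item 2 can be sketched as follows. The function names and the default list are hypothetical and are not taken from the profiler code; the sketch only assumes the bug came from a truthiness check on the argument.

```python
DEFAULT_ACTIVITIES = ["cpu", "gpu"]  # hypothetical defaults, for illustration only


def resolve_activities_buggy(activities=None):
    # `or` treats an empty list as "not provided", so [] silently becomes the defaults.
    return activities or list(DEFAULT_ACTIVITIES)


def resolve_activities_fixed(activities=None):
    # Only fall back when the caller passed nothing at all.
    if activities is None:
        return list(DEFAULT_ACTIVITIES)
    return list(activities)
```

With the `is None` check, passing `activities=[]` disables everything, while omitting the argument still selects the defaults.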

PiperOrigin-RevId: 892129802
PiperOrigin-RevId: 892135390
PiperOrigin-RevId: 892142136
PiperOrigin-RevId: 892145541
PiperOrigin-RevId: 892152188
@pull pull Bot locked and limited conversation to collaborators Mar 31, 2026
@pull pull Bot added the ⤵️ pull label Mar 31, 2026
@pull pull Bot merged commit 1c0fbe0 into GesuBackups:master Mar 31, 2026

7 participants