[pull] master from tensorflow:master#1688
Merged
This CL is part 1 of cl/886092985. It lands the logic for negation (`-v0` instead of `+ -1 * v0`) and simplifies map output by omitting empty symbol brackets `[]`. It also deprecates many AffineExpr/Map methods in indexing_map_serialization. Landing this minimizes the number of test updates we have to do in the following CLs. PiperOrigin-RevId: 893497910
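The following is a minimal C++ sketch of the two printing rules described above. It assumes a simplified representation in which an affine expression is a list of (nonzero coefficient, variable name) terms. `Term`, `PrintLinear`, and `PrintHeader` are hypothetical names, not functions from indexing_map_serialization, and the `(d0, d1)[s0]` header shape is assumed from MLIR-style affine map syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical linear term: coeff * var. Coefficients are assumed nonzero.
struct Term {
  long long coeff;
  std::string var;
};

// Prints a sum of terms, writing negation as "-v0" or "d0 - v0"
// instead of "+ -1 * v0".
std::string PrintLinear(const std::vector<Term>& terms) {
  if (terms.empty()) return "0";
  std::string out;
  for (std::size_t i = 0; i < terms.size(); ++i) {
    const Term& t = terms[i];
    assert(t.coeff != 0);
    const bool negative = t.coeff < 0;
    const long long magnitude = negative ? -t.coeff : t.coeff;
    if (i == 0) {
      if (negative) out += "-";  // leading negation: "-v0"
    } else {
      out += negative ? " - " : " + ";  // fold the sign into the operator
    }
    if (magnitude != 1) out += std::to_string(magnitude) + " * ";
    out += t.var;
  }
  return out;
}

// Prints a "(d0, d1)[s0]" header, omitting "[]" when there are no symbols.
std::string PrintHeader(const std::vector<std::string>& dims,
                        const std::vector<std::string>& symbols) {
  auto join = [](const std::vector<std::string>& names) {
    std::string s;
    for (std::size_t i = 0; i < names.size(); ++i) {
      if (i > 0) s += ", ";
      s += names[i];
    }
    return s;
  };
  std::string out = "(" + join(dims) + ")";
  if (!symbols.empty()) out += "[" + join(symbols) + "]";
  return out;
}
```

For example, `PrintLinear({{1, "d0"}, {-1, "v0"}})` yields `d0 - v0` rather than `d0 + -1 * v0`, and `PrintHeader({"d0"}, {})` yields `(d0)` with no trailing `[]`.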
PiperOrigin-RevId: 893502380
…s to avoid instruction cache thrashing. PiperOrigin-RevId: 893502849
…hine. This change introduces a static factory method `TargetMachineOptions::Native()` which automatically populates the target triple, CPU name, and CPU features based on the host machine's characteristics using LLVM's host detection utilities. A test is added to ensure the inferred options match LLVM's host CPU information. PiperOrigin-RevId: 893503067
PiperOrigin-RevId: 893503478
…cessing. This was relied on for using local wheels as overrides for matching requirements defined in lock files. Restoring it will allow us to stop using the purely suggestive `--find-links`, which was needed for the 1.8.4 upgrade because of the regression that is fixed in 1.8.5. Local wheels will again be specified as `pkg @ wheel URL`. The 1.8.5 upgrade is a tiny, regression-fix-only upgrade and thus doesn't require any other adjustments: https://rules-python.readthedocs.io/en/latest/changelog.html#v1-8-5 More fix context: openxla/xla@11a2044 openxla/xla@0025bf7 PiperOrigin-RevId: 893537944
PiperOrigin-RevId: 893542078
When the user does not request a specific CPU architecture, XLA automatically detects the host architecture and passes this information to FFI-based custom calls and XLA:CPU. Until now, this detection ignored architecture features (such as SSE, Neon, and AVX). This change adds the missing hardware feature detection and updates the embedded system configs. It also adds a test that ensures the embedded system configs stay in sync with the actual systems. The feature detection takes `DebugOptions::xla_cpu_max_isa` into account, which allows the user to limit the feature set to an older CPU generation to make the binary more portable. PiperOrigin-RevId: 893544830
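One way to picture the `xla_cpu_max_isa` capping step is as an intersection: keep only the detected host features that are allowed up to a given ISA tier. The following C++ sketch is a hypothetical illustration. The tier enum, the feature lists (LLVM-style x86 feature names), and `LimitFeatures` are all assumptions, and the real detection also covers other architectures such as Arm (Neon).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical x86 ISA tiers, ordered from oldest to newest (simplified).
enum class IsaTier { kSse4_2 = 0, kAvx = 1, kAvx2 = 2, kAvx512 = 3 };

// Features introduced at each tier (illustrative, not exhaustive).
const std::vector<std::vector<std::string>>& FeaturesByTier() {
  static const std::vector<std::vector<std::string>> kTiers = {
      {"sse4.2"},
      {"avx"},
      {"avx2", "fma"},
      {"avx512f"},
  };
  return kTiers;
}

// Keeps only detected features that belong to a tier <= max_isa.
// Features not listed in any tier are dropped conservatively.
std::set<std::string> LimitFeatures(const std::set<std::string>& detected,
                                    IsaTier max_isa) {
  const auto& tiers = FeaturesByTier();
  const int max_index = static_cast<int>(max_isa);
  assert(max_index >= 0 && max_index < static_cast<int>(tiers.size()));
  std::set<std::string> allowed;
  for (int i = 0; i <= max_index; ++i) {
    for (const std::string& feature : tiers[i]) {
      if (detected.count(feature) > 0) allowed.insert(feature);
    }
  }
  return allowed;
}
```

If the cap is `IsaTier::kAvx2` and the host reports `avx512f`, the sketch drops `avx512f` and keeps `sse4.2`, `avx`, `avx2`, and `fma`.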
PiperOrigin-RevId: 893562678
Imported from GitHub PR openxla/xla#40311 XLA:GPU has switched to structured concurrency with `AsyncStartThunk` and `AsyncDoneThunk`, so this removes a bad experiment with `WaitForStreamsThunk`. Copybara import of the project: -- 303f683519bd73d80b474570cfe63953ccd83e7d by Eugene Zhulenev <ezhulenev@openxla.org>: [xla:gpu] Delete vestigial WaitForStreamsThunk Merging this change closes #40311 PiperOrigin-RevId: 893588227
PiperOrigin-RevId: 893591324
There's no reason to execute the generated HLO. This test exercises a utility that generates a specific sequence of XlaOps. This change refactors the test so that it compares the generated HLO against an expected HLO. The expected HLO was obtained by printing the module generated by the old test. PiperOrigin-RevId: 893592373
…further investigation. Reverts 7fc14e3 PiperOrigin-RevId: 893594429
Reverts c6d844d PiperOrigin-RevId: 893602331
… module does not have a name PiperOrigin-RevId: 893611161
PiperOrigin-RevId: 893613883
…ologyDescription from proto PiperOrigin-RevId: 893622566
Imported from GitHub PR openxla/xla#40117 This PR updates the XLA Linux x86 GPU oneAPI presubmit coverage by expanding the test scope from //xla/stream_executor/sycl/... and //xla/service/gpu/... to the broader set //xla/..., //build_tools/..., and @tsl//tsl/..., ensuring more comprehensive validation. Accordingly, it switches from _XLA_ONEAPI_TARGET_PATTERNS to _XLA_DEFAULT_TARGET_PATTERNS to align oneAPI presubmit checks with the default XLA test coverage. Copybara import of the project: -- 50e4447d7e31ffd9f71fa20805a5b330089b77da by mraunak <mayank.kumar.raunak@intel.com>: Update build.py -- eb896f9d6386638e8150e815b18ecd0f88235dca by mraunak <mayank.kumar.raunak@intel.com>: Update golden_commands.txt Merging this change closes #40117 PiperOrigin-RevId: 893639525
With this change, the shardy outliner translates named computations into separate calls, leaving a flat call graph. PiperOrigin-RevId: 893645729
- Moved TransposePlanCache and its mutex from PjRtStreamExecutorClient, PjRtCpuClient, and TpuClient to CommonPjRtClient.
- Added GetTransposePlan to CommonPjRtClient for thread-safe access.
- Updated call sites to use the new centralized interface.
PiperOrigin-RevId: 893657648
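Here is a rough C++ sketch of a centralized lookup guarded by a mutex, as described above. `TransposePlan`, `CommonClientSketch`, and the permutation-only cache key are hypothetical simplifications. The real `TransposePlanCache` and `GetTransposePlan` signatures may differ, and the sketch uses an unbounded `std::map` only for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a transpose plan: it only records its permutation.
struct TransposePlan {
  std::vector<int64_t> permutation;
};

// Hypothetical base class that owns one cache shared by all client types.
class CommonClientSketch {
 public:
  // Returns the cached plan for `permutation`, creating it on first use.
  // The lock covers the whole get-or-create step, so it is thread-safe.
  std::shared_ptr<const TransposePlan> GetTransposePlan(
      const std::vector<int64_t>& permutation) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = cache_.find(permutation);
    if (it != cache_.end()) return it->second;
    std::shared_ptr<const TransposePlan> plan =
        std::make_shared<TransposePlan>(TransposePlan{permutation});
    cache_.emplace(permutation, plan);
    return plan;
  }

  std::size_t CacheSize() {
    std::lock_guard<std::mutex> lock(mu_);
    return cache_.size();
  }

 private:
  std::mutex mu_;
  std::map<std::vector<int64_t>, std::shared_ptr<const TransposePlan>> cache_;
};
```

The lock is held for the whole get-or-create step. As a result, two threads that ask for the same permutation always receive the same plan object.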
…alAsync` This fixes a bug where PjRt CPU buffers ignored the major-to-minor order of the layout inside `ToLiteral`. The CL also makes `CopyToLiteralAsync` perform the work asynchronously, as the API intends. PiperOrigin-RevId: 893674302
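This example shows why the layout matters when producing a literal. The hypothetical C++ helper below copies a buffer stored in an arbitrary major-to-minor order into row-major order. `ToRowMajor` and its parameters are assumptions for illustration, not the PjRt API. Note that XLA's `Layout` itself records `minor_to_major`, which is the reverse ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies `src`, stored physically in the order given by `major_to_minor`
// (major_to_minor[0] is the outermost dimension), into row-major order
// (dimension 0 most major, last dimension contiguous).
std::vector<float> ToRowMajor(const std::vector<float>& src,
                              const std::vector<int64_t>& dims,
                              const std::vector<int64_t>& major_to_minor) {
  const std::size_t rank = dims.size();
  assert(major_to_minor.size() == rank);
  int64_t count = 1;
  for (int64_t d : dims) count *= d;
  assert(static_cast<int64_t>(src.size()) == count);

  // Physical stride of each logical dimension: the minor-most one has stride 1.
  std::vector<int64_t> phys_stride(rank);
  int64_t stride = 1;
  for (std::size_t i = rank; i-- > 0;) {
    phys_stride[major_to_minor[i]] = stride;
    stride *= dims[major_to_minor[i]];
  }

  std::vector<float> dst(count);
  std::vector<int64_t> index(rank, 0);  // current logical multi-index
  for (int64_t linear = 0; linear < count; ++linear) {
    int64_t offset = 0;
    for (std::size_t d = 0; d < rank; ++d) offset += index[d] * phys_stride[d];
    dst[linear] = src[offset];
    // Advance the multi-index in row-major order (last dimension fastest).
    for (std::size_t d = rank; d-- > 0;) {
      if (++index[d] < dims[d]) break;
      index[d] = 0;
    }
  }
  return dst;
}
```

Consider a 2x3 matrix stored column-major (`major_to_minor = {1, 0}`) as `{1, 4, 2, 5, 3, 6}`. The helper returns `{1, 2, 3, 4, 5, 6}`. If the layout were ignored, the physical order would come back unchanged.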
PiperOrigin-RevId: 893678182
PiperOrigin-RevId: 893686530
…des on all memory kinds PiperOrigin-RevId: 893691249
This flag preset will continue to be developed with fast compilation times and numerical stability as the top goals, and runtime performance only as a secondary goal. Expect tradeoffs similar to XX% compilation time for X% runtime under this flag. Currently, it only sets the LLVM codegen optimization level to O1, disables platform-dependent math, and sets `flatten_after_fusion` to true. PiperOrigin-RevId: 893693608
…ucket.table. Also, clean up includes. Benchmark results are slightly negative, which is expected because the benchmark doesn't cover the high-contention/very-large-buckets motivating case. Note that the benchmarks are still faster than when we used tsl::Hash64.
```
name             cpu/op           cpu/op           vs base
BM_SendRecv      93.38n ± 2%      99.09n ± 1%      +6.12% (p=0.000 n=20)
BM_RecvSend      76.73n ± 1%      83.00n ± 1%      +8.18% (p=0.000 n=20)
BM_PingPong/100  308.9µ ± 2%      311.7µ ± 2%      ~ (p=0.841 n=20)
BM_PingPong/200  612.4µ ± 3%      614.2µ ± 2%      ~ (p=0.799 n=20)
BM_PingPong/300  929.6µ ± 3%      932.4µ ± 3%      ~ (p=0.968 n=20)
geomean          16.60µ           17.11µ           +3.11%

name             time/op          time/op          vs base
BM_SendRecv      93.59n ± 2%      99.32n ± 1%      +6.12% (p=0.000 n=20)
BM_RecvSend      76.89n ± 1%      83.19n ± 1%      +8.19% (p=0.000 n=20)
BM_PingPong/100  704.2µ ± 1%      693.8µ ± 3%      ~ (p=0.086 n=20)
BM_PingPong/200  1.434m ± 3%      1.393m ± 4%      ~ (p=0.201 n=20)
BM_PingPong/300  2.158m ± 2%      2.120m ± 2%      ~ (p=0.265 n=20)
geomean          27.49µ           27.91µ           +1.53%

name             INSTRUCTIONS/op  INSTRUCTIONS/op  vs base
BM_SendRecv      1.053k ± 0%      1.229k ± 0%      +16.71% (p=0.000 n=20)
BM_RecvSend      833.2 ± 0%       1008.2 ± 0%      +21.00% (p=0.000 n=20)
BM_PingPong/100  539.0k ± 0%      576.2k ± 0%      +6.90% (p=0.000 n=20)
BM_PingPong/200  1.024M ± 0%      1.098M ± 0%      +7.29% (p=0.000 n=20)
BM_PingPong/300  1.507M ± 0%      1.621M ± 0%      +7.55% (p=0.000 n=20)
geomean          59.24k           66.20k           +11.74%

name             CYCLES/op        CYCLES/op        vs base
BM_SendRecv      328.7 ± 2%       348.4 ± 1%       +6.00% (p=0.000 n=20)
BM_RecvSend      269.9 ± 1%       292.0 ± 1%       +8.21% (p=0.000 n=20)
BM_PingPong/100  649.2k ± 1%      650.8k ± 1%      ~ (p=0.841 n=20)
BM_PingPong/200  1.279M ± 1%      1.281M ± 2%      ~ (p=0.968 n=20)
BM_PingPong/300  1.917M ± 1%      1.926M ± 1%      ~ (p=0.369 n=20)
geomean          42.65k           43.92k           +2.97%

name             items/s          items/s          vs base
BM_PingPong/100  323.8k ± 2%      320.8k ± 2%      ~ (p=0.841 n=20)
BM_PingPong/200  326.6k ± 3%      325.6k ± 2%      ~ (p=0.799 n=20)
BM_PingPong/300  322.7k ± 2%      321.7k ± 2%      ~ (p=0.968 n=20)
geomean          324.3k           322.7k           -0.50%
```
PiperOrigin-RevId: 893694334
PiperOrigin-RevId: 893714548
The main goal is to avoid including Eigen when all we need is error codes. PiperOrigin-RevId: 893722480
PiperOrigin-RevId: 893732146
…tency. PiperOrigin-RevId: 893751639
PiperOrigin-RevId: 893764092
…o IFRT. PiperOrigin-RevId: 893765489
`xla::ifrt::Value::ByteSize()` is a new API that asks the IFRT runtime to compute the byte size of an IFRT value object (an array or an upcoming tuple). This API gives the user a fast and accurate way to calculate the on-device size of IFRT value objects, and it makes the runtime responsible for providing a robust implementation of this calculation. Since `xla::ifrt::Array` is a subclass of `xla::ifrt::Value`, `xla::ifrt::Array::ByteSize()` can be used without casting `xla::ifrt::ArrayRef` to `xla::ifrt::ValueRef`. The initial implementation computes the size on the fly using `Layout::ByteSize()` or `PjRtLayout::ByteSize()`, and it currently does not cache or precompute it. PiperOrigin-RevId: 893768289
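The following minimal C++ sketch mirrors the pattern described above. The base value type declares a virtual `ByteSize()`, and the array subclass computes it on the fly, so callers can use either type. `ValueSketch`, `ArraySketch`, and `TotalByteSize` are hypothetical names. The size formula assumes a dense layout with no padding or tiling, whereas real `Layout::ByteSize()` implementations may account for those.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class: any runtime value can report its on-device size.
class ValueSketch {
 public:
  virtual ~ValueSketch() = default;
  virtual int64_t ByteSize() const = 0;
};

// Hypothetical dense array: the size is computed on every call from the
// shape and element width, with no caching or precomputation.
class ArraySketch : public ValueSketch {
 public:
  ArraySketch(std::vector<int64_t> dims, int64_t element_bytes)
      : dims_(std::move(dims)), element_bytes_(element_bytes) {
    assert(element_bytes_ > 0);
  }

  int64_t ByteSize() const override {
    int64_t elements = 1;
    for (int64_t d : dims_) elements *= d;
    return elements * element_bytes_;
  }

 private:
  std::vector<int64_t> dims_;
  int64_t element_bytes_;
};

// Callers holding base-class pointers need no cast to query the size.
int64_t TotalByteSize(const std::vector<std::unique_ptr<ValueSketch>>& values) {
  int64_t total = 0;
  for (const auto& value : values) total += value->ByteSize();
  return total;
}
```

An `ArraySketch({2, 3}, 4)` reports 24 bytes whether it is called directly or through a `ValueSketch` pointer.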
Previously, JAX and StableHLO did not have asynchronous collectives. Thus, every JAX program lowered, via StableHLO, to an HLO program without asynchronous collectives. Recently, we added asynchronous collectives to JAX and StableHLO, but the XLA CPU backend doesn't support asynchronous collectives. If you try to run a JAX program with asynchronous collectives on CPU, it will crash. We can solve this in two ways. (1) We could do nothing and let the program crash. (2) We could replace the asynchronous collectives with synchronous collectives. This CL implements option 2. The XLA CPU backend now immediately replaces any asynchronous collectives with their synchronous counterparts. PiperOrigin-RevId: 893768474
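This is a toy C++ sketch of option 2. It assumes a linear instruction list and the `-start`/`-done` naming that HLO uses for async collectives such as `all-reduce-start` and `all-reduce-done`. `SyncifyCollectives` is hypothetical. The actual XLA CPU pass operates on the HLO graph and its operands, not on opcode strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy IR: each instruction is just an opcode string. An async collective
// appears as an "<op>-start" ... "<op>-done" pair.
std::vector<std::string> SyncifyCollectives(
    const std::vector<std::string>& program) {
  const std::string kStart = "-start";
  const std::string kDone = "-done";
  std::vector<std::string> out;
  for (const std::string& op : program) {
    // True if `op` is a nonempty prefix followed by `suffix`.
    auto ends_with = [&](const std::string& suffix) {
      return op.size() > suffix.size() &&
             op.compare(op.size() - suffix.size(), suffix.size(), suffix) == 0;
    };
    if (ends_with(kStart)) {
      // Emit the synchronous collective where the async one started.
      out.push_back(op.substr(0, op.size() - kStart.size()));
    } else if (ends_with(kDone)) {
      // The result is already available, so drop the "done" half.
      continue;
    } else {
      out.push_back(op);
    }
  }
  return out;
}
```

The synchronous op is emitted at the position of the start instruction. Every user of the done instruction comes later in the list, so this linear order preserves the data dependencies.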
Updates LLVM usage to match [7ccd92e5e6e5](llvm/llvm-project@7ccd92e5e6e5) PiperOrigin-RevId: 893768833
Created by
pull[bot] (v2.0.0-alpha.4)