v25.03.00
This is a beta release of cuPyNumeric.
Linux x86 and ARM conda packages are available for this release at https://anaconda.org/legate/cupynumeric.
Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/25.03/.
New features
Licensing
- With this release, the Legate framework, on which cuPyNumeric is based, becomes open source under the Apache-2.0 license. This makes the entire cuPyNumeric stack (everything above the CUDA library level) open source.
Added functionality
- Matrix exponential: cupynumeric.linalg.expm
- Batched eigendecomposition: cupynumeric.linalg.eigvals & cupynumeric.linalg.eig
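The following is a minimal sketch of how the new routines might be called, not an excerpt from the official documentation. It assumes that cupynumeric.linalg.eig and cupynumeric.linalg.eigvals follow the NumPy calling convention for stacked (batched) inputs, and that cupynumeric.linalg.expm accepts the same stacked input as a single positional argument. The matrix values are illustrative, and the NumPy fallback exists only so the sketch runs where cuPyNumeric is not installed.

```python
# Fall back to NumPy so the sketch still runs without cuPyNumeric installed.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

# A batch of two 2x2 symmetric matrices, shape (2, 2, 2); values are illustrative.
batch = np.array([[[2.0, 0.0], [0.0, 3.0]],
                  [[1.0, 2.0], [2.0, 1.0]]])

# Eigenvalues only: one row of eigenvalues per matrix in the batch.
w = np.linalg.eigvals(batch)

# Eigenvalues and eigenvectors: column j of v[i] pairs with w2[i, j].
w2, v = np.linalg.eig(batch)

# Sanity check of the defining relation A @ v = v * w for every matrix.
residual = batch @ v - v * w2[:, None, :]
print("max residual:", float(abs(residual).max()))

# The matrix exponential is only attempted when the namespace provides it
# (plain NumPy does not). Assumption: expm accepts the stacked input as-is.
expm = getattr(np.linalg, "expm", None)
if expm is not None:
    # The eigenvalues of exp(A) are the exponentials of the eigenvalues of A.
    lhs = np.sort(np.linalg.eigvals(expm(batch)).real, axis=-1)
    rhs = np.sort(np.exp(w.real), axis=-1)
    print("spectra agree:", bool(np.allclose(lhs, rhs)))
```

The order in which eigenvalues are returned is not guaranteed, so the comparison sorts each row before checking.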
Performance improvements
- Matrix multiplication no longer performs unnecessary streaming when running on a single processor/GPU.
UX improvements
- Add the legate.core.ProfileRange Python context manager, to annotate sub-spans within a larger task span on the profiler visualization.
- Add the local_task_array helper function, which can be used in Python tasks to create a view over a Store/Array argument, using a NumPy or CuPy array as appropriate based on the type of memory where the data is located.
Documentation improvements
- Add a user guide chapter on accelerating multi-GPU HDF5 workloads.
Known issues
- We are aware of possible performance regressions when using UCX 1.18. We are temporarily restricting our packages to UCX <= 1.17 while we investigate this.