Releases: nv-legate/cupynumeric

v24.11.01

07 Dec 06:42
1207434

This is a patch release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

Bug Fixes

  • Explicit fallback to __array__() on __buffer__

v24.11.00

17 Nov 00:51
eedb7e1

This is a beta release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

New features

Improved API coverage

  • Implement np.unravel_index
  • Implement np.angle
  • Implement np.median
  • Implement np.ix_
  • Implement np.meshgrid
  • Implement np.expand_dims
  • Implement np.rot90
  • Implement np.round
  • Implement np.fft.fftshift and np.fft.ifftshift
  • Implement np.roll
  • Support full_matrices parameter of np.linalg.svd
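
A few of the calls listed above can be exercised as in the sketch below. It imports plain NumPy so that it runs anywhere; the assumption is that, since cuPyNumeric aims to mirror the NumPy API, the same calls behave the same way when the import is changed to cupynumeric. That equivalence is not verified here.

```python
import numpy as np  # assumption: swapping in cupynumeric gives the same results

a = np.arange(12).reshape(3, 4)

# Flat index 7 in a (3, 4) array sits at row 1, column 3.
row, col = np.unravel_index(7, a.shape)

# Shift every row one position to the right, wrapping around.
rolled = np.roll(a, 1, axis=1)

# Rotate the array 90 degrees counterclockwise: shape (3, 4) -> (4, 3).
rotated = np.rot90(a)

# Insert a new leading axis: shape (3, 4) -> (1, 3, 4).
expanded = np.expand_dims(a, axis=0)

# Move the zero-frequency bin to the centre, then undo the shift.
freqs = np.fft.fftfreq(4)
shifted = np.fft.fftshift(freqs)
restored = np.fft.ifftshift(shifted)
```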

Memory management enhancements

  • Memory-efficient implementation of matrix multiplication - this implementation batches over the reduction dimension, achieving constant memory overhead regardless of array sizes.
  • Memory efficiency for stencil computation - adds the np.ndarray.stencil_hint method, which instructs cuPyNumeric to pre-allocate the necessary space for ghost elements when an array is to be used in a stencil computation, reducing intermediate memory use.
  • Memory allocation report - report the object-memory mapping when a computation runs out of memory, to help users debug and optimize memory usage.
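
The idea of batching a matrix multiplication over the reduction dimension can be illustrated with a small single-process sketch in plain NumPy. This is not the cuPyNumeric implementation, which operates on distributed data; the function name matmul_batched_k and the k_batch parameter are hypothetical and exist only for this illustration. The point is that each step touches only a fixed-width slab of the inputs, so the working set beyond the output does not grow with the length of the reduction dimension.

```python
import numpy as np

def matmul_batched_k(a, b, k_batch=2):
    """Compute a @ b by accumulating partial products over slabs of the
    shared (reduction) dimension. Hypothetical helper for illustration."""
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for start in range(0, k, k_batch):
        stop = min(start + k_batch, k)
        # Each partial product uses at most k_batch columns of a
        # and k_batch rows of b.
        out += a[:, start:stop] @ b[start:stop, :]
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 5))
b = rng.standard_normal((5, 3))
c = matmul_batched_k(a, b, k_batch=2)
```

Summing the partial products gives the same result as a single a @ b, up to floating-point rounding.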

Enhanced infrastructure support

  • GH200 Grace Hopper Superchip support - allows users to leverage GH200-based cloud instances and supercomputers.
  • GASNet support - support GASNet as an alternative networking backend to UCX, using a GASNet wrapper, MPI wrapper, and custom build utilities.
  • Initial HDF5 support - distributed read/write of HDF5 files using a POSIX backend.
  • Automatic resource configuration at run time - automatically discover and use all the available compute resources including CPU, GPU, system memory, and framebuffer memory.
  • More enhancements from Legate 24.11

Other

  • Re-implement the RNG module on top of the C++ STL random library, removing the need to have cuRand in CPU-only installations.

Known Issues

cuPyNumeric will emit a false-positive warning like the following:

RuntimeWarning: cuPyNumeric has not implemented numpy.ndarray.__buffer__ and is falling back to canonical NumPy. You may notice significantly decreased performance for this function call.

in cases such as when an arithmetic operation is performed on a scalar array, e.g. cupynumeric.array(42) * 2. There is no actual performance degradation occurring in this case. We are working on a patch that will suppress this warning.
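
Until a fix is installed, one possible workaround is to filter the message on the user side. The sketch below assumes the warning is emitted through Python's standard warnings module as a RuntimeWarning, which the message prefix suggests but the release note does not state. The library call is replaced by a stand-in warnings.warn so that the example runs without cuPyNumeric installed.

```python
import warnings

# Start of the message text quoted in the release note. The filter treats
# this as a regular expression matched against the start of the message.
PATTERN = r"cuPyNumeric has not implemented numpy\.ndarray\.__buffer__"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only the known false positive; other warnings stay visible.
    warnings.filterwarnings("ignore", message=PATTERN, category=RuntimeWarning)

    # Stand-in for the library emitting the false-positive warning.
    warnings.warn(
        "cuPyNumeric has not implemented numpy.ndarray.__buffer__ and is "
        "falling back to canonical NumPy.",
        RuntimeWarning,
    )
    # An unrelated warning is still recorded.
    warnings.warn("something else", RuntimeWarning)

remaining = [str(w.message) for w in caught]
```

Matching on the message rather than on the whole RuntimeWarning category keeps genuine fallback warnings for other functions visible.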

v24.06.01

11 Sep 20:36
370f766

This is a patch release, and includes the following fixes:

x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/cunumeric.

Documentation for this release can be found at https://docs.nvidia.com/cunumeric/24.06/.

v24.06.00

03 Jul 22:35
28296b4

This release ports cuNumeric to the C++-based Legate-Core. Additionally, it includes the following new features:

  • np.linalg.qr, np.linalg.svd (single-GPU support only)
  • "where" argument for unary operations
  • np.select
  • np.flipud, np.fliplr
  • np.cov
  • np.load (initial, unoptimized implementation)
  • np.average
  • np.logical_and.reduce, np.logical_or.reduce
  • np.digitize
  • np.diff
  • np.linalg.cholesky, np.linalg.solve (multi-GPU support, based on cuSolverMp -- not included in conda packages, requires a manual build)
  • C++-based ndarray class (experimental support)
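
Several of the functions in the list above can be tried out as in the following sketch. It uses plain NumPy so that it is runnable without a GPU; the assumption is that the cuNumeric versions follow the same semantics, which this snippet does not check.

```python
import numpy as np  # assumption: the cuNumeric equivalents behave the same

x = np.array([-2.0, -0.5, 0.5, 2.0])

# np.select: take the choice belonging to the first condition that holds.
labels = np.select([x < -1, x > 1], [-1, 1], default=0)

# np.digitize: index of the bin each value falls into, given the bin edges.
bins = np.digitize(x, [-1.0, 0.0, 1.0])

# np.diff: differences between consecutive elements.
steps = np.diff(x)

# "where" argument on a unary operation: compute sqrt only where x >= 0
# and leave the pre-filled zeros elsewhere.
roots = np.sqrt(x, where=x >= 0, out=np.zeros_like(x))

# np.average with explicit weights.
weighted = np.average(x, weights=[1, 1, 1, 3])

# Reductions over the logical ufuncs.
all_finite = np.logical_and.reduce(np.isfinite(x))
any_negative = np.logical_or.reduce(x < 0)
```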

x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/cunumeric.

Documentation for this release can be found at https://docs.nvidia.com/cunumeric/24.06/.

Known issues

Including the nvidia conda channel in an environment with cunumeric may end up pulling cutensor 2.0, even though the cunumeric packages explicitly request cutensor 1.7. This can cause error messages like this:

OSError: libcutensor.so.1: cannot open shared object file: No such file or directory

This is not an issue with cuNumeric, but with incorrect constraints on the cutensor packages on the nvidia channel. Please avoid including the nvidia conda channel in any conda environment including cunumeric.

v23.11.00

21 Nov 01:47
d91f17c

This release contains performance improvements to the variance operation, and a multi-dimensional Cholesky implementation.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

🐛 Bug Fixes

📖 Documentation

Full Changelog: v23.09.00...v23.11.00

v23.09.00

03 Oct 15:23
e66a063

This release adds support for the quantile API, and includes some performance and documentation improvements (notably a "Best Practices" guide).
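
As a brief illustration of the quantile API, the sketch below uses plain NumPy with its default linear interpolation. It assumes that the cuNumeric implementation follows the same signature and defaults, which is not confirmed by this release note.

```python
import numpy as np  # assumption: cunumeric.quantile mirrors this behaviour

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Lower quartile, median and upper quartile in one call.
q = np.quantile(data, [0.25, 0.5, 0.75])

# Quantiles along an axis of a 2-D array: the median of each row.
m = np.arange(6.0).reshape(2, 3)
row_medians = np.quantile(m, 0.5, axis=1)
```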

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

🛠️ Improvements

📖 Documentation

🐛 Bug Fixes

New Contributors

Full Changelog: v23.07.00...v23.09.00

v23.07.00

25 Jul 04:51
d413db2

This release adds support for histogram, broadcast* and various nan* APIs. It also includes performance improvements to the FFT functions and cleanups in ufunc support.
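
The sketch below shows the kind of calls this covers, using plain NumPy. The release note does not say which broadcast* and nan* functions were added, so np.broadcast_to, np.nansum and np.nanmin are picked here only as representative examples, and their availability in this cuNumeric release is an assumption.

```python
import numpy as np  # assumption: the cuNumeric equivalents behave the same

samples = np.array([0.1, 0.4, 0.6, 0.9, np.nan])

# Histogram of the non-NaN samples over two equal-width bins on [0, 1].
valid = samples[~np.isnan(samples)]
counts, edges = np.histogram(valid, bins=2, range=(0.0, 1.0))

# nan* reductions skip NaN entries instead of propagating them.
total = np.nansum(samples)
lowest = np.nanmin(samples)

# Broadcast a 1-D row to a (2, 3) read-only view without copying.
row = np.array([1, 2, 3])
tiled = np.broadcast_to(row, (2, 3))
```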

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

🛠️ Improvements

📖 Documentation

  • Note new minimum CUDA requirements for conda packages by @manopapad in #875

🐛 Bug Fixes

New Contributors

Full Changelog: v23.03.00...v23.07.00

v23.03.00

15 Mar 20:02
9ac887b

This is the beta release of cuNumeric.

This release is focused on bug fixes, code clean-up and documentation updates, in preparation for entering beta status.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🐛 Bug Fixes

🛠️ Improvements

📖 Documentation

Full Changelog: v23.01.00...v23.03.00

v23.01.00

31 Jan 03:38
2455b55

This release introduces support for the put and putmask operations, adds an optimized implementation for the common case of advanced indexing using a single (possibly broadcasted) boolean array, includes more information in the tags of unary/binary operations on profiles (for easier cross-referencing with the source script), and adds some small improvements to OpenMP execution.
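
For reference, the operations named here look as follows in plain NumPy. This is a sketch under the assumption that cuNumeric follows NumPy semantics for these calls; it does not show or measure the optimized code path itself.

```python
import numpy as np  # assumption: the cuNumeric equivalents behave the same

# put: write values at the given flat indices.
a = np.zeros(6, dtype=int)
np.put(a, [0, 2, 4], [10, 20, 30])

# putmask: overwrite the elements selected by a boolean mask.
b = np.arange(6)
np.putmask(b, b > 3, -1)

# Advanced indexing with a single boolean array, for reading and writing.
m = np.arange(12).reshape(3, 4)
mask = m % 2 == 0
evens = m[mask]   # gather the selected elements into a 1-D array
m[mask] = 0       # scatter a scalar into the selected positions
```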

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🐛 Bug Fixes

🚀 New Features

🛠️ Improvements

Full Changelog: v22.10.00...v23.01.00

v22.10.00

13 Oct 23:53
81ad156

The biggest change in Release 22.10 is a new build infrastructure using CMake and scikit-build. The new build system brings several benefits including robust build dependency tracking and compliance with Python site-packages. This release includes several new search and indexing operators, fixes for several performance and correctness bugs, and provenance tracking for top-level and ndarray routines in execution profiles.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

  • Argwhere and flatnonzero by @mfoerste4 in #525
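
The two search operators named above behave as follows in plain NumPy. The sketch assumes the cuNumeric versions return the same results.

```python
import numpy as np  # assumption: the cuNumeric equivalents behave the same

a = np.array([[0, 3],
              [5, 0]])

# argwhere: one row of coordinates per non-zero element.
coords = np.argwhere(a)

# flatnonzero: indices of the non-zero elements in the flattened array.
flat = np.flatnonzero(a)
```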

🛠️ Improvements

  • adding support for array shape () passed as an index argument in advanced indexing by @ipdemes in #486
  • Refactor test driver for cpu/gpu sharding by @bryevdv in #451
  • Collate test output to allow workers > 1 with verbose output by @bryevdv in #507
  • Ensure test.py --use flag fully overrides USE_* envvars by @manopapad in #524
  • Enhance two integration tests by @robinw0928 in #511
  • Add typing to array.py by @bryevdv in #478
  • Update test runner for osx by @bryevdv in #529
  • Don't blindly trust user-supplied bincount.minlength by @manopapad in #523
  • Make reduced-precision cuBLAS mode opt-in by @manopapad in #519
  • Fix reciprocal tests for zero values and improve test value customization (#467) by @marcinz in #537
  • Refactor test runner to support more pinning options by @bryevdv in #535
  • Remove dead code in bincount by @magnatelee in #546
  • Make the validation condition for random distributions lenient by @magnatelee in #550
  • src/cunumeric: handle high number of bins in GPU bincount by @rohany in #526
  • Construct NumPy arrays correctly from 0D deferred arrays backed by region fields by @magnatelee in #551
  • Collect test failure details at the end by @bryevdv in #556
  • Simplify some thunk conversion helpers by @manopapad in #553
  • Fix a compiler warning by @magnatelee in #555
  • Add option to disable CPU pinning in tests by @bryevdv in #558
  • Use the new mapper registration to enable detailed mapper logging by @magnatelee in #570
  • src/cunumeric/search: make nonzero not always allocate SYS_MEM buffers by @rohany in #572
  • add negative test case in test_array_split.py by @xialu00 in #545
  • add some test cases for test_arg_reduce.py by @xialu00 in #575
  • Testcase-add test cases for test_flip and test_indices by @xialu00 in #579
  • Refactor scalar reductions to use common execution policy by @jjwilke in #573
  • Sanitize k for the eye operator by @magnatelee in #586
  • Add CMake build for C++ and scikit-build infrastructure for Python package installation by @jjwilke in #514
  • Enhance test_block.py and test_eye.py by @robinw0928 in #578
  • Testcase add test cases for test_fill.py and test_ndim.py by @xialu00 in #588
  • Remove run dependency on curand by @marcinz in #520
  • Use Legion Fills when possible by @manopapad in #604
  • Support building with GASNet-Ex and MPI backends by @manopapad in #610
  • Provenance tracking for cuNumeric operators by @magnatelee in #596
  • Fix tests utils to make --directory work correctly. by @robinw0928 in #592
  • Fix a compiler warning by @magnatelee in #594
  • Enhance test_diag_indices.py and test_flatten.py. by @robinw0928 in #609
  • cuNumeric doesn't need nested provenance tracking by @magnatelee in #617
  • Add RuntimeError exception to legate.time by @robinw0928 in #618
  • Stop instantiating min and max reduction ops for complex types by @magnatelee in #621
  • Mark temporary conversion outputs as linear for eager storage recycling by @magnatelee in #608
  • Make the negative test on fill robust across Python versions by @magnatelee in #619
  • Enhance mask_indices and move_axis by @robinw0928 in #622
  • src/cunumeric/matrix: stop including coll.h in solve_template.inl by @rohany in #620

🐛 Bug Fixes

  • Fix performance bugs in scalar reductions by @magnatelee in #509
  • Don't use internal LAPACK function names by @manopapad in #522
  • Bug fixes for advanced indexing by @magnatelee in #532
  • Handle the case where LAPACK_*potrf is a macro, not a function by @manopapad in #527
  • fix mypy issue w/ np methods by @bryevdv in #542
  • Fix buggy complex-to-bool conversions and add correctness tests for astype by @magnatelee in #549
  • fixing advanced indexing operation for empty arrays by @ipdemes in #504
  • Do not link curand by @marcinz in #541
  • Fixing issues with advanced_indexing_kernel by @ipdemes in #557
  • fixing another corner case for advanced indexing by @ipdemes in #554
  • Fix OSX test shard generation by @bryevdv in #563
  • fix error print in test_unary_ufunc by @jjwilke in #566
  • Add NAN handling to convert() needed for some prefix routines with integer outputs. by @rkarim2 in #502
  • Fixing logic for slicing by @ipdemes in #574
  • Fix linalg.solve when inputs are scalars by @magnatelee in #585
  • Allow casting in cn.dot, to match numpy's behavior by @manopapad in #598
  • Add linalg.solve to the cmake build by @magnatelee in #603
  • Invoke eye with read-write privilege, not write-discard by @manopapad in #616
  • Fix a bug in scalar reduction launching kernels with empty domains by @magnatelee in #606

📖 Documentation

  • Added note to prefix documentation for corner cases where cunumeric results can diverge from numpy by @rkarim2 in #528
  • updating documentation by @ipdemes in #614
  • Add missing docs symlink by @bryevdv in #635