Releases · nv-legate/cupynumeric

07 Dec 06:42

marcinz

v24.11.01

1207434

v24.11.01

This is a patch release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

Bug Fixes

Explicit fallback to __array__() on __buffer__

17 Nov 00:51

manopapad

v24.11.00

eedb7e1

v24.11.00

This is a beta release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

New features

Improved API coverage

Implement np.unravel_index
Implement np.angle
Implement np.median
Implement np.ix_
Implement np.meshgrid
Implement np.expand_dims
Implement np.rot90
Implement np.round
Implement np.fft.fftshift and np.fft.ifftshift
Implement np.roll
Support full_matrices parameter of np.linalg.svd
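A short tour of the newly covered functions may help when porting code. The sketch below uses plain NumPy so that it runs anywhere; it assumes that swapping the import for `import cupynumeric as np` gives the same results, which is the stated goal of the library but is not verified here.

```python
import numpy as np  # assumption: `import cupynumeric as np` behaves the same way

# unravel_index: flat index 7 in a 3x4 array is row 1, column 3
row, col = np.unravel_index(7, (3, 4))

# meshgrid and ix_: two ways to build index grids
xx, yy = np.meshgrid(np.arange(3), np.arange(2))   # both have shape (2, 3)
a = np.arange(12).reshape(3, 4)
sub = a[np.ix_([0, 2], [1, 3])]                    # rows 0 and 2, columns 1 and 3

# roll, rot90, expand_dims
rolled = np.roll(np.arange(5), 2)                  # shift elements right by 2
rotated = np.rot90(np.array([[1, 2], [3, 4]]))     # rotate 90 degrees counter-clockwise
expanded = np.expand_dims(np.arange(3), axis=0)    # shape (1, 3)

# median, angle, round
med = np.median(np.array([3.0, 1.0, 2.0]))
ang = np.angle(1j)                                 # phase of a complex number, in radians
rounded = np.round(np.array([1.234, 5.678]), 1)

# fftshift and ifftshift undo each other
freqs = np.fft.fftfreq(4)
restored = np.fft.ifftshift(np.fft.fftshift(freqs))

# full_matrices=False requests the reduced SVD
u, s, vh = np.linalg.svd(np.ones((4, 2)), full_matrices=False)
```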

Memory management enhancements

Memory-efficient matrix multiplication - the new implementation batches over the reduction dimension, so memory overhead stays constant regardless of array size.
Memory efficiency for stencil computations - adds the np.ndarray.stencil_hint method, which instructs cuPyNumeric to pre-allocate space for ghost elements when an array will be used in a stencil computation, reducing intermediate memory use.
Memory allocation report - reports the object-to-memory mapping when a computation runs out of memory, to help users debug and optimize memory usage.
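To illustrate what batching over the reduction dimension means, here is a minimal NumPy sketch. It is not cuPyNumeric's implementation; the function name `matmul_batched_over_k` and the `chunk` parameter are hypothetical and exist only for this example. The point is that each step touches only a slice of the shared dimension, so the working set per step does not grow with that dimension.

```python
import numpy as np

def matmul_batched_over_k(a, b, chunk=256):
    """Compute a @ b by accumulating partial products over slices of the
    shared (reduction) dimension. Illustrative only."""
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for start in range(0, k, chunk):
        stop = min(start + chunk, k)
        # Each step reads an (m, chunk) slice of a and a (chunk, n) slice of b
        out += a[:, start:stop] @ b[start:stop, :]
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 1000))
b = rng.standard_normal((1000, 5))
result = matmul_batched_over_k(a, b, chunk=128)
```

The result should agree with `a @ b` up to floating-point rounding, since summation order differs between the two.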

Enhanced infrastructure support

GH200 Grace Hopper Superchip support - allows users to leverage GH200-based cloud instances and supercomputers.
GASNet support - support GASNet as an alternative networking backend to UCX, using a GASNet wrapper, MPI wrapper, and custom build utilities.
Initial HDF5 support - distributed read/write of HDF5 files using a POSIX backend.
Automatic resource configuration at run time - automatically discover and use all the available compute resources including CPU, GPU, system memory, and framebuffer memory.
More enhancements from Legate 24.11

Other

Re-implement the RNG module on top of the C++ STL random library, removing the need to have cuRand in CPU-only installations.

Known Issues

cuPyNumeric will emit a false-positive warning like the following:

RuntimeWarning: cuPyNumeric has not implemented numpy.ndarray.__buffer__ and is falling back to canonical NumPy. You may notice significantly decreased performance for this function call.

in cases such as when an arithmetic operation is performed on a scalar array, e.g. cupynumeric.array(42) * 2. There is no actual performance degradation occurring in this case. We are working on a patch that will suppress this warning.
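Until a patched version is installed, one possible workaround is to filter this specific message with Python's standard `warnings` module. The sketch below is an assumption about how a user might do this, not an official recommendation; `emit_example_warning` is a hypothetical stand-in for the library call that triggers the warning, so the example runs without cuPyNumeric.

```python
import warnings

# Regex matched against the start of the warning message (text taken from above)
FALSE_POSITIVE = r"cuPyNumeric has not implemented numpy\.ndarray\.__buffer__"

def emit_example_warning():
    # Hypothetical stand-in for the cuPyNumeric call that emits the warning
    warnings.warn(
        "cuPyNumeric has not implemented numpy.ndarray.__buffer__ and is "
        "falling back to canonical NumPy.",
        RuntimeWarning,
    )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # report everything by default
    # The more specific "ignore" filter is inserted in front of "always"
    warnings.filterwarnings("ignore", message=FALSE_POSITIVE, category=RuntimeWarning)
    emit_example_warning()                               # suppressed
    warnings.warn("some other warning", RuntimeWarning)  # still recorded
```

Filtering on the message text rather than on `RuntimeWarning` as a whole keeps genuine fallback warnings for other functions visible.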

11 Sep 20:36

manopapad

v24.06.01

370f766

v24.06.01

This is a patch release, and includes the following fixes:

Fix for nv-legate/legate#947
Fix package dependencies (cuda and openblas)

x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/cunumeric.

Documentation for this release can be found at https://docs.nvidia.com/cunumeric/24.06/.

03 Jul 22:35

manopapad

v24.06.00

28296b4

v24.06.00

This release ports cuNumeric to the C++-based Legate-Core. Additionally, it includes the following new features:

np.linalg.qr, np.linalg.svd (single-GPU support only)
"where" argument for unary operations
np.select
np.flipud, np.fliplr
np.cov
np.load (initial, unoptimized implementation)
np.average
np.logical_and.reduce, np.logical_or.reduce
np.digitize
np.diff
np.linalg.cholesky, np.linalg.solve (multi-GPU support, based on cuSolverMp -- not included in conda packages, requires a manual build)
C++-based ndarray class (experimental support)
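Several of the features listed above can be exercised in a few lines. The sketch below uses plain NumPy as a stand-in and assumes that cuNumeric follows the same semantics for these calls; the input values are made up for illustration.

```python
import numpy as np  # assumption: cuNumeric mirrors these NumPy semantics

x = np.array([-2.0, -0.5, 0.5, 2.0])

# "where" argument on a unary operation: compute sqrt only where x > 0
out = np.zeros_like(x)
np.sqrt(x, out=out, where=x > 0)   # entries with a False mask keep their 0.0

# select: pick a value per element from the first matching condition
labels = np.select([x < -1, x > 1], [-1, 1], default=0)

# digitize: index of the bin each value falls into
bins = np.digitize(x, np.array([-1.0, 0.0, 1.0]))

# diff and weighted average
steps = np.diff(x)
avg = np.average(x, weights=np.array([1.0, 1.0, 1.0, 3.0]))

# logical reductions
all_finite = np.logical_and.reduce(np.isfinite(x))
any_negative = np.logical_or.reduce(x < 0)

# flipud / fliplr reverse rows / columns
m = np.arange(4).reshape(2, 2)
up_down = np.flipud(m)
left_right = np.fliplr(m)
```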

x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/cunumeric.

Documentation for this release can be found at https://docs.nvidia.com/cunumeric/24.06/.

Known issues

Including the nvidia conda channel in an environment with cunumeric may end up pulling cutensor 2.0, even though the cunumeric packages explicitly request cutensor 1.7. This can cause error messages like this:

OSError: libcutensor.so.1: cannot open shared object file: No such file or directory

This is not an issue with cuNumeric, but with incorrect constraints on the cutensor packages on the nvidia channel. Please avoid including the nvidia conda channel in any conda environment including cunumeric.

21 Nov 01:47

marcinz

v23.11.00

d91f17c

v23.11.00

This release contains performance improvements to the variance operation, and a multi-dimensional Cholesky implementation.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

Added variance as a unary reduction by @jjwilke in #593
Add batched cholesky implementation and tests by @jjwilke in #1029
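As a rough illustration of the two features above, the following NumPy sketch computes a variance reduction and a Cholesky factorization over a stack of matrices. It assumes that "batched" here means leading batch dimensions in the NumPy sense, and that cuNumeric matches NumPy's results; neither is confirmed by these notes.

```python
import numpy as np  # assumption: cuNumeric matches NumPy for these calls

# Variance as a reduction, over the whole array and along one axis
data = np.array([[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]])
var_all = np.var(data)
var_rows = np.var(data, axis=1)

# A stack of 4 symmetric positive-definite 3x3 matrices
rng = np.random.default_rng(0)
m = rng.standard_normal((4, 3, 3))
spd = m @ np.swapaxes(m, -1, -2) + 3.0 * np.eye(3)

# One call factors every matrix in the stack: lower has shape (4, 3, 3)
lower = np.linalg.cholesky(spd)
reconstructed = lower @ np.swapaxes(lower, -1, -2)
```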

🐛 Bug Fixes

Replacing set with OrderedSet to avoid control-replication violations by @ipdemes in #1054
Inline boolean operators in NumPy are bitwise, not logical by @manopapad in #1057
Fix #1065 ("where" fails with IndexError) by @manopapad in #1067
Fixes #1069, #1070 (minor einsum bugs) by @manopapad in #1072
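The bitwise-versus-logical distinction behind the boolean operator fix is easy to miss, because the two agree on boolean arrays and differ only on integers. The NumPy sketch below shows the reference behavior; reading "inline" as covering both the infix and in-place operator forms is an assumption.

```python
import numpy as np

a = np.array([1, 2, 4])
b = np.array([2, 2, 3])

bitwise = a & b                    # per-bit AND of the integers
logical = np.logical_and(a, b)     # truthiness AND: every entry is non-zero

in_place = a.copy()
in_place &= b                      # the in-place form is bitwise as well

inverted = ~np.array([0, 1])               # bitwise NOT on integers
negated = np.logical_not(np.array([0, 1])) # truthiness NOT
```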

📖 Documentation

Suggest using mamba over conda by @manopapad in #1068

Full Changelog: v23.09.00...v23.11.00

Contributors

jjwilke, manopapad, and ipdemes

03 Oct 15:23

marcinz

v23.09.00

e66a063

v23.09.00

This release adds support for the quantile API, and includes some performance and documentation improvements (notably a "Best Practices" guide).

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

Quantile Implementation by @aschaffer in #664
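A minimal usage sketch for the quantile API, written against NumPy and assuming cuNumeric follows the same default (linear interpolation) and the same `axis` semantics:

```python
import numpy as np  # assumption: cuNumeric's quantile follows NumPy's defaults

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
quartiles = np.quantile(x, [0.25, 0.5, 0.75])

# Quantiles along an axis: the median of each column
m = np.arange(6.0).reshape(2, 3)
column_medians = np.quantile(m, 0.5, axis=0)
```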

🛠️ Improvements

Add missing openmp variants to BitGenerator and UniqueReduce by @rohany in #1010
Histogram refactor by @aschaffer in #1003

📖 Documentation

Add best practices info to sphinx docs by @bryevdv in #1048

🐛 Bug Fixes

Missing alignment on histogram call by @manopapad in #999
Fix for control replication violation in test by @ipdemes in #1005
Fix build instructions link by @bryevdv in #1014
Add back None as an accepted value for axis on some type sigs by @manopapad in #1017
If a scalar ufunc arg is cn.ndarray use its type directly by @manopapad in #1011
Skip the docstrings for functions pulled from cloned modules by @manopapad in #1024
Fix random test failures in CPU-only runs by @manopapad in #1025
Don't cast histogram to int64 when density=True by @manopapad in #1042
Explicitly cast result of shift binary operators by @manopapad in #1046
Remove use of deprecated np.find_common_type by @manopapad in #1045

New Contributors

@ajschmidt8 made their first contribution in #1035

Full Changelog: v23.07.00...v23.09.00

Contributors

manopapad, bryevdv, and 4 other contributors

25 Jul 04:51

marcinz

v23.07.00

d413db2

v23.07.00

This release adds support for histogram, broadcast* and various nan* APIs. It also includes performance improvements to the FFT functions and cleanups in ufunc support.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

Implement broadcast routines by @bryevdv in #759
Sanitize unary reductions that have NaNs by @shriram-jagan in #925
Histogram Functionality by @aschaffer in #983
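The sketch below touches each of the three feature groups using NumPy as a stand-in. Which exact nan* and broadcast* functions this release covers is not listed here, so the choice of `nansum`, `nanmax`, `broadcast_to`, and `broadcast_arrays` is an assumption.

```python
import numpy as np  # assumption: these calls are among those the release covers

x = np.array([0.5, 1.5, 1.7, 2.2, np.nan])

# NaN-aware reductions ignore the missing value
total = np.nansum(x)
largest = np.nanmax(x)

# Histogram of the finite values with explicit bin edges
counts, edges = np.histogram(x[~np.isnan(x)], bins=[0, 1, 2, 3])

# Broadcast routines
tiled = np.broadcast_to(np.arange(3), (2, 3))   # read-only view, shape (2, 3)
left, right = np.broadcast_arrays(np.arange(3), np.arange(2).reshape(2, 1))
```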

🛠️ Improvements

Add ufunc methods by @bryevdv in #834
Support of the shape argument in empty_like() & Co. by @madsbk in #845
Add support for Python 3.11 (#830) by @marcinz in #837
Ensure ufunc/function dispatching is narrow by @seberg in #977
Fft improvements by @mfoerste4 in #732

📖 Documentation

Note new minimum CUDA requirements for conda packages by @manopapad in #875

🐛 Bug Fixes

Fix bugs in concatenate and stack APIs. by @robinwnv in #844
Fixes #858 by @manopapad in #859
Fix concatenate and *stack APIs to support scalars (#818, #839) by @robinwnv in #866
Avoid following compiler symlinks by @manopapad in #880
Fix for some binary operators on float16 by @magnatelee in #889
WAR for TBLIS compiler detection while upstream PR is pending by @manopapad in #890
Also build CPU-only packages for haswell (#869) by @marcinz in #882
Fix array API (#885) by @robinwnv in #910
Fix unit tests by @magnatelee in #920
Fix an incorrect type by @marcinz in #931
Use correct type, to avoid int narrowing by @manopapad in #941
Fix cunumeric.arange issues by @yimoj in #940
Use the right type for scalar arguments by @magnatelee in #942
Fall back to NumPy eagerly on RandomState methods by @manopapad in #959
Fix bugs in random integer functions by @manopapad in #966
Resolve numpy 1.25 issues by @bryevdv in #973
Set lib_dir explicitly to lib/, even on RHEL by @manopapad in #971
fixing putmask logic for scalar inputs by @ipdemes in #980
fixing cuda error by @ipdemes in #978
Change arg to LLONG_MIN to make it consistent with python. by @shriram-jagan in #986
Missing alignment on histogram call by @manopapad in #1000

New Contributors

@madsbk made their first contribution in #845
@sandeepd-nv made their first contribution in #899
@seberg made their first contribution in #977
@shriram-jagan made their first contribution in #988
@aschaffer made their first contribution in #983

Full Changelog: v23.03.00...v23.07.00

Contributors

seberg, manopapad, and 11 other contributors

15 Mar 20:02

marcinz

v23.03.00

9ac887b

v23.03.00

This is the beta release of cuNumeric.

This release is focused on bug fixes, code clean-up and documentation updates, in preparation for entering beta status.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🐛 Bug Fixes

Do reductions properly in tensor contraction tasks by @magnatelee in #803
Seed the NumPy RNG at the start of every test by @manopapad in #792
Fix handling of negative axis in np.repeat by @manopapad in #821
Fix for #720 (by @lightsighter) by @manopapad in #721
Ensure unary_func seeding is deterministic across processes by @manopapad in #825

🛠️ Improvements

Update the architectures built in conda package by @marcinz in #770
Use thrust::cuda::par_nosync if available by @magnatelee in #780
Preemptively convert to np.ndarray on NumPy fallback by @manopapad in #802
Removing all Legion references from the code by @magnatelee in #811
Remove exception throwing from RNG code by @manopapad in #815
Pin legate to a specific commit by @trxcllnt in #824
Add support for Python 3.11 by @m3vaz in #830

📖 Documentation

[WIP] Docs refresh by @bryevdv in #805

Full Changelog: v23.01.00...v23.03.00

Contributors

trxcllnt, manopapad, and 5 other contributors

31 Jan 03:38

marcinz

v23.01.00

2455b55

v23.01.00

This release introduces support for the put and putmask operations and adds an optimized implementation for the common case of advanced indexing with a single (possibly broadcasted) boolean array. It also includes more information in the tags of unary/binary operations in profiles (for easier cross-referencing with the source script) and makes some small improvements to OpenMP execution.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🐛 Bug Fixes

Make the code compile with bounds checks by @magnatelee in #648
MatVec & MatVecMul use reduction stores, not outputs by @manopapad in #646
Set default generator based on whether ninja is available by @jjwilke in #602
Allow args to be passed by position and name in auto_convert by @manopapad in #640
Force positive values for log and sqrt tests by @jjwilke in #580
Eliminate empty kernel launch in cunumeric.unique by @magnatelee in #675
Make install.py reconfigure editable installs when build type changes by @trxcllnt in #670
Fix for #684 by @magnatelee in #686
Follow up on PR #671 by @ipdemes in #677
More argument checks for bincount by @magnatelee in #711
Fix a typo in unique.cu indexing by @manopapad in #713
guard all2all from empty transfer by @mfoerste4 in #727
src/cunumeric/item: add openmp variants for write/read tasks by @rohany in #740
Fix CI failures due to numpy 1.24 upgrade by @manopapad in #745
Fix timing for CuPy tests by @manopapad in #747
Don't turn on cuNumeric debug checks on debug-rel builds by @manopapad in #753
Move pip uninstall step before CMake is run instead of after. by @trxcllnt in #760
Force conda version of cutensor by @marcinz in #765
handle numpy 'builtins' properly for coverage by @bryevdv in #766

🚀 New Features

Implementing PUT routine by @ipdemes in #582
Implementing Putmask by @ipdemes in #667
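For readers unfamiliar with these routines, here is a brief NumPy sketch of put, putmask, and boolean-mask indexing (the advanced-indexing case this release optimizes). It assumes cuNumeric reproduces NumPy's behavior for these calls.

```python
import numpy as np  # assumption: cuNumeric reproduces these NumPy semantics

# put: write values at flat indices, in place
a = np.arange(6)
np.put(a, [0, 2], [-1, -2])

# putmask: overwrite the entries selected by a boolean mask, in place
b = np.arange(6)
np.putmask(b, b > 3, 0)

# Advanced indexing with a single boolean array
c = np.arange(12).reshape(3, 4)
mask = c % 2 == 0
evens = c[mask]    # 1-D array of the selected elements
c[mask] = -1       # assign through the same mask
```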

🛠️ Improvements

Move test driver code to legate.core by @bryevdv in #627
Remove --install-dir option by @bryevdv in #656
Updates for new script-based conda env generation by @manopapad in #651
Log operator names of unary and binary operations using annotations by @magnatelee in #679
Regenerate install_info.py on every build by @trxcllnt in #705
Fixes for buffer allocations by @magnatelee in #706
Clean up the basic build instructions by @manopapad in #741
Refactor benchmarks by @manopapad in #567
Improving performance for some special cases of advanced indexing by @ipdemes in #731
Pass CMAKE_GENERATOR to scikit-build by @trxcllnt in #750
Change the default CPU architecture to haswell by @marcinz in #762

Full Changelog: v22.10.00...v23.01.00

Contributors

jjwilke, trxcllnt, and 7 other contributors

13 Oct 23:53

marcinz

v22.10.00

81ad156

v22.10.00

The biggest change in Release 22.10 is a new build infrastructure using CMake and scikit-build. The new build system brings several benefits including robust build dependency tracking and compliance with Python site-packages. This release includes several new search and indexing operators, fixes for several performance and correctness bugs, and provenance tracking for top-level and ndarray routines in execution profiles.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

Argwhere and flatnonzero by @mfoerste4 in #525
added extract and place via advanced indexing by @mfoerste4 in #536
Fill diagonal by @ipdemes in #473
Single processor implementation for linalg.solve by @magnatelee in #568
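The new search and indexing operators can be tried out with the NumPy sketch below. It assumes cuNumeric matches NumPy's results for each call; the small arrays are invented for illustration.

```python
import numpy as np  # assumption: cuNumeric matches NumPy for these calls

a = np.array([[0, 3, 0], [4, 0, 5]])

coords = np.argwhere(a)          # (row, col) pairs of the non-zero entries
flat = np.flatnonzero(a)         # the same entries as flat indices
picked = np.extract(a > 3, a)    # values where the condition holds

b = a.copy()
np.place(b, b == 0, [-1])        # replace the zeros in place

sq = np.zeros((3, 3))
np.fill_diagonal(sq, 7.0)        # set the main diagonal in place

# linalg.solve: 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0], [1.0, 2.0]])
rhs = np.array([9.0, 8.0])
solution = np.linalg.solve(A, rhs)
```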

🛠️ Improvements

adding support for array shape () passed as an index argument in advanced indexing by @ipdemes in #486
Refactor test driver for cpu/gpu sharding by @bryevdv in #451
Collate test output to allow workers > 1 with verbose output by @bryevdv in #507
Ensure test.py --use flag fully overrides USE_* envvars by @manopapad in #524
Enhance two integration tests by @robinw0928 in #511
Add typing to array.py by @bryevdv in #478
Update test runner for osx by @bryevdv in #529
Don't blindly trust user-supplied bincount.minlength by @manopapad in #523
Make reduced-precision cuBLAS mode opt-in by @manopapad in #519
Fix reciprocal tests for zero values and improve test value customization (#467) by @marcinz in #537
Refactor test runner to support more pinning options by @bryevdv in #535
Remove dead code in bincount by @magnatelee in #546
Make the validation condition for random distributions lenient by @magnatelee in #550
src/cunumeric: handle high number of bins in GPU bincount by @rohany in #526
Construct NumPy arrays correctly from 0D deferred arrays backed by region fields by @magnatelee in #551
Collect test failure details at the end by @bryevdv in #556
Simplify some thunk conversion helpers by @manopapad in #553
Fix a compiler warning by @magnatelee in #555
Add option to disable CPU pinning in tests by @bryevdv in #558
Use the new mapper registration to enable detailed mapper logging by @magnatelee in #570
src/cunumeric/search: make nonzero not always allocate SYS_MEM buffers by @rohany in #572
add negative test case in test_array_split.py by @xialu00 in #545
add some test cases for test_arg_reduce.py by @xialu00 in #575
Testcase-add test cases for test_flip and test_indices by @xialu00 in #579
Refactor scalar reductions to use common execution policy by @jjwilke in #573
Sanitize k for the eye operator by @magnatelee in #586
Add CMake build for C++ and scikit-build infrastructure for Python package installation by @jjwilke in #514
Enhance test_block.py and test_eye.py by @robinw0928 in #578
Testcase add test cases for test_fill.py and test_ndim.py by @xialu00 in #588
Remove run dependency on curand by @marcinz in #520
Use Legion Fills when possible by @manopapad in #604
Support building with GASNet-Ex and MPI backends by @manopapad in #610
Provenance tracking for cuNumeric operators by @magnatelee in #596
Fix tests utils to make --directory work correctly. by @robinw0928 in #592
Fix a compiler warning by @magnatelee in #594
Enhance test_diag_indices.py and test_flatten.py. by @robinw0928 in #609
cuNumeric doesn't need nested provenance tracking by @magnatelee in #617
Add RuntimeError exception to legate.time by @robinw0928 in #618
Stop instantiating min and max reduction ops for complex types by @magnatelee in #621
Mark temporary conversion outputs as linear for eager storage recycling by @magnatelee in #608
Make the negative test on fill robust across Python versions by @magnatelee in #619
Enhance mask_indices and move_axis by @robinw0928 in #622
src/cunumeric/matrix: stop including coll.h in solve_template.inl by @rohany in #620

🐛 Bug Fixes

Fix performance bugs in scalar reductions by @magnatelee in #509
Don't use internal LAPACK function names by @manopapad in #522
Bug fixes for advanced indexing by @magnatelee in #532
Handle the case where LAPACK_*potrf is a macro, not a function by @manopapad in #527
fix mypy issue w/ np methods by @bryevdv in #542
Fix buggy complex-to-bool conversions and add correctness tests for astype by @magnatelee in #549
fixing advanced indexing operation for empty arrays by @ipdemes in #504
Do not link curand by @marcinz in #541
Fixing issues with advanced_indexing_kernel by @ipdemes in #557
fixing another corner case for advanced indexing by @ipdemes in #554
Fix OSX test shard generation by @bryevdv in #563
fix error print in test_unary_ufunc by @jjwilke in #566
Add NAN handling to convert() needed for some prefix routines with integer outputs. by @rkarim2 in #502
Fixing logic for slicing by @ipdemes in #574
Fix linalg.solve when inputs are scalars by @magnatelee in #585
Allow casting in cn.dot, to match numpy's behavior by @manopapad in #598
Add linalg.solve to the cmake build by @magnatelee in #603
Invoke eye with read-write privilege, not write-discard by @manopapad in #616
Fix a bug in scalar reduction launching kernels with empty domains by @magnatelee in #606

📖 Documentation

Added note to prefix documentation for corner cases where cunumeric results can diverge from numpy by @rkarim2 in #528
updating documentation by @ipdemes in #614
Add missing docs symlink by @bryevdv in #635

Contributors

jjwilke, manopapad, and 9 other contributors
