Releases: NVIDIA/cuda-python
cuda.core v0.2.0
cuda.core v0.2.0 release announcement
Release note
All functionality is currently hosted under the `cuda.core.experimental` namespace. Once the features become stable, they will be moved out of `experimental`.
Key Features and Enhancements
- Add `ProgramOptions` to facilitate the passing of runtime compile options to `Program`.
- Add Pythonic access to `Device` and `Kernel` attributes.
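The following is a minimal sketch, assuming a CUDA-capable GPU and cuda.core v0.2.0, of how the two enhancements fit together: compile options are bundled into a `ProgramOptions` instance passed to `Program`, and device and kernel attributes are read as ordinary Python attributes and methods. The `axpy` kernel, the `arch_string` helper, and the particular properties queried are illustrative choices, and the `KernelAttributes` accessors are assumed to be methods taking an optional device ID. The `cuda.core` import is deferred into `main()` so the module loads on machines without a GPU.

```python
def arch_string(cc):
    """Turn a (major, minor) compute capability into an NVRTC target such as 'sm_86'."""
    major, minor = cc
    return f"sm_{major}{minor}"


# Illustrative kernel; extern "C" keeps the symbol name unmangled.
KERNEL_SRC = r"""
extern "C" __global__ void axpy(float a, const float* x, float* y, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
"""


def main():
    # Deferred import: only needed when actually talking to a GPU.
    from cuda.core.experimental import Device, Program, ProgramOptions

    dev = Device()
    dev.set_current()

    # Device.properties exposes attribute getters (no dictionary lookups).
    props = dev.properties
    print(dev.name, "| SMs:", props.multiprocessor_count, "| warp size:", props.warp_size)

    # Runtime compile options are collected in a ProgramOptions instance
    # and handed to the Program constructor.
    options = ProgramOptions(std="c++17", arch=arch_string(dev.compute_capability))
    module = Program(KERNEL_SRC, code_type="c++", options=options).compile("cubin")
    kernel = module.get_kernel("axpy")

    # Kernel attributes are queried through kernel.attributes
    # (assumed here to be methods with an optional device_id argument).
    attrs = kernel.attributes
    print("registers/thread:", attrs.num_regs())
    print("max threads/block:", attrs.max_threads_per_block())
```

Call `main()` on a machine with a GPU; the pure helper works anywhere, e.g. `arch_string((8, 6))` returns `"sm_86"`.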
For full details please refer to the release note above.
Breaking Changes
- The `stream` attribute is removed from `LaunchConfig`. Instead, the `Stream` object should now be passed directly to `launch` as an argument.
- The signature of `launch` has changed by swapping positional arguments; the new signature is `(stream, config, kernel, *kernel_args)`.
- Change `__cuda_stream__` from an attribute to a method.
- The `Program.compile` method no longer accepts the `options` argument. Instead, you can optionally pass an instance of `ProgramOptions` to the constructor of `Program`.
- `Device.properties` now provides attribute getters instead of a dictionary interface.
- The `.handle` attribute of various `cuda.core` objects now returns the underlying Python object instead of a (type-erased) Python integer.
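A minimal migration sketch for the launch-related breaking changes, assuming a CUDA-capable GPU and cuda.core v0.2.0: `LaunchConfig` no longer takes a stream, the stream becomes the first argument of `launch`, and compile options go to the `Program` constructor. The `fill` kernel and the `grid_for` helper are hypothetical; the sketch assumes a `Buffer` from `Device.allocate` can be passed directly as a pointer argument and that NumPy scalars select the exact C type of scalar parameters.

```python
import numpy as np


def grid_for(n, block):
    """Number of blocks needed so that grid * block >= n (ceiling division)."""
    return (n + block - 1) // block


# Hypothetical kernel that writes `value` into the first n elements of `out`.
FILL_SRC = r"""
extern "C" __global__ void fill(float* out, float value, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}
"""


def main(n=1 << 20, block=256):
    from cuda.core.experimental import Device, LaunchConfig, Program, ProgramOptions, launch

    dev = Device()
    dev.set_current()
    stream = dev.create_stream()

    # v0.2.0: options are given to Program(...); compile() no longer accepts them.
    major, minor = dev.compute_capability
    prog = Program(FILL_SRC, code_type="c++", options=ProgramOptions(arch=f"sm_{major}{minor}"))
    kernel = prog.compile("cubin").get_kernel("fill")

    out = dev.allocate(n * np.dtype(np.float32).itemsize, stream=stream)

    # v0.2.0: LaunchConfig carries only the launch geometry, no stream ...
    config = LaunchConfig(grid=grid_for(n, block), block=block)
    # ... and the stream is the first positional argument of launch().
    launch(stream, config, kernel, out, np.float32(1.5), np.uint64(n))
    stream.sync()
    out.close(stream)
```

In v0.1.x the same launch would have set `stream=` on `LaunchConfig` and called `launch(kernel, config, ...)`, so porting mainly touches those two lines.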
New examples
- `jit_lto_fractal.py`: Demonstrates just-in-time link-time optimization for fractal generation. (`Device`, `LaunchConfig`, `Linker`, `LinkerOptions`, `Program`, `ProgramOptions`) (#475)
- `simple_multi_gpu_example.py`: Example of using multiple GPUs. (`Device`, `Program`, `LaunchConfig`) (#304)
- `show_device_properties.py`: Displays detailed device properties. (`Device`) (#474)
Documentation
Sample codes
Test fixes
- Clean up device initialization in some tests. (#507)
What's Changed
- Add back CuPy as an optional test dependency + Fix an example bug by @leofang in #334
- add warning to the nvjitlink ctor when falling back to cuLink by @ksimpson-work in #315
- Set up a preliminary doc build/publish pipeline by @leofang in #325
- Fix doc ci permissions by @leofang in #338
- Add conda installation instructions by @leofang in #321
- Add a dummy email address to the doc bot by @leofang in #343
- multi gpu example by @amwi04 in #304
- Change `__cuda_stream__` from attribute to method by @NaderAlAwar in #389
- Ensure deprecation warnings from cuda.bindings are swallowed by @ksimpson-work in #404
- Add the options data class to program by @ksimpson-work in #237
- Update to use the new NVKS runners by @leofang in #426
- Disable notebook execution for `cuda.bindings` by @vzhurba01 in #424
- Stop tracking cached files and Jupyter notebook hashes in doc builds by @vzhurba01 in #425
- Documentation remove gpu dependency by @ksimpson-work in #398
- handle CTK version specific options in the linker test by @ksimpson-work in #371
- support ptx code type for program by @ksimpson-work in #317
- Update release checklist to focus on subpackages by @vzhurba01 in #427
- Kernel attributes by @ksimpson-work in #360
- Use cpu runner to build docs by @leofang in #434
- device properties by @ksimpson-work in #409
- Update linker options sequence handling by @ksimpson-work in #436
- Expose `ObjectCode` as public API + prune unnecessary input arguments by @ksimpson-work in #435
- Pin Sphinx <8.2.0 to fix doc build by @leofang in #456
- CI: Add Windows GPU runner for tests by @leofang in #444
- Improve program checks by @ksimpson-work in #394
- Various handle-related changes and improvements by @leofang in #463
- Improve perf of accessing `dev.compute_capability` by @leofang in #459
- Device properties example by @samaid in #474
- Add `ObjectCode` ptx constructor by @brandon-b-miller in #470
- Add a `Linker` example by @vzhurba01 in #475
- Add error log producing test by @ksimpson-work in #423
- Apply `__new__` approach to disabling `__init__` by @rwgk in #484
- switch the launch argument order by @ksimpson-work in #316
- NEW: Create an Event without recording to Stream by @carterbox in #487
- Clearer error messages (cuda.core) by @rwgk in #458
- Add event timing by @leofang in #481
- Fix merge error by @leofang in #495
- add public handle to object code by @ksimpson-work in #492
- Bump `cuda-core` ver to v0.2.0 by @leofang in #494
- Increase tolerance in `test_timing()` to avoid flaky tests. by @rwgk in #498
- Fix `test_timing` flakiness under Windows by @rwgk in #508
- NEW: Add Event to public API by @carterbox in #501
- cuda.core: Change selected `.decode()` calls to `.decode("utf-8", errors="backslashreplace")` by @rwgk in #510
- clean up device initialization in test by @ksimpson-work in #507
- Add `@functools.lru_cache` decorator for `get_binding_version()` by @rwgk in #512
- Fix dangling pointer problem in _linker.py by @rwgk in #516
- cuda.core: release notes update by @rwgk in #519
- Set release date by @vzhurba01 in #523
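Since #389, `__cuda_stream__` is a method rather than an attribute. Below is a minimal sketch of a third-party stream wrapper implementing it, assuming the protocol as described in cuda.core's interoperability documentation: the method returns a 2-tuple of the protocol version (0) and the stream address as a Python int. `ForeignStream` and `stream_address` are hypothetical names used only for illustration.

```python
class ForeignStream:
    """Hypothetical wrapper for a CUDA stream owned by another library."""

    def __init__(self, address):
        self._address = int(address)

    def __cuda_stream__(self):
        # A method (not an attribute) since #389: returns (protocol_version, stream_address).
        return (0, self._address)


def stream_address(obj):
    """Extract the raw stream address from any object implementing the protocol."""
    version, address = obj.__cuda_stream__()
    if version != 0:
        raise ValueError(f"unsupported __cuda_stream__ protocol version: {version}")
    return address
```

On a machine with a GPU, an object like this could then be handed to `Device().create_stream(obj)`, provided the address refers to a live CUDA stream.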
New Contributors
- @amwi04 made their first contribution in #304
- @NaderAlAwar made their first contribution in #389
- @samaid made their first contribution in #474
- @brandon-b-miller made their first contribution in #470
- @carterbox made their first contribution in #487
Full Changelog: cuda-core-v0.1.1...cuda-core-v0.2.0
CUDA Python 12.8.0
CUDA Python 11.8.6
cuda.core v0.1.1
cuda.core v0.1.1 release announcement
Release note
All functionality is currently hosted under the `cuda.core.experimental` namespace. Once the features become stable, they will be moved out of `experimental`.
Key Features and Enhancements
- Added `Linker` for runtime linking (using nvJitLink or driver APIs)
- Added `StridedMemoryView` and `@args_viewable_as_strided_memory`, which provide strided memory views of arbitrary Python objects that support either DLPack or the CUDA Array Interface
- Support pip installation
- Public, GitHub Actions-based CI infrastructure
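A minimal sketch of the `StridedMemoryView` workflow, modeled on the pattern in cuda.core's sample code and assuming cuda.core v0.1.1 or later plus NumPy: the decorator wraps argument 0 so the function can request a view and read its shape, strides, and dtype. Passing `-1` as the stream pointer is assumed to mean that no stream ordering is needed, which suits host (NumPy) data. `is_c_contiguous` is a hypothetical helper; it assumes, as in DLPack, that strides count elements rather than bytes and that `None` denotes a C-contiguous layout.

```python
def is_c_contiguous(shape, strides):
    """Return True if element strides describe a C-order (row-major) layout.

    Strides are counted in elements, not bytes; None means C-contiguous.
    Dimensions of extent 1 are ignored because their stride is irrelevant.
    """
    if strides is None:
        return True
    expected = 1
    for extent, stride in zip(reversed(shape), reversed(strides)):
        if extent > 1 and stride != expected:
            return False
        expected *= extent
    return True


def main():
    # Deferred imports: numpy plus cuda.core's utils module.
    import numpy as np
    from cuda.core.experimental.utils import StridedMemoryView, args_viewable_as_strided_memory

    @args_viewable_as_strided_memory((0,))
    def inspect(arr):
        # -1: no stream ordering requested, appropriate for host-resident data
        view = arr.view(-1)
        assert isinstance(view, StridedMemoryView)
        return view.shape, view.strides, view.dtype, view.is_device_accessible

    shape, strides, dtype, on_device = inspect(np.zeros((4, 3), dtype=np.float32))
    print(shape, dtype, "device-accessible:", on_device,
          "C-contiguous:", is_c_contiguous(shape, strides))
```

The same decorated function would accept a CuPy array or any other DLPack or CUDA Array Interface producer; for device data, a real stream pointer would be passed instead of `-1`.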
For full details please refer to the release note above.
Documentation
Sample codes
What's Changed
- Fix `_util.device_from_ctx` by @ksimpson-work in #203
- Update docs to reflect recent releases by @leofang in #239
- rectify the supported types for ObjectCode and Program by @ksimpson-work in #224
- Clean up testsuite by @ksimpson-work in #213
- Fix the build portion of the gh-actions by @ksimpson-work in #249
- fix some test issues caused by bad merging on browser by @ksimpson-work in #254
- Add ruff linter by @ksimpson-work in #201
- update the nvjitlink bindings test by @ksimpson-work in #228
- Systematically replace `__del__` with `weakref.finalize()` by @rwgk in #246
- Add docs, tests, and samples for `StridedMemoryView`/`@args_viewable_as_strided_memory` by @leofang in #247
- Convert line endings from CRLF to LF by @trxcllnt in #263
- Some CI tweaks by @leofang in #266
- Full CI support for public builds + switch to use cibuildwheel by @leofang in #267
- Add fallback memory resource for TCC devices by @ksimpson-work in #257
- Add the cuda.core.experimental.Linker class by @ksimpson-work in #229
- Add support for CI testing by @sandeepd-nv in #124
- Add the cuda.core.experimental.system singleton by @ksimpson-work in #256
- Lazy load code modules by @ksimpson-work in #269
- Bump `cuda.core` version to v0.1.1 by @leofang in #290
- Add GPU runner for linux-aarch64 by @leofang in #289
- Add `cluster` to `LaunchConfig` to support thread block clusters on Hopper by @leofang in #261
- Make the link to different object types explicit for the docs by @ksimpson-work in #258
- Add a doc page for interoperability by @leofang in #298
- Prepare for wheel release + populate PyPI page information by @leofang in #296
- Fix `StridedMemoryView` by deferring the check for whether a capsule is versioned by @leofang in #292
- Fix test_is_done without a sync by @vzhurba01 in #305
- CI refactoring to cover more test support by @leofang in #302
- add nvjitlink to bindings documentation by @ksimpson-work in #291
- `cuda.core` v0.1.1 final doc touch by @leofang in #301
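#246 replaces `__del__` with `weakref.finalize()`. The sketch below shows the general pattern with a hypothetical `Resource` class and `release` function rather than cuda.core's actual code: the finalizer receives the handle and a release callable but never `self`, runs at most once (whether triggered by an explicit `close()`, by garbage collection, or at interpreter exit), and cannot resurrect the object the way a `__del__` method can.

```python
import weakref


def release(handle, log):
    """Stand-in for a real cleanup call (e.g. freeing a driver handle)."""
    log.append(handle)


class Resource:
    """Hypothetical owner of an external handle, cleaned up via weakref.finalize."""

    def __init__(self, handle, log):
        self.handle = handle
        # Register cleanup; the arguments must not reference self,
        # otherwise the object would never become unreachable.
        self._finalizer = weakref.finalize(self, release, handle, log)

    def close(self):
        # A finalizer runs at most once; calling it again is a no-op.
        self._finalizer()

    @property
    def closed(self):
        return not self._finalizer.alive
```

Explicit `close()` and implicit collection share one code path, so double-free bugs from calling both are avoided by construction.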
New Contributors
Full Changelog: cuda-core-v0.1.0...cuda-core-v0.1.1
CUDA Python 12.6.2.post1
Packaging-only hotfix for issue #226
CUDA Python 11.8.5.post1
Packaging-only hotfix for issue #226
cuda.core v0.1.0
Please see the release notes (and full documentation) at https://nvidia.github.io/cuda-python/cuda-core/0.1.0/.
CUDA Python 12.6.2
Hotfix for issue #215
CUDA Python 11.8.5
Hotfix for issue #215