Releases: openucx/ucx
v1.4.0 RC2
Features:
- Improved support for installation with latest ROCm
- Improved support for latest rdma-core
- Added support for CUDA IPC for intra-node GPU communication
- Added support for CUDA memory allocation cache for mem-type detection
- Added support for latest Mellanox devices
- Added support for Nvidia GPU managed memory
- Added support for multiple connections between the same pair of workers
- Added support for large worker addresses and INADDR_ANY in client/server connection establishment
- Added support for bitwise atomic operations
Bugfixes:
- Performance fixes for rendezvous protocol
- Memory hook fixes
- Clang support fixes
- Self tl multi-rail fix
- Thread safety fixes in IB/RDMA transport
- Compilation fixes with upstream rdma-core
- Multiple minor bugfixes (full list on github)
- Segfault fix for a code generated by armclang compiler
- UCP memory-domain index fix for zero-copy active messages
Tested configurations:
- InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2
Known issues:
- #2919 - Segfault in CUDA support when KNEM is not present and CMA is the active
intra-node RMA transport. As a workaround, users can disable CMA support at
compile time with --disable-cma. Alternatively, users can remove CMA from the
UCX_TLS list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.
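The runtime workaround for #2919 only changes the UCX_TLS environment variable. As an illustration, here is a minimal Python sketch of a launcher that drops "cma" from an existing UCX_TLS value before starting an application. The helper names (strip_transport, env_without_cma, run_without_cma, FALLBACK_TLS) are hypothetical and not part of UCX. The sketch assumes that an unset UCX_TLS means UCX may pick any available transport, so it falls back to the explicit list from the note above.

```python
import os
import subprocess

# Explicit transport list from the #2919 workaround; "cma" is deliberately absent.
FALLBACK_TLS = "mm,rc,cuda_copy,cuda_ipc,gdr_copy"


def strip_transport(tls: str, name: str = "cma") -> str:
    """Return a comma-separated UCX_TLS value with `name` removed, order preserved."""
    parts = [t.strip() for t in tls.split(",")]
    return ",".join(t for t in parts if t and t != name)


def env_without_cma(base: dict) -> dict:
    """Copy `base` and set UCX_TLS so that CMA is not selected.

    If UCX_TLS is unset (or becomes empty after removing "cma"),
    use the explicit list from the release notes instead.
    """
    env = dict(base)
    current = env.get("UCX_TLS")
    stripped = strip_transport(current) if current else ""
    env["UCX_TLS"] = stripped or FALLBACK_TLS
    return env


def run_without_cma(cmd: list) -> int:
    """Run `cmd` (e.g. ["./my_ucx_app"], a hypothetical binary) without CMA."""
    return subprocess.run(cmd, env=env_without_cma(dict(os.environ))).returncode
```

Rebuilding with --disable-cma removes the problem at compile time, while this approach keeps the installed library and only narrows the transport selection for a single run.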
v1.4.0 RC1
Features:
- Improved support for installation with latest ROCm
- Improved support for latest rdma-core
- Added support for CUDA IPC for intra-node GPU communication
- Added support for CUDA memory allocation cache for mem-type detection
- Added support for latest Mellanox devices
- Added support for Nvidia GPU managed memory
- Added support for multiple connections between the same pair of workers
- Added support for large worker addresses and INADDR_ANY in client/server connection establishment
- Added support for bitwise atomic operations
Bugfixes:
- Performance fixes for rendezvous protocol
- Memory hook fixes
- Clang support fixes
- Self tl multi-rail fix
- Thread safety fixes in IB/RDMA transport
- Compilation fixes with upstream rdma-core
- Multiple minor bugfixes (full list on github)
Tested configurations:
- InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2
Known issues:
- #2919 - Segfault in CUDA support when KNEM is not present and CMA is the active
intra-node RMA transport. As a workaround, users can disable CMA support at
compile time with --disable-cma. Alternatively, users can remove CMA from the
UCX_TLS list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.
v1.3.1
Bugfixes:
- Prevent potential out-of-order sending in shared memory active messages
- CUDA: Include cudamem.h in source tarball, pass cudaFree memory size
- Registration cache: fix large range lookup, handle shmat(REMAP)/mmap(FIXED)
- Limit IB CQE size for specific ARM boards
- RPM: explicitly set gcc-c++ as requirement
v1.3.0
Features:
- Added stream-based communication API to UCP
- Added support for GPU platforms: Nvidia CUDA and AMD ROCm software stacks
- Added API for client/server based connection establishment
- Added support for TCP transport
- Support for InfiniBand tag-matching offload for DC and accelerated transports
- Multi-rail support for eager and rendezvous protocols
- Added support for tag-matching communications with CUDA buffers
- Added ucp_rkey_ptr() to obtain pointer for shared memory region
- Avoid progress overhead on unused transports
- Improved scalability of software tag-matching by using a hash table
- Added transparent huge-pages allocator
- Added non-blocking flush and disconnect for UCP
- Support fixed-address memory allocation via ucp_mem_map()
- Added ucp_tag_send_nbr() API to avoid send request allocation
- Support global addressing in all IB transports
- Add support for external epoll fd and edge-triggered events
- Added registration cache for knem
- Initial support for Java bindings
Bugfixes:
- Multiple bugfixes (full list on github)
Bugfixes since RC1:
- Fix flow control for DC transport
- Fix compilation issue with mlx5 on ARM
- Disable GDR-copy when ODP is used
- Fixes for gcc8 compilation
- Fix missing initialization of rndv_send_nbr thresholds
- Fix mlx5 srq cleanup
- Fix ep info print when there is no wireup lane
- Optimize ugni locking
Tested configurations:
- InfiniBand: MLNX_OFED 4.2, inbox OFED drivers.
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2
Known issues:
#2047 - UCP: ucp_do_am_bcopy_multi drops data on UCS_ERROR_NO_RESOURCE
#2047 - failure in ud/uct_flush_test.am_zcopy_flush_ep_nb/1
#1977 - failure in shm/test_ucp_rma.blocking_small/0
#1926 - Timeout in mpi_test_suite with HW TM
v1.3.0 RC4
Changelog:
- Fixes for gcc8 compilation
- Fix missing initialization of rndv_send_nbr thresholds
- Fix mlx5 srq cleanup
- Fix ep info print when there is no wireup lane
- Optimize ugni locking
v1.3.0 RC3
- Fix compilation issue with mlx5 on ARM
- Disable GDR-copy when ODP is used
v1.3.0 RC2
Bugfixes:
- Fix flow control for DC transport
v1.3.0 RC1
Features:
- Added stream-based communication API to UCP
- Added support for GPU platforms: Nvidia CUDA and AMD ROCm software stacks
- Added API for client/server based connection establishment
- Added support for TCP transport
- Support for InfiniBand tag-matching offload for DC and accelerated transports
- Multi-rail support for eager and rendezvous protocols
- Added support for tag-matching communications with CUDA buffers
- Added ucp_rkey_ptr() to obtain pointer for shared memory region
- Avoid progress overhead on unused transports
- Improved scalability of software tag-matching by using a hash table
- Added transparent huge-pages allocator
- Added non-blocking flush and disconnect for UCP
- Support fixed-address memory allocation via ucp_mem_map()
- Added ucp_tag_send_nbr() API to avoid send request allocation
- Support global addressing in all IB transports
- Add support for external epoll fd and edge-triggered events
- Added registration cache for knem
- Initial support for Java bindings
Bugfixes:
- Multiple bugfixes (full list on github)
Tested configurations:
- InfiniBand: MLNX_OFED 4.2, inbox OFED drivers.
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2
Known issues:
#2047 - UCP: ucp_do_am_bcopy_multi drops data on UCS_ERROR_NO_RESOURCE
#2047 - failure in ud/uct_flush_test.am_zcopy_flush_ep_nb/1
#1977 - failure in shm/test_ucp_rma.blocking_small/0
#1926 - Timeout in mpi_test_suite with HW TM
#1920 - transport retry count exceeded in many-to-one tests
#1689 - Segmentation fault on memory hooks test in jenkins
v1.2.2
Main:
- Support including UCX API headers from C++ code
- UD transport to handle unicast flood on RoCE fabric
- Compilation fixes for gcc 7.1.1, clang 3.6, clang 5
Details:
- When UD transport is used with RoCE, packets intended for other peers may
arrive on different adapters (as a result of unicast flooding).
- This change adds packet filtering based on destination GIDs. Now the packet
is silently dropped if its destination GID does not match the local GID.
- Added a new device ID for InfiniBand HCA
- [packaging] Move examples/ and perftest/ into doc
- [packaging] Update spec to work on old distros while remaining compliant with
Fedora guidelines
- [cleanup] Removed unused ptmalloc version (2.83)
- [cleanup] Fixup license headers
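The destination-GID filtering described above can be modeled in a few lines. This is a simplified Python sketch of the rule only (keep a packet when its destination GID equals the local GID, otherwise drop it without reporting an error), not UCX's actual C implementation. The Packet type and filter_by_dgid function are hypothetical names used for illustration; GIDs are represented as 16-byte (128-bit) values.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    dgid: bytes     # destination GID carried in the packet header (16 bytes)
    payload: bytes


def filter_by_dgid(packets: Iterable[Packet], local_gid: bytes) -> List[Packet]:
    """Keep packets addressed to this adapter; silently drop flooded ones."""
    return [p for p in packets if p.dgid == local_gid]
```

With unicast flooding, a packet for another peer can reach this adapter; comparing its destination GID with the local GID lets the receiver discard it instead of treating it as its own traffic.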
v1.2.2 RC1
Main:
- Support including UCX API headers from C++ code
- UD transport to handle unicast flood on RoCE fabric
- Compilation fixes for gcc 7.1.1 and clang 3.6
Details:
- When UD transport is used with RoCE, packets intended for other peers may
arrive on different adapters (as a result of unicast flooding).
- This change adds packet filtering based on destination GIDs. Now the packet
is silently dropped if its destination GID does not match the local GID.
- [packaging] Move examples/ and perftest/ into doc
- [packaging] Update spec to work on old distros while remaining compliant with
Fedora guidelines
- [cleanup] Removed unused ptmalloc version (2.83)
- [cleanup] Fixup license headers