* Python support SM90 Epilogue Visitor Tree (EVT) on top of the C++ support released in 3.2.0.
+* SM80 EVT support in C++ and Python.
+* Other SM90 epilogue improvements.
+* Splitting CUTLASS library into smaller units based on operation, arch and datatypes. See [1105](https://github.com/NVIDIA/cutlass/discussions/1105) for details.
+* Making `tools/library/scripts` packageable - `tools/library/scripts` is now moving to `python/cutlass_library`. See the Python [README](/python/README.md) for details.
+* SM90 TF32 kernel improvements for all layouts.
+* SM90 rasterization direction support in the CUTLASS profiler.
 * [Few channels](/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_few_channels.h) specialization for reduced alignment capabilities
 * [Fixed channels](/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_fixed_channels.h) further specialized when channel count perfectly matches the access vector size
README.md (+12, -3)
@@ -43,7 +43,7 @@ In addition to GEMMs, CUTLASS implements high-performance convolution via the im

 # What's New in CUTLASS 3.2

-CUTLASS 3.2 is an update to CUTLASS adding:
+CUTLASS 3.2.0 is an update to CUTLASS adding:
 - New warp-specialized persistent FP8 GEMM kernel [kernel schedules](/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp) and [mainloops](/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8.hpp) targeting Hopper architecture that achieve great performance with TMA, WGMMA, and threadblock clusters. An example showcasing [Hopper warp-specialized FP8 GEMMs](/examples/54_hopper_fp8_warp_specialized_gemm).
 - New [Epilogue Visitor Tree (EVT)](/examples/49_hopper_gemm_with_collective_builder/49_collective_builder.cu) support for Hopper TMA epilogues. EVTs allows for user-defined customized epilogue fusion patterns without having to write a new epilogue.
 - [Stream-K](/include/cutlass/gemm/kernel/sm90_tile_scheduler_stream_k.hpp) feature for Hopper. Note that this is only a functional implementation of stream-K, and should not be used for performance comparison. Optimizations are expected in a future release.
@@ -53,6 +53,14 @@ CUTLASS 3.2 is an update to CUTLASS adding:
 - New CUTLASS 2D Convolution Python interface. New [example](/examples/python/03_basic_conv2d.ipynb) here.
 - Support for Windows (MSVC) builds.

+CUTLASS 3.2.1 is an update to CUTLASS adding:
+- Python support SM90 Epilogue Visitor Tree (EVT) on top of the C++ support released in 3.2.0.
+- SM80 EVT support in C++ and Python.
+- Splitting CUTLASS library into smaller units based on operation, arch and datatypes. See [1105](https://github.com/NVIDIA/cutlass/discussions/1105) for details.
+- Making `tools/library/scripts` packageable - `tools/library/scripts` is now moving to `python/cutlass_library`. See the Python [README](/python/README.md) for details.
+- SM90 TF32 kernel improvements for all layouts.
+- SM90 rasterization direction support in the CUTLASS profiler.
+- Improvement for CUTLASS profiler build times.

 Minimum requirements:

@@ -176,7 +184,8 @@ CUTLASS is a header-only template library and does not need to be built to be us
 projects. Client applications should target CUTLASS's `include/` directory in their include
 paths.

-CUTLASS unit tests, examples, and utilities can be build with CMake starting version 3.12.
+CUTLASS unit tests, examples, and utilities can be build with CMake.
+The minimum version of CMake is given in the [Quickstart guide](media/docs/quickstart.md).
 Make sure the `CUDACXX` environment variable points to NVCC in the CUDA Toolkit installed
 on your system.

@@ -512,7 +521,7 @@ reference_device: Passed
 ## More Details on Compiling CUTLASS Kernels and CUTLASS Profiler
 - Please follow the links for more CMake examples on selectively compiling CUTLASS kernels: