Commit 23e15f3

Update mat-transpose/README.md (xlite-dev#300)
1 parent acaac78 commit 23e15f3

1 file changed (+10, -14 lines)
kernels/mat-transpose/README.md

Lines changed: 10 additions & 14 deletions
@@ -13,20 +13,16 @@
 - [X] mat_transpose_f32x4_shared_row2col_kernel (float4 vectorized version, shared memory)
 - [X] mat_transpose_f32x4_shared_bcf_col2row_kernel (float4 vectorized version, shared memory, bank-conflict free)
 - [X] mat_transpose_f32x4_shared_bcf_row2col_kernel (float4 vectorized version, shared memory, bank-conflict free)
-- CuTe kernel and configurations
-- mat_transpose_cute_reg_kernel
-- [X] mat_transpose_cute_row2col_reg
-- [X] mat_transpose_cute_col2row_reg
-- mat_transpose_cute_smem_kernel (smem)
-- [X] mat_transpose_cute_col_smem
-- [X] mat_transpose_cute_row_smem
-- [X] mat_transpose_cute_col_smem_swizzled (bank conflict free)
-- [X] mat_transpose_cute_row_smem_swizzled
-- mat_transpose_cute_smem_vectorized_kernel (float4)
-- [X] mat_transpose_cute_row_cvectorized
-- [X] mat_transpose_cute_row_cvectorized_swizzled
-- [X] mat_transpose_cute_row_rvectorized
-- [X] mat_transpose_cute_row_rvectorized_swizzled
+- [X] mat_transpose_cute_row2col_reg
+- [X] mat_transpose_cute_col2row_reg
+- [X] mat_transpose_cute_col_smem
+- [X] mat_transpose_cute_row_smem
+- [X] mat_transpose_cute_col_smem_swizzled (bank conflict free)
+- [X] mat_transpose_cute_row_smem_swizzled
+- [X] mat_transpose_cute_row_cvectorized
+- [X] mat_transpose_cute_row_cvectorized_swizzled
+- [X] mat_transpose_cute_row_rvectorized
+- [X] mat_transpose_cute_row_rvectorized_swizzled
 - [X] PyTorch bindings

 Although matrix transpose is a basic operation, it makes good practice: it is somewhat easier than matrix multiplication, and most of the optimization techniques used there can be applied here as well.
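
Several kernels in the list are marked bank-conflict free. As a rough illustration of what that means, here is a minimal host-side sketch in Python; it is not code from this repository. It assumes 32 shared-memory banks of 4-byte words and a 32x32 float tile read column-wise by one warp, and the helper name `worst_bank_conflict` is hypothetical. It counts how many threads of the warp land in the same bank with and without one word of row padding.

```python
from collections import Counter

# Assumption: 32 shared-memory banks, each one 4-byte word wide.
NUM_BANKS = 32


def worst_bank_conflict(row_stride_words, num_threads=32, column=0):
    """Model one warp reading a column of a float tile in shared memory.

    Thread i reads tile[i][column]; each tile row is row_stride_words words
    long. Returns the largest number of threads that hit the same bank
    (1 means no conflict, 32 means a fully serialized access).
    """
    banks = Counter(
        (thread * row_stride_words + column) % NUM_BANKS
        for thread in range(num_threads)
    )
    return max(banks.values())


if __name__ == "__main__":
    # Unpadded tile[32][32]: every row starts in the same bank.
    print("stride 32:", worst_bank_conflict(32))
    # Padded tile[32][33]: consecutive rows shift by one bank.
    print("stride 33:", worst_bank_conflict(33))
```

With a row stride of 32 words, all 32 threads map to the same bank; padding each row to 33 words spreads them over 32 distinct banks. Padding is only one way to get this effect. The swizzled variants in the list presumably reach the same goal by permuting indices instead of adding a column, but that is not modeled here.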
