csrc/gemm_groupwise_sm120.cu (0 additions, 5 deletions)

@@ -89,11 +89,6 @@ void CutlassGemmGroupwiseScaledSM120(TensorView float_workspace_buffer, TensorVi
  cudaSetDevice(float_workspace_buffer->device.device_id);
  auto stream = get_stream(C->device);

-  // Ensure scales are contiguous
-  // Note: We keep the original shape and let the kernel's layout handle interpretation
-  CHECK_CONTIGUOUS(SFA);
Review thread on the removed `CHECK_CONTIGUOUS(SFA);` line:

Collaborator: Do we have any assumptions on the layout of SFA or SFB?

Collaborator (PR author): Looks like the contiguity is not required.

Collaborator: I don't understand. If we allow non-contiguous SFA/SFB, we should at least pass the strides from the tensors to the kernels, but I didn't notice that logic here.
-  CHECK_CONTIGUOUS(SFB);

  DISPATCH_SCALE_MAJOR_K(scale_major_mode, SCALE_MAJOR_K, [&] {
    return DISPATCH_DLPACK_INPUT_OUTPUT_DTYPE(A->dtype, C->dtype, c_type_in, c_type_out, [&] {
      return DISPATCH_SCALE_GRANULARITY(
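To illustrate the reviewer's concern about the removed checks, here is a minimal sketch of what an explicit compactness test over a DLTensor-style handle could look like. It assumes the `ndim`/`shape`/`strides` fields implied by the `A->dtype` and `float_workspace_buffer->device` accesses above; the helper name is hypothetical and is not part of this file or its macros.

```cpp
#include <dlpack/dlpack.h>

// Hypothetical helper, for illustration only: returns true when a DLTensor-style
// handle is compact (row-major, no gaps), which is what CHECK_CONTIGUOUS was guarding.
// Per the DLPack spec, strides == nullptr already means compact row-major.
inline bool IsCompactRowMajor(const DLTensor* t) {
  if (t->strides == nullptr) return true;
  int64_t expected_stride = 1;
  for (int i = t->ndim - 1; i >= 0; --i) {
    if (t->shape[i] != 1 && t->strides[i] != expected_stride) return false;
    expected_stride *= t->shape[i];
  }
  return true;
}
```

If the `CHECK_CONTIGUOUS` calls stay removed, the alternative the reviewer describes would be to read `SFA->strides` and `SFB->strides` at this call site and thread them through to the kernel's scale-factor layout, rather than assuming the tensors are compact.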