Conversation

@jcaip jcaip commented Nov 20, 2025

This PR migrates Float8DynamicActivationFloat8SemiSparseWeightConfig off of the AQT CutlassSemiSparseLayout subclass.

The old AQT flow can still be used by passing version=1 into the config.

Testing:

pytest test/quantization/quantize_/workflows/float8/test_float8_semi_sparse_tensor.py
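The version gating described above can be sketched in plain Python. This is an illustrative sketch only; the config class, field defaults, and flow names below are hypothetical stand-ins, not torchao's actual API.

```python
from dataclasses import dataclass

@dataclass
class SemiSparseConfig:
    # Hypothetical config: version=1 selects the legacy AQT path,
    # any newer version selects the new tensor-subclass flow.
    version: int = 2

def select_flow(config: SemiSparseConfig) -> str:
    # version=1 preserves backward compatibility with the AQT layout
    if config.version == 1:
        return "aqt_cutlass_semi_sparse_layout"
    return "float8_semi_sparse_tensor"
```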

pytorch-bot bot commented Nov 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3361

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit d2f51b6 with merge base 5f33595:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 20, 2025
@jcaip jcaip added the topic: improvement Use this tag if this PR is an improvement (doesn't fit into any of the other categories) label Nov 20, 2025
@jcaip jcaip requested a review from jerryzh168 November 20, 2025 18:21
meta-codesync bot commented Nov 20, 2025

@jcaip has imported this pull request. If you are a Meta employee, you can view this in D87560869.


"""Use torchao cutlass kernel for fp8 + 2:4 sparse mm, requires building torchao with CUDA
"""
SPARSE_CUTLASS = "sparse_cutlass"
Contributor:

my understanding is this is a new packing format, why is this a new kernel preference?

Contributor Author:

sparse_cutlass vs sparse_cusparselt/hipsparselt is something we will need for AMD support coming up next half, which sounds like a kernel preference to me (it decides which op to use).

But if this is more of a general thing and packing_format is the more appropriate way to decide op dispatch, I'm fine with using that as well.
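The cutlass-vs-(cu/hip)sparselt choice discussed above can be sketched as a small dispatch helper. A minimal sketch only, assuming a hypothetical KernelPreference enum and ROCm flag; these names are illustrative, not torchao's actual enum members or dispatch logic.

```python
from enum import Enum

class KernelPreference(Enum):
    # Hypothetical preference enum for the sparse fp8 matmul op
    AUTO = "auto"
    SPARSE_CUTLASS = "sparse_cutlass"
    SPARSE_CUSPARSELT = "sparse_cusparselt"

def choose_sparse_kernel(pref: KernelPreference, is_rocm: bool = False) -> str:
    # On ROCm, a cuSPARSELt-style preference would map onto hipSPARSELt
    if pref is KernelPreference.SPARSE_CUSPARSELT:
        return "hipsparselt" if is_rocm else "cusparselt"
    if pref is KernelPreference.SPARSE_CUTLASS:
        return "sparse_cutlass"
    # AUTO: fall back to the CUTLASS kernel by default
    return "sparse_cutlass"
```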

Contributor:

@jcaip , it would be good to specify whether both the data format and the kernels differ, or whether the data format is the same and only the kernels differ.

Comment on lines 172 to 178
kernel_choice = "sparse_cutlass"
elif kernel_preference == KernelPreference.SPARSE_CUTLASS:
# if user explicitly chose SPARSE_CUTLASS, use the cutlass sparse kernel
assert is_sm_at_least_90(), (
"Specified sparse_cutlass kernel and hardware is not >= SM 9.0 (>= H100)"
)
kernel_choice = "sparse_cutlass"
Contributor:

if "sparse_cutlass" is the only option, then I don't think we are dealing with a kernel preference here?

from .float8_tensor import QuantizeTensorToFloat8Kwargs


class Float8SemiSparseTensor(TorchAOBaseTensor):
Contributor:

is there a more descriptive name, something like Float8With2By4SparsityTensor?

dtype: Optional[torch.dtype] = None,
):
super().__init__()
self.sparse_quantized_data = sparse_quantized_data
Contributor:

how about qdata to match other tensors

Contributor Author:

We can do sparse_qdata? But I think just qdata is a bit confusing, since here the data is split between the specified values and the metadata.

"""
Sparse packing formats for 2:4 sparsity + FP8 quantization
"""
SPARSE_CUTLASS = "sparse_cutlass"
Contributor:

The intent is for the sparse tensor to use OPAQUE, and you can keep these formats internal to your workflow.
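For reference, the 2:4 sparsity pattern these packing formats target can be illustrated in plain Python: in every group of four values, keep the two with largest magnitude and zero the rest. This shows the sparsity semantics only; the real kernels operate on packed fp8 tensors plus metadata, and this helper is not part of torchao.

```python
def prune_2_to_4(row):
    # 2:4 semi-structured sparsity: per group of 4, keep the 2
    # largest-magnitude entries and zero out the other 2.
    assert len(row) % 4 == 0, "row length must be a multiple of 4"
    out = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out
```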

Comment on lines +57 to +63
SPARSE_CUTLASS = "sparse_cutlass"

"""
SPARSE_CUSPARSELT will pack the quantized data into a single tensor, sparse_qdata, which contains the specified values with the metadata appended.
This packing format will dispatch to `_cslt_sparse_mm`, which does not fuse per-row scaling into the matmul.
"""
SPARSE_CUSPARSELT = "sparse_cusparselt"
Contributor:

should these belong to Float8PackingFormat? we structure these by "dtype" currently
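The single-buffer layout described in the SPARSE_CUSPARSELT docstring above (specified values with metadata appended) can be mimicked in plain Python. This is a layout sketch only: real cuSPARSELt metadata is a packed bit format, and the helper below is hypothetical.

```python
def pack_sparse(row):
    # Pack an already 2:4-pruned row into one flat buffer:
    # specified (nonzero) values first, then the per-group kept
    # indices appended as metadata.
    values, meta = [], []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        kept = [j for j, v in enumerate(group) if v != 0.0][:2]
        values.extend(group[j] for j in kept)
        meta.extend(kept)
    # one tensor-like buffer: specified values followed by metadata
    return values + [float(m) for m in meta]
```

A matmul kernel consuming this buffer would slice the values off the front and read the trailing metadata to reconstruct positions, which is why per-row scaling cannot be fused as easily as in the cutlass path.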
