[sparse] Migrate Float8SemiSparseTensor off of AQT #3361
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3361
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures as of commit d2f51b6 with merge base 5f33595. The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
| """Use torchao cutlass kernel for fp8 + 2:4 sparse mm, requires building torchao with CUDA | ||
| """ | ||
| SPARSE_CUTLASS = "sparse_cutlass" |
my understanding is this is a new packing format, why is this a new kernel preference?
sparse_cutlass vs. sparse_cusparselt/hipsparselt is something we will need for AMD support coming up next half, which sounds like a kernel preference to me (decide which op to use).
But if this is a more general thing and packing_format is the more specific way to decide op dispatch, I am fine with using that as well.
@jcaip, it would be good to specify whether the data format will be different and the kernels different, or whether the data format is the same and only the kernels differ.
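To make the distinction in this thread concrete, here is a minimal sketch (all names below are hypothetical, not the actual torchao API): a packing format describes how the quantized 2:4 data is laid out in memory, while a kernel preference describes which backend op runs on that data.

```python
from enum import Enum


class SparsePackingFormat(str, Enum):
    # hypothetical: how the fp8 + 2:4 sparse weight is stored
    CUTLASS = "sparse_cutlass"        # kept values and 2:4 metadata as separate tensors
    CUSPARSELT = "sparse_cusparselt"  # one buffer with the metadata appended to the values


class SparseKernelPreference(str, Enum):
    # hypothetical: which op to call for the same logical matmul
    AUTO = "auto"                     # let the library pick based on hardware and packing
    CUTLASS = "sparse_cutlass"        # torchao CUTLASS fp8 2:4 kernel
    CUSPARSELT = "sparse_cusparselt"  # cuSPARSELt (or hipSPARSELt on AMD)
```

In this framing, the open question is whether CUTLASS vs. cuSPARSELt changes the stored bytes (a packing format) or only the op called on the same bytes (a kernel preference).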
    kernel_choice = "sparse_cutlass"
elif kernel_preference == KernelPreference.SPARSE_CUTLASS:
    # if the user explicitly chose the sparse_cutlass kernel preference, use the CUTLASS kernel
    assert is_sm_at_least_90(), (
        "Specified sparse_cutlass kernel and hardware is not >= SM 9.0 (>= H100)"
    )
    kernel_choice = "sparse_cutlass"
if "sparse_cutlass" is the only option, then I don't think we are dealing with a kernel preference here?
from .float8_tensor import QuantizeTensorToFloat8Kwargs


class Float8SemiSparseTensor(TorchAOBaseTensor):
is there a more descriptive name, something like Float8With2By4SparsityTensor?
    dtype: Optional[torch.dtype] = None,
):
    super().__init__()
    self.sparse_quantized_data = sparse_quantized_data
how about qdata to match other tensors
We can do sparse_qdata? But I think just qdata is a bit confusing, since the quantized data is split between the specified values and the metadata.
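For context, a small sketch of the two layouts behind this naming discussion; sparse_quantized_data and sparse_qdata come from the diff, while the other attribute names and shapes are illustrative assumptions:

```python
from dataclasses import dataclass

import torch


@dataclass
class CutlassStylePacking:
    # split layout: kept values and 2:4 metadata are separate tensors, so a
    # single name like "qdata" would be ambiguous about which piece it means
    sparse_quantized_data: torch.Tensor  # float8 values of the kept elements, roughly (N, K // 2)
    meta: torch.Tensor                   # 2:4 metadata recording which positions were kept
    scale: torch.Tensor                  # per-row quantization scale


@dataclass
class CusparseltStylePacking:
    # packed layout: one buffer with the metadata appended after the values
    # (the SPARSE_CUSPARSELT format described further down)
    sparse_qdata: torch.Tensor
    scale: torch.Tensor
```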
| """ | ||
| Sparse packing formats for 2:4 sparsity + FP8 quantization | ||
| """ | ||
| SPARSE_CUTLASS = "sparse_cutlass" |
The intent is for the sparse tensor to use OPAQUE, and you can keep these formats internal to your workflow
SPARSE_CUTLASS = "sparse_cutlass"

"""
SPARSE_CUSPARSELT will pack the quantized data into a single tensor, sparse_qdata, which contains the specified values with the metadata appended.
This packing format will dispatch to `_cslt_sparse_mm`, which does not fuse per-row scaling into the matmul.
"""
SPARSE_CUSPARSELT = "sparse_cusparselt"
Should these belong to Float8PackingFormat? We structure these by "dtype" currently.
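To illustrate the dispatch difference described in the docstrings above: since `_cslt_sparse_mm` does not fuse per-row scaling, the scales have to be applied to the output after the matmul. The sketch below assumes the private PyTorch op's argument order, the out_dtype choice, and the tensor/scale names; it is not the actual torchao implementation.

```python
import torch


def sparse_fp8_mm_cusparselt(act_fp8, act_scale, w_packed_fp8, w_scale, bias=None):
    # cuSPARSELt path: w_packed_fp8 holds the kept values with the metadata appended.
    # The mm itself is unscaled, so the per-row activation and weight scales are
    # applied to the output afterwards (signature and dtypes are assumptions).
    y = torch._cslt_sparse_mm(w_packed_fp8, act_fp8.t(), out_dtype=torch.bfloat16).t()
    y = y * act_scale * w_scale.t()
    if bias is not None:
        y = y + bias
    return y


# By contrast, the CUTLASS packing format dispatches to a torchao CUTLASS kernel
# that fuses these per-row scales into the matmul itself.
```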
This PR migrates Float8DynamicActivationFloat8SemiSparseWeightConfig off of the AQT CutlassSemiSparseLayout subclass.
The old AQT flow can still be used by passing version=1 into the config.
Testing:
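For reference, a minimal usage sketch of the migrated config, assuming the import path shown below and that the config accepts the version argument described above; the toy model is for illustration only:

```python
import torch
from torchao.quantization import (
    Float8DynamicActivationFloat8SemiSparseWeightConfig,
    quantize_,
)

model = torch.nn.Sequential(torch.nn.Linear(2048, 2048)).cuda().to(torch.bfloat16)

# New flow (this PR): weights are swapped for the new semi-sparse fp8 tensor subclass.
quantize_(model, Float8DynamicActivationFloat8SemiSparseWeightConfig())

# Old AQT flow (CutlassSemiSparseLayout) is still reachable by passing version=1:
# quantize_(model, Float8DynamicActivationFloat8SemiSparseWeightConfig(version=1))
```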