[mlir][Vector] Fix mask unpacking in transfer op unrolling #144889

Groverkss · 2025-06-19T12:58:57Z

The mask vector is computed before any permutation or broadcasting is applied to the memory space, which implies that the outermost dimension of the vector may not correspond to the outermost dimension of the mask. Transpose the mask before extracting from it. The transpose eventually folds into the vector.extract once further unrolling takes place.
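To make the shape mismatch concrete, here is a minimal standalone sketch of how the mask shape relates to the vector shape. It assumes a permutation map without broadcast results, and `inferMaskShape` is a hypothetical helper written for illustration, not an MLIR API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of MLIR). resultDims[r] is the memory
// dimension that vector dimension r reads from, i.e. the results of a
// permutation map with no broadcasts. Mask dimensions follow the order of
// the used memory dimensions, not the order of the vector dimensions.
std::vector<int64_t> inferMaskShape(const std::vector<unsigned> &resultDims,
                                    const std::vector<int64_t> &vectorShape,
                                    unsigned numMemoryDims) {
  assert(resultDims.size() == vectorShape.size());
  std::vector<int64_t> maskShape;
  // Walk the memory dimensions in order and pick up the vector dimension
  // (if any) that is mapped to each of them.
  for (unsigned memDim = 0; memDim < numMemoryDims; ++memDim)
    for (std::size_t r = 0; r < resultDims.size(); ++r)
      if (resultDims[r] == memDim)
        maskShape.push_back(vectorShape[r]);
  return maskShape;
}
```

For the map (d0, d1, d2, d3) -> (d2, d0, d3) and a vector<2x3x4xf32>, calling `inferMaskShape({2, 0, 3}, {2, 3, 4}, 4)` gives the shape of a vector<3x2x4xi1> mask, so the outermost vector dimension (size 2) lines up with mask dimension 1 rather than dimension 0.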

llvmbot · 2025-06-19T12:59:26Z

@llvm/pr-subscribers-mlir

Author: Kunwar Grover (Groverkss)

Changes

The mask vector is computed before any permutation or broadcasting is applied to the memory space, which implies that the outermost dimension of the vector may not correspond to the outermost dimension of the mask. Transpose the mask before extracting from it. The transpose eventually folds into the vector.extract once further unrolling takes place.

Full diff: https://github.com/llvm/llvm-project/pull/144889.diff

2 Files Affected:

(modified) mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp (+13-2)
(modified) mlir/test/Conversion/VectorToSCF/unrolled-vector-to-loops.mlir (+30)

diff --git a/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp b/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
index cc5623068ab10..189bf7f619888 100644
--- a/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
+++ b/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
@@ -1208,11 +1208,22 @@ static void maybeAssignMask(OpBuilder &b, OpTy xferOp, OpTy newXferOp,
   if (xferOp.getMaskType().getRank() > 1) {
     // Unpack one dimension of the mask.
     OpBuilder::InsertionGuard guard(b);
+    Location loc = xferOp.getLoc();
     b.setInsertionPoint(newXferOp); // Insert load before newXfer.
 
+    auto expr = dyn_cast<AffineDimExpr>(
+        compressUnusedDims(xferOp.getPermutationMap()).getResult(0));
+    assert(expr && "cannot extract from dimension");
+    // Transpose dim to be the outer most dimension, so we can use
+    // vector.extract on it.
+    TypedValue<VectorType> mask = xferOp.getMask();
+    SmallVector<int64_t> perm =
+        llvm::to_vector(llvm::seq<int64_t>(mask.getType().getRank()));
+    std::swap(perm[0], perm[expr.getPosition()]);
+    mask = b.create<vector::TransposeOp>(loc, mask, perm);
+    // Extract from the transposed mask.
     llvm::SmallVector<int64_t, 1> indices({i});
-    Location loc = xferOp.getLoc();
-    auto newMask = b.create<vector::ExtractOp>(loc, xferOp.getMask(), indices);
+    auto newMask = b.create<vector::ExtractOp>(loc, mask, indices);
     newXferOp.getMaskMutable().assign(newMask);
   }
 
diff --git a/mlir/test/Conversion/VectorToSCF/unrolled-vector-to-loops.mlir b/mlir/test/Conversion/VectorToSCF/unrolled-vector-to-loops.mlir
index 7d97829c06599..8aa72086e4e0e 100644
--- a/mlir/test/Conversion/VectorToSCF/unrolled-vector-to-loops.mlir
+++ b/mlir/test/Conversion/VectorToSCF/unrolled-vector-to-loops.mlir
@@ -84,3 +84,33 @@ func.func @transfer_read_mask(%A : memref<?x?x?xf32>, %mask : vector<2x3x4xi1>)
   %vec = vector.transfer_read %A[%c0, %c0, %c0], %f0, %mask {in_bounds = [true, true, true]}: memref<?x?x?xf32>, vector<2x3x4xf32>
   return %vec : vector<2x3x4xf32>
 }
+
+// -----
+
+func.func @transfer_read_perm_mask(%A : memref<?x?x?x?xf32>, %mask : vector<3x2x4xi1>) -> (vector<2x3x4xf32>) {
+  %f0 = arith.constant 0.0: f32
+  %c0 = arith.constant 0: index
+
+  // CHECK:      vector.extract %{{.*}}[0, 0] : vector<4xi1> from vector<3x2x4xi1>
+  // CHECK-NEXT: vector.transfer_read {{.*}} : memref<?x?x?x?xf32>, vector<4xf32>
+  // CHECK-NEXT: vector.insert {{.*}} [0, 0] : vector<4xf32> into vector<2x3x4xf32>
+  // CHECK-NEXT: vector.extract %{{.*}}[1, 0] : vector<4xi1> from vector<3x2x4xi1>
+  // CHECK-NEXT: vector.transfer_read {{.*}} : memref<?x?x?x?xf32>, vector<4xf32>
+  // CHECK-NEXT: vector.insert {{.*}} [0, 1] : vector<4xf32> into vector<2x3x4xf32>
+  // CHECK-NEXT: vector.extract %{{.*}}[2, 0] : vector<4xi1> from vector<3x2x4xi1>
+  // CHECK-NEXT: vector.transfer_read {{.*}} : memref<?x?x?x?xf32>, vector<4xf32>
+  // CHECK-NEXT: vector.insert {{.*}} [0, 2] : vector<4xf32> into vector<2x3x4xf32>
+  // CHECK-NEXT: vector.extract %{{.*}}[0, 1] : vector<4xi1> from vector<3x2x4xi1>
+  // CHECK-NEXT: vector.transfer_read {{.*}} : memref<?x?x?x?xf32>, vector<4xf32>
+  // CHECK-NEXT: vector.insert {{.*}} [1, 0] : vector<4xf32> into vector<2x3x4xf32>
+  // CHECK-NEXT: vector.extract %{{.*}}[1, 1] : vector<4xi1> from vector<3x2x4xi1>
+  // CHECK-NEXT: vector.transfer_read {{.*}} : memref<?x?x?x?xf32>, vector<4xf32>
+  // CHECK-NEXT: vector.insert {{.*}} [1, 1] : vector<4xf32> into vector<2x3x4xf32>
+  // CHECK-NEXT: vector.extract %{{.*}}[2, 1] : vector<4xi1> from vector<3x2x4xi1>
+  // CHECK-NEXT: vector.transfer_read {{.*}} : memref<?x?x?x?xf32>, vector<4xf32>
+  // CHECK-NEXT: vector.insert {{.*}} [1, 2] : vector<4xf32> into vector<2x3x4xf32>
+  // CHECK-NOT: scf.if
+  // CHECK-NOT: scf.for
+  %vec = vector.transfer_read %A[%c0, %c0, %c0, %c0], %f0, %mask {permutation_map = affine_map<(d0, d1, d2, d4) -> (d2, d0, d4)>, in_bounds = [true, true, true]}: memref<?x?x?x?xf32>, vector<2x3x4xf32>
+  return %vec : vector<2x3x4xf32>
+}

nicolasvasilache · 2025-06-19T13:02:51Z

mlir/test/Conversion/VectorToSCF/unrolled-vector-to-loops.mlir

+  %f0 = arith.constant 0.0: f32
+  %c0 = arith.constant 0: index
+
+  // CHECK:      vector.extract %{{.*}}[0, 0] : vector<4xi1> from vector<3x2x4xi1>


Can we see a little more in the test (at least the transpose on the mask)?

I'm not sure that's possible when unrolling completely, because the transpose just folds into the vector.extract on further unrolling. The vector.extract indices do show that the mask is being read in a transposed fashion. The other option is to have a test that doesn't unroll fully. Any ideas which would be preferred?

OK, I missed that the test was also doing that.
There should be a max-transfer-rank or similar parameter where one could stop the unrolling at 2-D transfer reads (e.g. for HW that supports > 1-D loads), but it is likely not worth the trouble at this point.
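The index relationship discussed in this thread can be checked with a small standalone model. This is only a sketch of the index arithmetic, assuming that result dimension k of vector.transpose reads source dimension perm[k]; `sourcePosition` is a hypothetical name and this is not MLIR's actual folder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model (not MLIR's folder): vector.transpose makes result
// dimension k read source dimension perm[k], so a position in the transposed
// vector corresponds to the source position computed below.
std::vector<int64_t> sourcePosition(const std::vector<int64_t> &perm,
                                    const std::vector<int64_t> &transposedPos) {
  assert(perm.size() == transposedPos.size());
  std::vector<int64_t> srcPos(perm.size());
  // Scatter each transposed index back to the source dimension it came from.
  for (std::size_t k = 0; k < perm.size(); ++k)
    srcPos[perm[k]] = transposedPos[k];
  return srcPos;
}
```

With the two leading mask dimensions swapped (perm = {1, 0}), the unrolled vector position [i, j] maps to mask position [j, i]. That matches the CHECK lines in the test, where `vector.insert ... [0, 1]` is paired with `vector.extract ... [1, 0]`.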

nicolasvasilache · 2025-06-19T13:06:53Z

mlir/test/Conversion/VectorToSCF/unrolled-vector-to-loops.mlir

+  // CHECK-NEXT: vector.insert {{.*}} [1, 2] : vector<4xf32> into vector<2x3x4xf32>
+  // CHECK-NOT: scf.if
+  // CHECK-NOT: scf.for
+  %vec = vector.transfer_read %A[%c0, %c0, %c0, %c0], %f0, %mask {permutation_map = affine_map<(d0, d1, d2, d4) -> (d2, d0, d4)>, in_bounds = [true, true, true]}: memref<?x?x?x?xf32>, vector<2x3x4xf32>


Hmm, it is surprising not to have the mask type as part of the op, given that the mapping between vector<2x3x4xf32> and vector<3x2x4xi1> is not trivial.
@dcaballe should the parser/printer be improved? (in a future PR)

nicolasvasilache · 2025-06-19T13:12:34Z

mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp

+        compressUnusedDims(xferOp.getPermutationMap()).getResult(0));
+    assert(expr && "cannot extract from dimension");
+    // Transpose dim to be the outer most dimension, so we can use
+    // vector.extract on it.


I'd rephrase a bit:

vector.extract can only extract the most minor dimensions of a multi-dimensional vector. Transpose `d0` to the most minor dimension so we can extract the (n-1)-D submask.
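For reference, the permutation built in the patch (identity, then swap position 0 with the position of the dimension being unrolled) can be reproduced with the standard library alone. This is a sketch under the assumption that the transposed shape is obtained as shape[perm[k]]; both function names are hypothetical and nothing here calls into MLIR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for the perm computation in maybeAssignMask:
// start from the identity permutation and swap dimPos with position 0.
std::vector<int64_t> outermostSwapPermutation(int64_t rank, int64_t dimPos) {
  assert(0 <= dimPos && dimPos < rank);
  std::vector<int64_t> perm(rank);
  std::iota(perm.begin(), perm.end(), 0);
  std::swap(perm[0], perm[dimPos]);
  return perm;
}

// Shape of the transposed vector, assuming result dim k has the size of
// source dim perm[k].
std::vector<int64_t> transposedShape(const std::vector<int64_t> &shape,
                                     const std::vector<int64_t> &perm) {
  assert(shape.size() == perm.size());
  std::vector<int64_t> result(shape.size());
  for (std::size_t k = 0; k < perm.size(); ++k)
    result[k] = shape[perm[k]];
  return result;
}
```

In the test from this PR the unrolled dimension sits at position 1 of the compressed permutation map, so `outermostSwapPermutation(3, 1)` applied to the 3x2x4 mask yields a 2x3x4 mask whose leading dimension can then be indexed by vector.extract.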

banach-space

Thanks! I'm not particularly familiar with this logic, but the test makes sense.

any permutations or broadcasting on the memory space

Is broadcasting relevant here? If yes, it would be good to add a test for that.

LGTM % minor suggestions (nice-to-haves aka nits)

banach-space · 2025-06-19T15:10:45Z

mlir/test/Conversion/VectorToSCF/unrolled-vector-to-loops.mlir

+  // CHECK-NEXT: vector.insert {{.*}} [1, 2] : vector<4xf32> into vector<2x3x4xf32>
+  // CHECK-NOT: scf.if
+  // CHECK-NOT: scf.for
+  %vec = vector.transfer_read %A[%c0, %c0, %c0, %c0], %f0, %mask {permutation_map = affine_map<(d0, d1, d2, d4) -> (d2, d0, d4)>, in_bounds = [true, true, true]}: memref<?x?x?x?xf32>, vector<2x3x4xf32>


Printing mask type would make sense to me. We discussed something similar recently:

[mlir][Vector] Infer mask and pass_thru types for maskedload/store #131482

However, there's a broader question. Do we need to support both forms:

%vec = vector.transfer_read %A[%c0, %c0, %c0, %c0], %f0, %mask

vs

%vec = vector.mask %mask { vector.transfer_read %A[%c0, %c0, %c0, %c0], %f0 }

?

Also, @Groverkss, this is a very nice example that demonstrates a case where the shapes of the mask and the output vector are different. We are missing such examples in ops.mlir and I'd be tempted to add it there. Just as a nice-to-have.

banach-space · 2025-06-19T15:38:36Z

mlir/test/Conversion/VectorToSCF/unrolled-vector-to-loops.mlir

+
+// -----
+
+func.func @transfer_read_perm_mask(%A : memref<?x?x?x?xf32>, %mask : vector<3x2x4xi1>) -> (vector<2x3x4xf32>) {


IMO, this and other tests in this file are missing LIT variables that would demonstrate that e.g. %MASK_1 is used for %XFER_READ_1.

[mlir][Vector] Fix mask unpacking in transfer op unrolling

1bdb2e5

Groverkss requested review from banach-space, dcaballe, matthias-springer and nicolasvasilache as code owners June 19, 2025 12:58

llvmbot added the mlir label Jun 19, 2025

nicolasvasilache reviewed Jun 19, 2025

nicolasvasilache approved these changes Jun 19, 2025

banach-space approved these changes Jun 19, 2025
