[VectorCombine] New folding pattern for extract/binop/shuffle chains #145232

Rajveer100 · 2025-06-22T12:14:44Z

This adds a new foldShuffleChainsToReduce for horizontal reduction of patterns like:

define i16 @test_reduce_v8i16(<8 x i16> %a0) local_unnamed_addr #0 {
  %1 = shufflevector <8 x i16> %a0, <8 x i16> poison, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison>
  %2 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %a0, <8 x i16> %1)
  %3 = shufflevector <8 x i16> %2, <8 x i16> poison, <8 x i32> <i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %4 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %2, <8 x i16> %3)
  %5 = shufflevector <8 x i16> %4, <8 x i16> poison, <8 x i32> <i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %6 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %4, <8 x i16> %5)
  %7 = extractelement <8 x i16> %6, i64 0
  ret i16 %7
}

...which can be reduced to an llvm.vector.reduce.umin.v8i16(%a0) intrinsic call.

Similar transformations apply to other ops when the cost model permits.
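
To see why the chain is equivalent to a single reduction, here is a minimal scalar model of the IR above. It is only a sketch: `Vec8`, `shuffleUpperToLower`, `umin`, `reduceViaShuffleChain` and `reduceDirect` are illustrative names, not LLVM APIs, and poison lanes are modelled as 0 purely to keep the simulation deterministic.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec8 = std::array<std::uint16_t, 8>;

// Models `shufflevector %v, poison, <Half, Half+1, ..., poison, ...>`:
// lanes [Half, 2*Half) move down to [0, Half). The remaining lanes are
// poison in the IR; this model fills them with 0 (an arbitrary choice).
Vec8 shuffleUpperToLower(const Vec8 &V, std::size_t Half) {
  assert(Half <= V.size() / 2);
  Vec8 Out{};
  for (std::size_t I = 0; I < Half; ++I)
    Out[I] = V[Half + I];
  return Out;
}

// Models the lane-wise `llvm.umin.v8i16` call.
Vec8 umin(const Vec8 &A, const Vec8 &B) {
  Vec8 Out{};
  for (std::size_t I = 0; I < A.size(); ++I)
    Out[I] = std::min(A[I], B[I]);
  return Out;
}

// The shuffle/umin chain followed by `extractelement ..., i64 0`.
std::uint16_t reduceViaShuffleChain(Vec8 V) {
  for (std::size_t Half = V.size() / 2; Half >= 1; Half /= 2)
    V = umin(V, shuffleUpperToLower(V, Half));
  return V[0];
}

// What `llvm.vector.reduce.umin.v8i16` computes.
std::uint16_t reduceDirect(const Vec8 &V) {
  return *std::min_element(V.begin(), V.end());
}
```

Lane 0 of the final vector never depends on a poison lane, which is why only element 0 is extracted and why the whole chain can be replaced by the reduction.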

llvmbot · 2025-06-22T12:15:12Z

@llvm/pr-subscribers-llvm-transforms

Author: Rajveer Singh Bharadwaj (Rajveer100)

Changes

Resolves #144654
Part of #143088

This adds a new foldShuffleChainsToReduce for horizontal reduction of patterns like:

define i16 @test_reduce_v8i16(<8 x i16> %a0) local_unnamed_addr #0 {
  %1 = shufflevector <8 x i16> %a0, <8 x i16> poison, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison>
  %2 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %a0, <8 x i16> %1)
  %3 = shufflevector <8 x i16> %2, <8 x i16> poison, <8 x i32> <i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %4 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %2, <8 x i16> %3)
  %5 = shufflevector <8 x i16> %4, <8 x i16> poison, <8 x i32> <i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %6 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %4, <8 x i16> %5)
  %7 = extractelement <8 x i16> %6, i64 0
  ret i16 %7
}

...which can be reduced to an llvm.vector.reduce.umin.v8i16(%a0) intrinsic call.

Similar transformations apply to other ops when the cost model permits.

Full diff: https://github.com/llvm/llvm-project/pull/145232.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VectorCombine.cpp (+126)
(added) llvm/test/Transforms/VectorCombine/fold-shuffle-chains-to-reduce.ll (+18)

diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 52cb1dbb33b86..aca939c4f534d 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -129,6 +129,7 @@ class VectorCombine {
   bool foldShuffleOfIntrinsics(Instruction &I);
   bool foldShuffleToIdentity(Instruction &I);
   bool foldShuffleFromReductions(Instruction &I);
+  bool foldShuffleChainsToReduce(Instruction &I);
   bool foldCastFromReductions(Instruction &I);
   bool foldSelectShuffle(Instruction &I, bool FromReduction = false);
   bool foldInterleaveIntrinsics(Instruction &I);
@@ -2910,6 +2911,130 @@ bool VectorCombine::foldShuffleFromReductions(Instruction &I) {
   return foldSelectShuffle(*Shuffle, true);
 }
 
+bool VectorCombine::foldShuffleChainsToReduce(Instruction &I) {
+  auto *SVI = dyn_cast<ShuffleVectorInst>(&I);
+  if (!SVI)
+    return false;
+
+  std::queue<Value *> Worklist;
+  SmallVector<Instruction *> ToEraseFromParent;
+
+  SmallVector<int> ShuffleMask;
+  bool IsShuffleOp = true;
+
+  Worklist.push(SVI);
+  SVI->getShuffleMask(ShuffleMask);
+
+  if (ShuffleMask.size() < 2)
+    return false;
+
+  Instruction *Prev0 = nullptr, *Prev1 = nullptr;
+  Instruction *LastOp = nullptr;
+
+  int MaskHalfPos = ShuffleMask.size() / 2;
+  bool IsFirst = true;
+
+  while (!Worklist.empty()) {
+    Value *V = Worklist.front();
+    Worklist.pop();
+
+    auto *CI = dyn_cast<Instruction>(V);
+    if (!CI)
+      return false;
+
+    if (auto *SV = dyn_cast<ShuffleVectorInst>(V)) {
+      if (!IsShuffleOp || MaskHalfPos < 1 || (!Prev1 && !IsFirst))
+        return false;
+
+      auto *Op0 = SV->getOperand(0);
+      auto *Op1 = SV->getOperand(1);
+      if (!Op0 || !Op1)
+        return false;
+
+      auto *FVT = dyn_cast<FixedVectorType>(Op1->getType());
+      if (!FVT || !isa<PoisonValue>(Op1))
+        return false;
+
+      SmallVector<int> CurrentMask;
+      SV->getShuffleMask(CurrentMask);
+
+      int64_t MaskSize = CurrentMask.size();
+      for (int MaskPos = 0; MaskPos != MaskSize; ++MaskPos) {
+        if (MaskPos < MaskHalfPos && CurrentMask[MaskPos] != MaskHalfPos + MaskPos)
+          return false;
+        if (MaskPos >= MaskHalfPos && CurrentMask[MaskPos] != -1)
+          return false;
+      }
+      MaskHalfPos /= 2;
+      Prev0 = SV;
+    } else if (auto *Call = dyn_cast<CallInst>(V)) {
+      if (IsShuffleOp || !Prev0)
+        return false;
+
+      auto *II = dyn_cast<IntrinsicInst>(Call);
+      if (!II)
+        return false;
+
+      switch (II->getIntrinsicID()) {
+      case Intrinsic::umin: {
+        auto *Op0 = Call->getOperand(0);
+        auto *Op1 = Call->getOperand(1);
+        if (!(Op0 == Prev0 && Op1 == Prev1) && !(Op0 == Prev1 && Op1 == Prev0) && !IsFirst)
+          return false;
+
+        if (!IsFirst)
+          Prev0 = Prev1;
+        else
+         IsFirst = false;
+        Prev1 = Call;
+        break;
+      }
+      default:
+        return false;
+      }
+    } else if (auto *ExtractElement = dyn_cast<ExtractElementInst>(CI)) {
+      if (!IsShuffleOp || !Prev0 || !Prev1 || MaskHalfPos != 0)
+        return false;
+
+      auto *Op0 = ExtractElement->getOperand(0);
+      auto *Op1 = ExtractElement->getOperand(1);
+      if (Op0 != Prev1)
+        return false;
+
+      if (auto *Op1Idx = dyn_cast<ConstantInt>(Op1)) {
+        if (Op1Idx->getValue() != 0)
+          return false;
+      } else {
+        return false;
+      }
+      LastOp = ExtractElement;
+      break;
+    }
+    IsShuffleOp ^= 1;
+    ToEraseFromParent.push_back(CI);
+
+    auto *NextI = CI->getNextNode();
+    if (!NextI)
+      return false;
+    Worklist.push(NextI);
+  }
+
+  if (!LastOp)
+    return false;
+
+  auto *ReducedResult = Builder.CreateIntrinsic(Intrinsic::vector_reduce_umin, {SVI->getType()}, {SVI->getOperand(0)});
+  replaceValue(*LastOp, *ReducedResult);
+
+  ToEraseFromParent.push_back(LastOp);
+
+  std::reverse(ToEraseFromParent.begin(), ToEraseFromParent.end());
+  // for (auto &Instr : ToEraseFromParent)
+    // eraseInstruction(*Instr);
+    // Instr->eraseFromParent();
+
+  return true;
+}
+
 /// Determine if its more efficient to fold:
 ///   reduce(trunc(x)) -> trunc(reduce(x)).
 ///   reduce(sext(x))  -> sext(reduce(x)).
@@ -3607,6 +3732,7 @@ bool VectorCombine::run() {
         MadeChange |= foldShuffleOfIntrinsics(I);
         MadeChange |= foldSelectShuffle(I);
         MadeChange |= foldShuffleToIdentity(I);
+        MadeChange |= foldShuffleChainsToReduce(I);
         break;
       case Instruction::BitCast:
         MadeChange |= foldBitcastShuffle(I);
diff --git a/llvm/test/Transforms/VectorCombine/fold-shuffle-chains-to-reduce.ll b/llvm/test/Transforms/VectorCombine/fold-shuffle-chains-to-reduce.ll
new file mode 100644
index 0000000000000..6f21eb5097fde
--- /dev/null
+++ b/llvm/test/Transforms/VectorCombine/fold-shuffle-chains-to-reduce.ll
@@ -0,0 +1,18 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=vector-combine -S | FileCheck %s
+
+define i16 @test_reduce_v8i16(<8 x i16> %a0) local_unnamed_addr #0 {
+; CHECK-LABEL: define i16 @test_reduce_v8i16(
+; CHECK-SAME: <8 x i16> [[A0:%.*]]) local_unnamed_addr {
+; CHECK-NEXT:    [[TMP1:%.*]] = call i16 @llvm.vector.reduce.umin.v8i16(<8 x i16> [[A0]])
+; CHECK-NEXT:    ret i16 [[TMP1]]
+;
+  %1 = shufflevector <8 x i16> %a0, <8 x i16> poison, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison>
+  %2 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %a0, <8 x i16> %1)
+  %3 = shufflevector <8 x i16> %2, <8 x i16> poison, <8 x i32> <i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+  %4 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %2, <8 x i16> %3)
+  %5 = shufflevector <8 x i16> %4, <8 x i16> poison, <8 x i32> <i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+  %6 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %4, <8 x i16> %5)
+  %7 = extractelement <8 x i16> %6, i64 0
+  ret i16 %7
+}
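
The mask test inside the shuffle branch of foldShuffleChainsToReduce can be exercised in isolation. The following is a standalone sketch of that check, not the code from the patch: `isUpperToLowerMask` and `isReductionMaskSequence` are hypothetical helper names, masks are plain `std::vector<int>`, and -1 stands for a poison mask element as in the diff.

```cpp
#include <cassert>
#include <vector>

// -1 marks a poison mask element, matching the `!= -1` test in the diff.
constexpr int PoisonElt = -1;

// True if Mask moves lanes [Half, 2*Half) down to [0, Half) and leaves
// every other lane poison. Mirrors the per-position loop in the patch.
bool isUpperToLowerMask(const std::vector<int> &Mask, int Half) {
  assert(Half >= 0);
  const int Size = static_cast<int>(Mask.size());
  for (int Pos = 0; Pos < Size; ++Pos) {
    if (Pos < Half && Mask[Pos] != Half + Pos)
      return false;
    if (Pos >= Half && Mask[Pos] != PoisonElt)
      return false;
  }
  return true;
}

// True if the masks, taken in program order, halve the live lanes at each
// step until a single lane remains (the role of MaskHalfPos in the patch).
bool isReductionMaskSequence(const std::vector<std::vector<int>> &Masks,
                             int NumElts) {
  int Half = NumElts / 2;
  for (const auto &Mask : Masks) {
    if (Half < 1 || static_cast<int>(Mask.size()) != NumElts ||
        !isUpperToLowerMask(Mask, Half))
      return false;
    Half /= 2;
  }
  // Half reaches 0 only after the final single-lane shuffle has been seen.
  return Half == 0;
}
```

For the v8i16 test case, the three masks `<4,5,6,7,...>`, `<2,3,...>` and `<1,...>` pass this check. Reordering lanes within a mask or dropping the last shuffle makes it fail, which corresponds to the early `return false` paths in the patch.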

llvmbot · 2025-06-22T12:15:12Z

@llvm/pr-subscribers-vectorizers


Rajveer100 · 2025-06-24T12:10:05Z

I have updated the implementation itself, re-checking for other potential issues and adding cost analysis/tests.

Rajveer100 · 2025-06-26T11:10:01Z

@RKSimon
Let me know if this looks fair enough; I can then proceed with the remaining ops like add/mul/and/or/xor.

RKSimon · 2025-06-26T13:46:46Z

@Rajveer100 Please can you investigate the CI failures?

github-actions · 2025-06-28T11:04:21Z

✅ With the latest revision this PR passed the C/C++ code formatter.

Rajveer100 · 2025-06-28T11:04:26Z

Added negative tests and support for other binary operations as well. Let me know if anything else is needed.

dtcxzyw · 2025-06-30T08:58:06Z

@RKSimon Do you know where this pattern is generated from? If it is only introduced by SLP/LoopVec, can we just emit a reduction instead? Is this pattern left as is since the reduction is not supported by the target?

RKSimon · 2025-06-30T10:41:54Z

> @RKSimon Do you know where this pattern is generated from? If it is only introduced by SLP/LoopVec, can we just emit a reduction instead? Is this pattern left as is since the reduction is not supported by the target?

It's a standard SIMD pattern in C/C++ SSE/AVX code for reductions where the builtin isn't available, although it can get more complicated if wider vector types get split down the chain (1 x 512-bit -> 2 x 256-bit, 1 x 256-bit -> 2 x 128-bit, etc.). Until recently the AVX512 mm512_reduce intrinsics were implemented with a similar pattern, but they now mostly use the builtins.
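
A rough scalar model of the "split down the chain" case, assuming a 16-lane input that is first split into two 8-lane halves and then reduced with the in-register shuffle chain. All names here (`Vec`, `umin`, `splitAndCombine`, `shuffleChainMin`, `reduceWide`) are illustrative and do not correspond to real intrinsics or to code in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<std::uint16_t>;

// Lane-wise unsigned minimum of two equally sized vectors.
Vec umin(const Vec &A, const Vec &B) {
  assert(A.size() == B.size());
  Vec Out(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    Out[I] = std::min(A[I], B[I]);
  return Out;
}

// Split step: extract the low and high halves of a wide vector and combine
// them, e.g. 1 x 256-bit -> 2 x 128-bit followed by a lane-wise min.
Vec splitAndCombine(const Vec &V) {
  assert(V.size() % 2 == 0);
  const std::size_t Half = V.size() / 2;
  Vec Lo(V.begin(), V.begin() + Half);
  Vec Hi(V.begin() + Half, V.end());
  return umin(Lo, Hi);
}

// In-register step: shuffle the upper half down and combine until one lane
// is left. Lanes that would be poison in IR are modelled as 0 here.
std::uint16_t shuffleChainMin(Vec V) {
  assert(V.size() >= 2 && (V.size() & (V.size() - 1)) == 0);
  for (std::size_t Half = V.size() / 2; Half >= 1; Half /= 2) {
    Vec Shuf(V.size(), 0);
    std::copy_n(V.begin() + Half, Half, Shuf.begin());
    V = umin(V, Shuf);
  }
  return V[0];
}

// Full pattern: one split, then the shuffle chain on the narrower vector.
std::uint16_t reduceWide(const Vec &V) {
  return shuffleChainMin(splitAndCombine(V));
}
```

The split changes the vector type partway through the chain, which is the complication mentioned above: a matcher that only follows same-width shuffles would see just the narrower tail of the reduction.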

Rajveer100 · 2025-07-04T11:31:47Z

With the latest changes, the logic works for all element sizes and is not limited to just powers of 2. I have also updated the costs.
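
The updated matching logic is not shown in this thread. As one hypothetical way a shuffle chain can cover a non-power-of-2 lane count, the sketch below keeps ceil(Live / 2) lanes per step. `reduceAnyLength` is an invented name and the halving scheme is an assumption, not necessarily what the patch implements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reduces V to its unsigned minimum with shuffle-like steps that also work
// when the lane count is not a power of two (assumed scheme, see above).
std::uint16_t reduceAnyLength(std::vector<std::uint16_t> V) {
  assert(!V.empty());
  std::size_t Live = V.size(); // lanes [0, Live) still hold partial results
  while (Live > 1) {
    const std::size_t Keep = (Live + 1) / 2; // ceil(Live / 2) lanes survive
    // Models a shuffle that moves lanes [Keep, Live) down to [0, Live - Keep),
    // followed by a lane-wise min. With an odd Live, the middle lane has no
    // partner and is carried over unchanged.
    for (std::size_t I = 0; I + Keep < Live; ++I)
      V[I] = std::min(V[I], V[I + Keep]);
    Live = Keep;
  }
  return V[0];
}
```

With six lanes this takes three steps (6 -> 3 -> 2 -> 1), compared with three steps for eight lanes, so the number of shuffle/op pairs is ceil(log2(N)) under this scheme.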

Rajveer100 · 2025-07-04T12:23:23Z

CI failures are related to flang.

Rajveer100 · 2025-07-05T10:56:34Z

I have added documentation as well.

Rajveer100 · 2025-07-08T08:21:50Z

@RKSimon @dtcxzyw
Gentle ping!

RKSimon

This still feels like it's a lot more complicated than necessary, e.g. if we compare it to SelectionDAG::matchBinOpReduction.

Rajveer100 · 2025-07-09T10:35:22Z

Do you mean in terms of the logic or the overall length?

Also, I think SelectionDAG::matchBinOpReduction only handles powers of two. Another difference is that here we can have either an intrinsic instruction or a bin op.

In terms of performance, although I haven't tested it, my version may be faster, albeit more complex.

Rajveer100 · 2025-07-15T13:15:14Z

@RKSimon
Any suggestions from your end? From my point of view, the only change I can see is merging the handling of the intrinsic/call instructions into one if block for neatness, and maybe adding smaller helper functions. Other than that, the logic is pretty straightforward.

Rajveer100 requested a review from RKSimon June 22, 2025 12:14

llvmbot added vectorizers llvm:transforms labels Jun 22, 2025

Rajveer100 force-pushed the instcombine-opt branch from 20aa7b1 to fea1941 Compare June 22, 2025 12:16

Rajveer100 requested a review from dtcxzyw June 22, 2025 12:32

RKSimon reviewed Jun 22, 2025

RKSimon reviewed Jun 23, 2025

Rajveer100 force-pushed the instcombine-opt branch from fea1941 to b184ba5 Compare June 24, 2025 12:04

Rajveer100 force-pushed the instcombine-opt branch 2 times, most recently from d130a70 to 1da772b Compare June 26, 2025 11:01

Rajveer100 requested a review from RKSimon June 26, 2025 11:06

RKSimon added the llvm::vectorcombine Cost-based vector combine pass label Jun 26, 2025

Rajveer100 commented Jun 26, 2025

RKSimon reviewed Jun 26, 2025

RKSimon reviewed Jun 26, 2025

Rajveer100 force-pushed the instcombine-opt branch from 1da772b to dea9e0b Compare June 27, 2025 13:38

llvmbot added the llvm:vectorcombine label Jun 27, 2025

Rajveer100 requested a review from RKSimon June 27, 2025 13:38

RKSimon reviewed Jun 27, 2025

RKSimon reviewed Jun 27, 2025

Rajveer100 force-pushed the instcombine-opt branch from dea9e0b to 0116da2 Compare June 27, 2025 15:25

Rajveer100 force-pushed the instcombine-opt branch from 0116da2 to 44a3268 Compare June 28, 2025 08:26

Rajveer100 requested a review from RKSimon June 28, 2025 10:48

Rajveer100 force-pushed the instcombine-opt branch from b24a5b8 to 61d835b Compare June 28, 2025 11:04

Rajveer100 force-pushed the instcombine-opt branch from 61d835b to 09adb45 Compare June 28, 2025 11:59

dtcxzyw mentioned this pull request Jun 30, 2025

Fuzz PR145232 dtcxzyw/llvm-fuzz-service#92

Closed

dtcxzyw reviewed Jun 30, 2025

dtcxzyw reviewed Jun 30, 2025

Rajveer100 force-pushed the instcombine-opt branch from 09adb45 to 8932db7 Compare July 4, 2025 11:29

Rajveer100 requested a review from dtcxzyw July 4, 2025 11:31

dtcxzyw reviewed Jul 5, 2025

dtcxzyw mentioned this pull request Jul 5, 2025

Fuzz PR145232 dtcxzyw/llvm-mutation-based-fuzz-service#69

Closed

Include support for Add/Mul/Or/And/Xor Binary Operations

eb9570d

Rajveer100 force-pushed the instcombine-opt branch from 8932db7 to eb9570d Compare July 5, 2025 10:55

Rajveer100 requested a review from dtcxzyw July 5, 2025 10:55

RKSimon reviewed Jul 9, 2025
