Skip to content

[RISCV] Handle more cases when combining (vfmv.s.f (extract_subvector X, 0)) #154175

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

mshockwave
Copy link
Member

Previously, we fold (vfmv.s.f (extract_subvector X, 0)) into X when X's type is the same as vfmv.s.f's result type. This patch generalizes it by folding it into insert_subvector when X is narrower and extract_subvector when X is wider.


I haven't seen a similar pattern for vmv.s.x. Probably because vfmv.s.f in this case is usually part of the materialization of floating point llvm.vector.reduce.*'s start value, which is absent from their integer counterparts.

@llvmbot
Copy link
Member

llvmbot commented Aug 18, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes

Previously, we fold (vfmv.s.f (extract_subvector X, 0)) into X when X's type is the same as vfmv.s.f's result type. This patch generalizes it by folding it into insert_subvector when X is narrower and extract_subvector when X is wider.


I haven't seen a similar pattern for vmv.s.x. Probably because vfmv.s.f in this case is usually part of the materialization of floating point llvm.vector.reduce.*'s start value, which is absent from their integer counterparts.


Full diff: https://github.com/llvm/llvm-project/pull/154175.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+13-3)
  • (added) llvm/test/CodeGen/RISCV/rvv/redundant-vfmvsf.ll (+56)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index ce03818b49502..72069c547c50e 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -20738,12 +20738,22 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
         isNullConstant(Src.getOperand(1)) &&
         Src.getOperand(0).getValueType().isScalableVector()) {
       EVT VT = N->getValueType(0);
-      EVT SrcVT = Src.getOperand(0).getValueType();
+      SDValue EVSrc = Src.getOperand(0);
+      EVT SrcVT = EVSrc.getValueType();
       assert(SrcVT.getVectorElementType() == VT.getVectorElementType());
       // Widths match, just return the original vector.
       if (SrcVT == VT)
-        return Src.getOperand(0);
-      // TODO: Use insert_subvector/extract_subvector to change widen/narrow?
+        return EVSrc;
+      SDLoc DL(N);
+      // Width is narrower, using insert_subvector.
+      if (SrcVT.getVectorMinNumElements() < VT.getVectorMinNumElements()) {
+        return DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VT, DAG.getUNDEF(VT),
+                           EVSrc,
+                           DAG.getConstant(0, DL, Subtarget.getXLenVT()));
+      }
+      // Width is wider, using extract_subvector.
+      return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, EVSrc,
+                         DAG.getConstant(0, DL, Subtarget.getXLenVT()));
     }
     [[fallthrough]];
   }
diff --git a/llvm/test/CodeGen/RISCV/rvv/redundant-vfmvsf.ll b/llvm/test/CodeGen/RISCV/rvv/redundant-vfmvsf.ll
new file mode 100644
index 0000000000000..34bce99a101bb
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/redundant-vfmvsf.ll
@@ -0,0 +1,56 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv64 -mattr='+v,+zvl512b' < %s | FileCheck %s
+
+define <2 x float> @redundant_vfmv(ptr %p0, ptr %p1, i64 %N) {
+; CHECK-LABEL: redundant_vfmv:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    li a3, 0
+; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:    vmv.v.i v8, 0
+; CHECK-NEXT:    li a4, 64
+; CHECK-NEXT:  .LBB0_1: # %body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    vsetvli zero, a4, e32, m4, ta, ma
+; CHECK-NEXT:    vle32.v v12, (a0)
+; CHECK-NEXT:    vle32.v v16, (a1)
+; CHECK-NEXT:    vfredusum.vs v9, v12, v8
+; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; CHECK-NEXT:    vslidedown.vi v8, v8, 1
+; CHECK-NEXT:    addi a3, a3, 64
+; CHECK-NEXT:    addi a1, a1, 256
+; CHECK-NEXT:    vsetvli zero, a4, e32, m4, ta, ma
+; CHECK-NEXT:    vfredusum.vs v8, v16, v8
+; CHECK-NEXT:    vfmv.f.s fa5, v8
+; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:    vrgather.vi v8, v9, 0
+; CHECK-NEXT:    vfslide1down.vf v8, v8, fa5
+; CHECK-NEXT:    addi a0, a0, 256
+; CHECK-NEXT:    bltu a3, a2, .LBB0_1
+; CHECK-NEXT:  # %bb.2: # %exit
+; CHECK-NEXT:    ret
+entry:
+  br label %body
+
+body:
+  %52 = phi <2 x float> [ zeroinitializer, %entry ], [ %251, %body ]
+  %indvar = phi i64 [0, %entry], [%indvar.next, %body]
+  %ptr0 = getelementptr float, ptr %p0, i64 %indvar
+  %ptr1 = getelementptr float, ptr %p1, i64 %indvar
+
+  %238 = load <64 x float>, ptr %ptr0
+  %239 = load <64 x float>, ptr %ptr1
+
+  %246 = extractelement <2 x float> %52, i64 0
+  %247 = tail call reassoc float @llvm.vector.reduce.fadd.v64f32(float %246, <64 x float> %238)
+  %248 = insertelement <2 x float> poison, float %247, i64 0
+  %249 = extractelement <2 x float> %52, i64 1
+  %250 = tail call reassoc float @llvm.vector.reduce.fadd.v64f32(float %249, <64 x float> %239)
+  %251 = insertelement <2 x float> %248, float %250, i64 1
+
+  %indvar.next = add nuw i64 %indvar, 64
+  %c = icmp uge i64 %indvar.next, %N
+  br i1 %c, label %exit, label %body
+
+exit:
+  ret <2 x float> %251
+}

br label %body

body:
%52 = phi <2 x float> [ zeroinitializer, %entry ], [ %251, %body ]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need a loop to exercise this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we don't need it. The test is now updated.

@mshockwave mshockwave force-pushed the patch/rvv/redundant-vmv_s_f-elimination branch from a3f1966 to 3373776 Compare August 18, 2025 20:05
Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but since I wrote the code in our downstream we should probably wait for another reviewer.

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -20738,12 +20738,22 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
isNullConstant(Src.getOperand(1)) &&
Src.getOperand(0).getValueType().isScalableVector()) {
EVT VT = N->getValueType(0);
EVT SrcVT = Src.getOperand(0).getValueType();
SDValue EVSrc = Src.getOperand(0);
EVT SrcVT = EVSrc.getValueType();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this to EVSrcVT to avoid confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

; CHECK-NEXT: vfmv.f.s fa5, v8
; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
; CHECK-NEXT: vrgather.vi v8, v9, 0
; CHECK-NEXT: vfslide1down.vf v8, v8, fa5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow on, this could be a slideup from v8 instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

candidate PR: #154450

@mshockwave mshockwave merged commit f82054e into llvm:main Aug 19, 2025
7 of 9 checks passed
@mshockwave mshockwave deleted the patch/rvv/redundant-vmv_s_f-elimination branch August 19, 2025 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants