[RISCV] Handle more cases when combining (vfmv.s.f (extract_subvector X, 0)) #154175
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes: Previously, we fold (vfmv.s.f (extract_subvector X, 0)) into X when X's type is the same as vfmv.s.f's result type. This patch generalizes it by folding it into insert_subvector when X is narrower and extract_subvector when X is wider. I haven't seen a similar pattern for vmv.s.x.

Full diff: https://github.com/llvm/llvm-project/pull/154175.diff (2 Files Affected)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index ce03818b49502..72069c547c50e 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -20738,12 +20738,22 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
isNullConstant(Src.getOperand(1)) &&
Src.getOperand(0).getValueType().isScalableVector()) {
EVT VT = N->getValueType(0);
- EVT SrcVT = Src.getOperand(0).getValueType();
+ SDValue EVSrc = Src.getOperand(0);
+ EVT SrcVT = EVSrc.getValueType();
assert(SrcVT.getVectorElementType() == VT.getVectorElementType());
// Widths match, just return the original vector.
if (SrcVT == VT)
- return Src.getOperand(0);
- // TODO: Use insert_subvector/extract_subvector to change widen/narrow?
+ return EVSrc;
+ SDLoc DL(N);
+ // Width is narrower, so use insert_subvector.
+ if (SrcVT.getVectorMinNumElements() < VT.getVectorMinNumElements()) {
+ return DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VT, DAG.getUNDEF(VT),
+ EVSrc,
+ DAG.getConstant(0, DL, Subtarget.getXLenVT()));
+ }
+ // Width is wider, so use extract_subvector.
+ return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, EVSrc,
+ DAG.getConstant(0, DL, Subtarget.getXLenVT()));
}
[[fallthrough]];
}
diff --git a/llvm/test/CodeGen/RISCV/rvv/redundant-vfmvsf.ll b/llvm/test/CodeGen/RISCV/rvv/redundant-vfmvsf.ll
new file mode 100644
index 0000000000000..34bce99a101bb
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/redundant-vfmvsf.ll
@@ -0,0 +1,56 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv64 -mattr='+v,+zvl512b' < %s | FileCheck %s
+
+define <2 x float> @redundant_vfmv(ptr %p0, ptr %p1, i64 %N) {
+; CHECK-LABEL: redundant_vfmv:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: li a3, 0
+; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT: vmv.v.i v8, 0
+; CHECK-NEXT: li a4, 64
+; CHECK-NEXT: .LBB0_1: # %body
+; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT: vsetvli zero, a4, e32, m4, ta, ma
+; CHECK-NEXT: vle32.v v12, (a0)
+; CHECK-NEXT: vle32.v v16, (a1)
+; CHECK-NEXT: vfredusum.vs v9, v12, v8
+; CHECK-NEXT: vsetivli zero, 1, e32, mf2, ta, ma
+; CHECK-NEXT: vslidedown.vi v8, v8, 1
+; CHECK-NEXT: addi a3, a3, 64
+; CHECK-NEXT: addi a1, a1, 256
+; CHECK-NEXT: vsetvli zero, a4, e32, m4, ta, ma
+; CHECK-NEXT: vfredusum.vs v8, v16, v8
+; CHECK-NEXT: vfmv.f.s fa5, v8
+; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT: vrgather.vi v8, v9, 0
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa5
+; CHECK-NEXT: addi a0, a0, 256
+; CHECK-NEXT: bltu a3, a2, .LBB0_1
+; CHECK-NEXT: # %bb.2: # %exit
+; CHECK-NEXT: ret
+entry:
+ br label %body
+
+body:
+ %52 = phi <2 x float> [ zeroinitializer, %entry ], [ %251, %body ]
+ %indvar = phi i64 [0, %entry], [%indvar.next, %body]
+ %ptr0 = getelementptr float, ptr %p0, i64 %indvar
+ %ptr1 = getelementptr float, ptr %p1, i64 %indvar
+
+ %238 = load <64 x float>, ptr %ptr0
+ %239 = load <64 x float>, ptr %ptr1
+
+ %246 = extractelement <2 x float> %52, i64 0
+ %247 = tail call reassoc float @llvm.vector.reduce.fadd.v64f32(float %246, <64 x float> %238)
+ %248 = insertelement <2 x float> poison, float %247, i64 0
+ %249 = extractelement <2 x float> %52, i64 1
+ %250 = tail call reassoc float @llvm.vector.reduce.fadd.v64f32(float %249, <64 x float> %239)
+ %251 = insertelement <2 x float> %248, float %250, i64 1
+
+ %indvar.next = add nuw i64 %indvar, 64
+ %c = icmp uge i64 %indvar.next, %N
+ br i1 %c, label %exit, label %body
+
+exit:
+ ret <2 x float> %251
+}
  br label %body

body:
  %52 = phi <2 x float> [ zeroinitializer, %entry ], [ %251, %body ]
Do we really need a loop to exercise this?
yeah we don't need it. The test is now updated.
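The updated test is not quoted in this thread. Purely as an illustrative sketch (function and value names are hypothetical, not the committed test), a loop-free reproducer of the same redundant-vfmv.s.f pattern might look like:

```llvm
declare float @llvm.vector.reduce.fadd.v64f32(float, <64 x float>)

define <2 x float> @no_loop(<2 x float> %acc, <64 x float> %v0, <64 x float> %v1) {
  %s0 = extractelement <2 x float> %acc, i64 0
  %r0 = tail call reassoc float @llvm.vector.reduce.fadd.v64f32(float %s0, <64 x float> %v0)
  %t0 = insertelement <2 x float> poison, float %r0, i64 0
  %s1 = extractelement <2 x float> %acc, i64 1
  %r1 = tail call reassoc float @llvm.vector.reduce.fadd.v64f32(float %s1, <64 x float> %v1)
  %t1 = insertelement <2 x float> %t0, float %r1, i64 1
  ret <2 x float> %t1
}
```

The extractelement feeding each reduction's start value is what materializes the vfmv.s.f that the combine targets.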
mshockwave force-pushed from a3f1966 to 3373776: "[RISCV] Handle more cases when combining (vfmv.s.f (extract_subvector X, 0))" (Co-Authored-By: Craig Topper <[email protected]>)
LGTM, but since I wrote the code in our downstream we should probably wait for another reviewer.
LGTM
@@ -20738,12 +20738,22 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
         isNullConstant(Src.getOperand(1)) &&
         Src.getOperand(0).getValueType().isScalableVector()) {
       EVT VT = N->getValueType(0);
-      EVT SrcVT = Src.getOperand(0).getValueType();
+      SDValue EVSrc = Src.getOperand(0);
+      EVT SrcVT = EVSrc.getValueType();
Rename this to EVSrcVT to avoid confusion.
Fixed
; CHECK-NEXT: vfmv.f.s fa5, v8
; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
; CHECK-NEXT: vrgather.vi v8, v9, 0
; CHECK-NEXT: vfslide1down.vf v8, v8, fa5
As a follow on, this could be a slideup from v8 instead.
candidate PR: #154450
Previously, we fold (vfmv.s.f (extract_subvector X, 0)) into X when X's type is the same as vfmv.s.f's result type. This patch generalizes it by folding it into insert_subvector when X is narrower and extract_subvector when X is wider.

I haven't seen a similar pattern for vmv.s.x. Probably because vfmv.s.f in this case is usually part of the materialization of a floating-point llvm.vector.reduce.*'s start value, which is absent from their integer counterparts.
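In DAG terms, the generalized fold can be summarized as follows (pseudocode; the nxv element counts are chosen only for illustration):

```
; SrcVT == VT: fold away entirely
(vfmv.s.f (extract_subvector X:nxv2f32, 0)):nxv2f32  ->  X

; X narrower than the result: widen with insert_subvector
(vfmv.s.f (extract_subvector X:nxv1f32, 0)):nxv2f32
    ->  (insert_subvector undef:nxv2f32, X, 0)

; X wider than the result: narrow with extract_subvector
(vfmv.s.f (extract_subvector X:nxv4f32, 0)):nxv2f32
    ->  (extract_subvector X, 0):nxv2f32
```

In every case only element 0 matters to the vfmv.s.f user, so moving the width adjustment onto X preserves the value being splatted into the scalar slot.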