-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[DAG] Add generic expansion for ISD::FCANONICALIZE nodes #142105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@uweigand FYI |
@llvm/pr-subscribers-backend-mips @llvm/pr-subscribers-llvm-selectiondag Author: Dominik Steenken (dominik-steenken) ChangesThis PR takes the work previously done by @pawan-nirpal-031 on X86 in #106370, and makes it available in common code. This should enable all targets to use Canonicalization is implemented here as multiplication by Moving the X86 implementation to common code was suggested in the reviews of #106370 by @arsenm . @pawan-nirpal-031 I hope you don't mind that i went ahead with this to getit working on SystemZ. I don't have a lot of experience with this part of the code base yet, so reviews are very welcome. Full diff: https://github.com/llvm/llvm-project/pull/142105.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 528c07cc5549d..6599704aaf290 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -3356,6 +3356,23 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
Results.push_back(Op);
break;
}
+ case ISD::FCANONICALIZE: {
+ // This implements llvm.canonicalize.f* by multiplication with 1.0,
+ // as suggested in https://llvm.org/docs/LangRef.html#id2335.
+ // Get operand x.
+ SDValue Operand = Node->getOperand(0);
+ // Get fp value type used.
+ EVT VT = Operand.getValueType();
+ // Produce appropriately-typed 1.0 constant.
+ SDValue One = DAG.getConstantFP(1.0, dl, VT);
+ // Produce multiplication node x * 1.0.
+ SDValue Chain = DAG.getEntryNode();
+ SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other},
+ {Chain, Operand, One});
+
+ Results.push_back(Mul);
+ break;
+ }
case ISD::SIGN_EXTEND_INREG: {
EVT ExtraVT = cast<VTSDNode>(Node->getOperand(1))->getVT();
EVT VT = Node->getValueType(0);
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 935afaf9dd550..ba765afd80bf1 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -870,6 +870,9 @@ void TargetLoweringBase::initActions() {
ISD::FATAN2},
{MVT::f32, MVT::f64, MVT::f128}, Expand);
+ // Insert custom handling default for llvm.canonicalize.*.
+ setOperationAction(ISD::FCANONICALIZE, {MVT::f32, MVT::f64}, Expand);
+
// FIXME: Query RuntimeLibCalls to make the decision.
setOperationAction({ISD::LRINT, ISD::LLRINT, ISD::LROUND, ISD::LLROUND},
{MVT::f32, MVT::f64, MVT::f128}, LibCall);
diff --git a/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll b/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll
new file mode 100644
index 0000000000000..477f5e1547567
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll
@@ -0,0 +1,141 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --default-march s390x-unknown-linux-gnu --version 5
+; RUN: llc -mtriple=s390x-linux-gnu -mcpu=z16 < %s | FileCheck %s -check-prefixes=Z16
+
+define float @canonicalize_fp32(float %a) {
+; Z16-LABEL: canonicalize_fp32:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmf %v1, 2, 8
+; Z16-NEXT: meebr %f0, %f1
+; Z16-NEXT: br %r14
+ %canonicalized = call float @llvm.canonicalize.f32(float %a)
+ ret float %canonicalized
+}
+
+define double @canonicalize_fp64(double %a) {
+; Z16-LABEL: canonicalize_fp64:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmg %v1, 2, 11
+; Z16-NEXT: mdbr %f0, %f1
+; Z16-NEXT: br %r14
+ %canonicalized = call double @llvm.canonicalize.f64(double %a)
+ ret double %canonicalized
+}
+
+define void @canonicalize_ptr_f32(float * %out) {
+; Z16-LABEL: canonicalize_ptr_f32:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmf %v0, 2, 8
+; Z16-NEXT: meeb %f0, 0(%r2)
+; Z16-NEXT: ste %f0, 0(%r2)
+; Z16-NEXT: br %r14
+ %val = load float, float * %out
+ %canonicalized = call float @llvm.canonicalize.f32(float %val)
+ store float %canonicalized, float * %out
+ ret void
+}
+
+define void @canonicalize_ptr_f64(double * %out) {
+; Z16-LABEL: canonicalize_ptr_f64:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmg %v0, 2, 11
+; Z16-NEXT: mdb %f0, 0(%r2)
+; Z16-NEXT: std %f0, 0(%r2)
+; Z16-NEXT: br %r14
+ %val = load double, double * %out
+ %canonicalized = call double @llvm.canonicalize.f64(double %val)
+ store double %canonicalized, double * %out
+ ret void
+}
+
+define <4 x float> @canonicalize_v4f32(<4 x float> %a) {
+; Z16-LABEL: canonicalize_v4f32:
+; Z16: # %bb.0:
+; Z16-NEXT: vrepf %v0, %v24, 3
+; Z16-NEXT: vgmf %v1, 2, 8
+; Z16-NEXT: vrepf %v2, %v24, 2
+; Z16-NEXT: meebr %f0, %f1
+; Z16-NEXT: meebr %f2, %f1
+; Z16-NEXT: vrepf %v3, %v24, 1
+; Z16-NEXT: vmrhf %v0, %v2, %v0
+; Z16-NEXT: wfmsb %f2, %v24, %f1
+; Z16-NEXT: wfmsb %f1, %f3, %f1
+; Z16-NEXT: vmrhf %v1, %v2, %v1
+; Z16-NEXT: vmrhg %v24, %v1, %v0
+; Z16-NEXT: br %r14
+ %canonicalized = call <4 x float> @llvm.canonicalize.v4f32(<4 x float> %a)
+ ret <4 x float> %canonicalized
+}
+
+define <4 x double> @canonicalize_v4f64(<4 x double> %a) {
+; Z16-LABEL: canonicalize_v4f64:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmg %v0, 2, 11
+; Z16-NEXT: vrepg %v2, %v24, 1
+; Z16-NEXT: wfmdb %f1, %v24, %f0
+; Z16-NEXT: mdbr %f2, %f0
+; Z16-NEXT: vmrhg %v24, %v1, %v2
+; Z16-NEXT: vrepg %v2, %v26, 1
+; Z16-NEXT: wfmdb %f1, %v26, %f0
+; Z16-NEXT: wfmdb %f0, %f2, %f0
+; Z16-NEXT: vmrhg %v26, %v1, %v0
+; Z16-NEXT: br %r14
+ %canonicalized = call <4 x double> @llvm.canonicalize.v4f64(<4 x double> %a)
+ ret <4 x double> %canonicalized
+}
+
+define void @canonicalize_ptr_v4f32(<4 x float> * %out) {
+; Z16-LABEL: canonicalize_ptr_v4f32:
+; Z16: # %bb.0:
+; Z16-NEXT: vl %v0, 0(%r2), 3
+; Z16-NEXT: vrepf %v1, %v0, 3
+; Z16-NEXT: vgmf %v2, 2, 8
+; Z16-NEXT: vrepf %v3, %v0, 2
+; Z16-NEXT: meebr %f1, %f2
+; Z16-NEXT: meebr %f3, %f2
+; Z16-NEXT: vmrhf %v1, %v3, %v1
+; Z16-NEXT: wfmsb %f3, %f0, %f2
+; Z16-NEXT: vrepf %v0, %v0, 1
+; Z16-NEXT: meebr %f0, %f2
+; Z16-NEXT: vmrhf %v0, %v3, %v0
+; Z16-NEXT: vmrhg %v0, %v0, %v1
+; Z16-NEXT: vst %v0, 0(%r2), 3
+; Z16-NEXT: br %r14
+ %val = load <4 x float>, <4 x float> * %out
+ %canonicalized = call <4 x float> @llvm.canonicalize.v4f32(<4 x float> %val)
+ store <4 x float> %canonicalized, <4 x float> * %out
+ ret void
+}
+
+define void @canonicalize_ptr_v4f64(<4 x double> * %out) {
+; Z16-LABEL: canonicalize_ptr_v4f64:
+; Z16: # %bb.0:
+; Z16-NEXT: vl %v1, 16(%r2), 4
+; Z16-NEXT: vgmg %v2, 2, 11
+; Z16-NEXT: wfmdb %f3, %f1, %f2
+; Z16-NEXT: vrepg %v1, %v1, 1
+; Z16-NEXT: mdbr %f1, %f2
+; Z16-NEXT: vl %v0, 0(%r2), 4
+; Z16-NEXT: vmrhg %v1, %v3, %v1
+; Z16-NEXT: wfmdb %f3, %f0, %f2
+; Z16-NEXT: vrepg %v0, %v0, 1
+; Z16-NEXT: mdbr %f0, %f2
+; Z16-NEXT: vmrhg %v0, %v3, %v0
+; Z16-NEXT: vst %v0, 0(%r2), 4
+; Z16-NEXT: vst %v1, 16(%r2), 4
+; Z16-NEXT: br %r14
+ %val = load <4 x double>, <4 x double> * %out
+ %canonicalized = call <4 x double> @llvm.canonicalize.v4f64(<4 x double> %val)
+ store <4 x double> %canonicalized, <4 x double> * %out
+ ret void
+}
+
+define void @canonicalize_undef(double * %out) {
+; Z16-LABEL: canonicalize_undef:
+; Z16: # %bb.0:
+; Z16-NEXT: llihh %r0, 32760
+; Z16-NEXT: stg %r0, 0(%r2)
+; Z16-NEXT: br %r14
+ %canonicalized = call double @llvm.canonicalize.f64(double undef)
+ store double %canonicalized, double * %out
+ ret void
+}
|
@llvm/pr-subscribers-backend-systemz Author: Dominik Steenken (dominik-steenken) ChangesThis PR takes the work previously done by @pawan-nirpal-031 on X86 in #106370, and makes it available in common code. This should enable all targets to use Canonicalization is implemented here as multiplication by Moving the X86 implementation to common code was suggested in the reviews of #106370 by @arsenm . @pawan-nirpal-031 I hope you don't mind that i went ahead with this to getit working on SystemZ. I don't have a lot of experience with this part of the code base yet, so reviews are very welcome. Full diff: https://github.com/llvm/llvm-project/pull/142105.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 528c07cc5549d..6599704aaf290 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -3356,6 +3356,23 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
Results.push_back(Op);
break;
}
+ case ISD::FCANONICALIZE: {
+ // This implements llvm.canonicalize.f* by multiplication with 1.0,
+ // as suggested in https://llvm.org/docs/LangRef.html#id2335.
+ // Get operand x.
+ SDValue Operand = Node->getOperand(0);
+ // Get fp value type used.
+ EVT VT = Operand.getValueType();
+ // Produce appropriately-typed 1.0 constant.
+ SDValue One = DAG.getConstantFP(1.0, dl, VT);
+ // Produce multiplication node x * 1.0.
+ SDValue Chain = DAG.getEntryNode();
+ SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other},
+ {Chain, Operand, One});
+
+ Results.push_back(Mul);
+ break;
+ }
case ISD::SIGN_EXTEND_INREG: {
EVT ExtraVT = cast<VTSDNode>(Node->getOperand(1))->getVT();
EVT VT = Node->getValueType(0);
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 935afaf9dd550..ba765afd80bf1 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -870,6 +870,9 @@ void TargetLoweringBase::initActions() {
ISD::FATAN2},
{MVT::f32, MVT::f64, MVT::f128}, Expand);
+ // Insert custom handling default for llvm.canonicalize.*.
+ setOperationAction(ISD::FCANONICALIZE, {MVT::f32, MVT::f64}, Expand);
+
// FIXME: Query RuntimeLibCalls to make the decision.
setOperationAction({ISD::LRINT, ISD::LLRINT, ISD::LROUND, ISD::LLROUND},
{MVT::f32, MVT::f64, MVT::f128}, LibCall);
diff --git a/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll b/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll
new file mode 100644
index 0000000000000..477f5e1547567
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll
@@ -0,0 +1,141 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --default-march s390x-unknown-linux-gnu --version 5
+; RUN: llc -mtriple=s390x-linux-gnu -mcpu=z16 < %s | FileCheck %s -check-prefixes=Z16
+
+define float @canonicalize_fp32(float %a) {
+; Z16-LABEL: canonicalize_fp32:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmf %v1, 2, 8
+; Z16-NEXT: meebr %f0, %f1
+; Z16-NEXT: br %r14
+ %canonicalized = call float @llvm.canonicalize.f32(float %a)
+ ret float %canonicalized
+}
+
+define double @canonicalize_fp64(double %a) {
+; Z16-LABEL: canonicalize_fp64:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmg %v1, 2, 11
+; Z16-NEXT: mdbr %f0, %f1
+; Z16-NEXT: br %r14
+ %canonicalized = call double @llvm.canonicalize.f64(double %a)
+ ret double %canonicalized
+}
+
+define void @canonicalize_ptr_f32(float * %out) {
+; Z16-LABEL: canonicalize_ptr_f32:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmf %v0, 2, 8
+; Z16-NEXT: meeb %f0, 0(%r2)
+; Z16-NEXT: ste %f0, 0(%r2)
+; Z16-NEXT: br %r14
+ %val = load float, float * %out
+ %canonicalized = call float @llvm.canonicalize.f32(float %val)
+ store float %canonicalized, float * %out
+ ret void
+}
+
+define void @canonicalize_ptr_f64(double * %out) {
+; Z16-LABEL: canonicalize_ptr_f64:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmg %v0, 2, 11
+; Z16-NEXT: mdb %f0, 0(%r2)
+; Z16-NEXT: std %f0, 0(%r2)
+; Z16-NEXT: br %r14
+ %val = load double, double * %out
+ %canonicalized = call double @llvm.canonicalize.f64(double %val)
+ store double %canonicalized, double * %out
+ ret void
+}
+
+define <4 x float> @canonicalize_v4f32(<4 x float> %a) {
+; Z16-LABEL: canonicalize_v4f32:
+; Z16: # %bb.0:
+; Z16-NEXT: vrepf %v0, %v24, 3
+; Z16-NEXT: vgmf %v1, 2, 8
+; Z16-NEXT: vrepf %v2, %v24, 2
+; Z16-NEXT: meebr %f0, %f1
+; Z16-NEXT: meebr %f2, %f1
+; Z16-NEXT: vrepf %v3, %v24, 1
+; Z16-NEXT: vmrhf %v0, %v2, %v0
+; Z16-NEXT: wfmsb %f2, %v24, %f1
+; Z16-NEXT: wfmsb %f1, %f3, %f1
+; Z16-NEXT: vmrhf %v1, %v2, %v1
+; Z16-NEXT: vmrhg %v24, %v1, %v0
+; Z16-NEXT: br %r14
+ %canonicalized = call <4 x float> @llvm.canonicalize.v4f32(<4 x float> %a)
+ ret <4 x float> %canonicalized
+}
+
+define <4 x double> @canonicalize_v4f64(<4 x double> %a) {
+; Z16-LABEL: canonicalize_v4f64:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmg %v0, 2, 11
+; Z16-NEXT: vrepg %v2, %v24, 1
+; Z16-NEXT: wfmdb %f1, %v24, %f0
+; Z16-NEXT: mdbr %f2, %f0
+; Z16-NEXT: vmrhg %v24, %v1, %v2
+; Z16-NEXT: vrepg %v2, %v26, 1
+; Z16-NEXT: wfmdb %f1, %v26, %f0
+; Z16-NEXT: wfmdb %f0, %f2, %f0
+; Z16-NEXT: vmrhg %v26, %v1, %v0
+; Z16-NEXT: br %r14
+ %canonicalized = call <4 x double> @llvm.canonicalize.v4f64(<4 x double> %a)
+ ret <4 x double> %canonicalized
+}
+
+define void @canonicalize_ptr_v4f32(<4 x float> * %out) {
+; Z16-LABEL: canonicalize_ptr_v4f32:
+; Z16: # %bb.0:
+; Z16-NEXT: vl %v0, 0(%r2), 3
+; Z16-NEXT: vrepf %v1, %v0, 3
+; Z16-NEXT: vgmf %v2, 2, 8
+; Z16-NEXT: vrepf %v3, %v0, 2
+; Z16-NEXT: meebr %f1, %f2
+; Z16-NEXT: meebr %f3, %f2
+; Z16-NEXT: vmrhf %v1, %v3, %v1
+; Z16-NEXT: wfmsb %f3, %f0, %f2
+; Z16-NEXT: vrepf %v0, %v0, 1
+; Z16-NEXT: meebr %f0, %f2
+; Z16-NEXT: vmrhf %v0, %v3, %v0
+; Z16-NEXT: vmrhg %v0, %v0, %v1
+; Z16-NEXT: vst %v0, 0(%r2), 3
+; Z16-NEXT: br %r14
+ %val = load <4 x float>, <4 x float> * %out
+ %canonicalized = call <4 x float> @llvm.canonicalize.v4f32(<4 x float> %val)
+ store <4 x float> %canonicalized, <4 x float> * %out
+ ret void
+}
+
+define void @canonicalize_ptr_v4f64(<4 x double> * %out) {
+; Z16-LABEL: canonicalize_ptr_v4f64:
+; Z16: # %bb.0:
+; Z16-NEXT: vl %v1, 16(%r2), 4
+; Z16-NEXT: vgmg %v2, 2, 11
+; Z16-NEXT: wfmdb %f3, %f1, %f2
+; Z16-NEXT: vrepg %v1, %v1, 1
+; Z16-NEXT: mdbr %f1, %f2
+; Z16-NEXT: vl %v0, 0(%r2), 4
+; Z16-NEXT: vmrhg %v1, %v3, %v1
+; Z16-NEXT: wfmdb %f3, %f0, %f2
+; Z16-NEXT: vrepg %v0, %v0, 1
+; Z16-NEXT: mdbr %f0, %f2
+; Z16-NEXT: vmrhg %v0, %v3, %v0
+; Z16-NEXT: vst %v0, 0(%r2), 4
+; Z16-NEXT: vst %v1, 16(%r2), 4
+; Z16-NEXT: br %r14
+ %val = load <4 x double>, <4 x double> * %out
+ %canonicalized = call <4 x double> @llvm.canonicalize.v4f64(<4 x double> %val)
+ store <4 x double> %canonicalized, <4 x double> * %out
+ ret void
+}
+
+define void @canonicalize_undef(double * %out) {
+; Z16-LABEL: canonicalize_undef:
+; Z16: # %bb.0:
+; Z16-NEXT: llihh %r0, 32760
+; Z16-NEXT: stg %r0, 0(%r2)
+; Z16-NEXT: br %r14
+ %canonicalized = call double @llvm.canonicalize.f64(double undef)
+ store double %canonicalized, double * %out
+ ret void
+}
|
✅ With the latest revision this PR passed the undef deprecator. |
// This implements llvm.canonicalize.f* by multiplication with 1.0, | ||
// as suggested in https://llvm.org/docs/LangRef.html#id2335. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably elaborate on how this is hacking in strictfp operations in non-strict functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what i understand, the point of the strictfp operation here is to prevent any optimization that would, for instance, recognize that 1.0 * x
can be converted to just x
. So would this suffice as an addition to the comment?
// We use `STRICT_FMUL` here to prevent this operation from being optimized out.
Hi @dominik-steenken absolutely, Please go ahead with this, I have been wanting to get back and finish on the commitments as following PRs, this is much welcome. I will make some time to support canonicalization of denormals ( I have a PR just haven't been able to attend to it, if you need it sooner please let me know, I'll give you the patch and you may go ahead with it ) Thanks for your PR. |
f08a196
to
ce1fea8
Compare
ce1fea8
to
524580c
Compare
524580c
to
07f5741
Compare
Ok, so i updated the comment as requested, got rid of the |
This patch is about the LLVM intrinsic, not the Clang builtin, so the title should say |
__builtin.canonicalize
in common codellvm.canonicalize
in common code
any further concerns? |
Patch title is now less clear, this is DAG codegen support |
So, something like "DAG codegen support for |
How about "[DAG] Add generic expansion for ISD::FCANONICALIZE nodes"? The summary description still needs work |
llvm.canonicalize
in common codeSDValue Mul = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other}, | ||
{Chain, Operand, One}); | ||
|
||
Results.push_back(Mul); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignoring the new result chain feels wrong but it's probably correct in this context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dominik-steenken Add a comment mentioning that ignoring the output chain was intended.
ret double %canonicalized | ||
} | ||
|
||
define void @canonicalize_ptr_f32(float * %out) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test should be updated to use opaque pointers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i've replaced float *
, double *
, <4 x float> *
and <4 x double> *with
ptr`.
// Insert custom handling default for llvm.canonicalize.*. | ||
setOperationAction(ISD::FCANONICALIZE, {MVT::f32, MVT::f64}, Expand); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we'r going to expand by default, it should do it for all types, not just f32/f64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, i have added expansion for f16
and f128
. Should i also add other types that are specific to certain platforms, such as bfloat
, x86_fp80
, or ppc_fp128
?
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --default-march s390x-unknown-linux-gnu --version 5 | ||
; RUN: llc -mtriple=s390x-linux-gnu -mcpu=z16 < %s | FileCheck %s -check-prefixes=Z16 | ||
|
||
define float @canonicalize_fp32(float %a) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No half tests? Also are we missing tests in the other touched targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added half and extended fp tests. From what i understand, the other touched targets had already implemented their own legalization of fcanonicalize
, and would have added tests for all of their supported floating point types then. This PR should not change anything there - the targets only needed to be touched since they assumed that common code would not expand by default.
// Produce multiplication node x * 1.0. | ||
SDValue Chain = DAG.getEntryNode(); | ||
SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other}, | ||
{Chain, Operand, One}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Losing the fast math flags. You can propagate the existing flags, and additionally add NoFPExcept
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now propagating the flags and have added NoFPExcept
.
badd5c1
to
fdb8651
Compare
I have rebased and reworded my commits. Does that suffice? I can also reword/edit the PR description if needed. |
Previously, this expansion for the intrinsic has been implemented in the X86 backend. As suggested in the comments for the PR that merges that implementation (llvm#106370), this commit replicates that implementation (at least for `f(16|32|64|128)`) in common code. Canonicalization is thus implemented here as multiplication with 1.0. This commit also touches the following backends: AArch64, AMDGPU, Mips, PowerPC, and LoongArch. These backends previously added their own legalization for `ISD::FCANONICALIZE`, but did not set the OperationAction accordingly, since there was no default OperationAction in common code. Thus, this commit adds the required `setOperationAction` calls and is otherwise intended as NFC for these backends.
This new test file essentially checks the code generation for the common code expansion of `ISD::FCANONICALIZE`, since SystemZ has no custom handling for it at this time. The tests are heavily based on the equivalent tests for X86.
fdb8651
to
4ea37a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minors but otherwise LGTM @arsenm ?
SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other}, | ||
{Chain, Operand, One}); | ||
|
||
Results.push_back(Mul); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dominik-steenken Add a comment mentioning that ignoring the output chain was intended.
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --default-march s390x-unknown-linux-gnu --version 5 | ||
; RUN: llc -mtriple=s390x-linux-gnu -mcpu=z16 < %s | FileCheck %s -check-prefixes=Z16 | ||
|
||
define half @canonicalize_fp16(half %a) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use ) nounwind {
to hide all the cfi noise from all the tests
This PR takes the work previously done by @pawan-nirpal-031 on X86 in #106370, and makes it available in common code. This should enable all targets to use
__builtin_canonicalize
at least forf32
andf64
data types.Canonicalization is implemented here as multiplication by
1.0
, as suggested in the docs.Moving the X86 implementation to common code was suggested in the reviews of #106370 by @arsenm .
@pawan-nirpal-031 I hope you don't mind that i went ahead with this to get it working on SystemZ.
I don't have a lot of experience with this part of the code base yet, so reviews are very welcome.