-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[Clang] Add elementwise maximumnum/minimumnum builtin functions #149775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Addresses llvm#112164. minimumnum and maximumnum intrinsics were added in 5bf81e5. The new built-ins can be used for implementing OpenCL math function fmax and fmin in llvm#128506.
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-codegen Author: Wenju He (wenju-he) ChangesAddresses #112164. minimumnum and maximumnum intrinsics were added in 5bf81e5. The new built-ins can be used for implementing OpenCL math function fmax and fmin in #128506. Patch is 20.92 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149775.diff 7 Files Affected:
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index f448a9a8db172..8b83dc448d768 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -848,6 +848,14 @@ of different sizes and signs is forbidden in binary and ternary builtins.
semantics, see `LangRef
<http://llvm.org/docs/LangRef.html#llvm-min-intrinsics-comparation>`_
for the comparison.
+ T __builtin_elementwise_maximumnum(T x, T y) return x or y, whichever is larger. Follows IEEE 754-2019 floating point types
+ semantics, see `LangRef
+ <http://llvm.org/docs/LangRef.html#llvm-min-intrinsics-comparation>`_
+ for the comparison.
+ T __builtin_elementwise_minimumnum(T x, T y) return x or y, whichever is smaller. Follows IEEE 754-2019 floating point types
+ semantics, see `LangRef
+ <http://llvm.org/docs/LangRef.html#llvm-min-intrinsics-comparation>`_
+ for the comparison.
============================================== ====================================================================== =========================================
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 46a77673919d3..6a274899e3c43 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -340,6 +340,7 @@ Non-comprehensive list of changes in this release
- Added `__builtin_elementwise_exp10`.
- For AMDPGU targets, added `__builtin_v_cvt_off_f32_i4` that maps to the `v_cvt_off_f32_i4` instruction.
- Added `__builtin_elementwise_minnum` and `__builtin_elementwise_maxnum`.
+- Added `__builtin_elementwise_minnumnum` and `__builtin_elementwise_maxnumnum`.
- No longer crashing on invalid Objective-C categories and extensions when
dumping the AST as JSON. (#GH137320)
- Clang itself now uses split stacks instead of threads for allocating more
diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index 5ebb82180521d..c81714e9b009d 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -1334,6 +1334,18 @@ def ElementwiseMinimum : Builtin {
let Prototype = "void(...)";
}
+def ElementwiseMaximumNum : Builtin {
+ let Spellings = ["__builtin_elementwise_maximumnum"];
+ let Attributes = [NoThrow, Const, CustomTypeChecking];
+ let Prototype = "void(...)";
+}
+
+def ElementwiseMinimumNum : Builtin {
+ let Spellings = ["__builtin_elementwise_minimumnum"];
+ let Attributes = [NoThrow, Const, CustomTypeChecking];
+ let Prototype = "void(...)";
+}
+
def ElementwiseCeil : Builtin {
let Spellings = ["__builtin_elementwise_ceil"];
let Attributes = [NoThrow, Const, CustomTypeChecking];
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 5f2eb76e7bacb..3f784fc8e798f 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -4108,6 +4108,22 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
return RValue::get(Result);
}
+ case Builtin::BI__builtin_elementwise_maximumnum: {
+ Value *Op0 = EmitScalarExpr(E->getArg(0));
+ Value *Op1 = EmitScalarExpr(E->getArg(1));
+ Value *Result = Builder.CreateBinaryIntrinsic(
+ Intrinsic::maximumnum, Op0, Op1, nullptr, "elt.maximumnum");
+ return RValue::get(Result);
+ }
+
+ case Builtin::BI__builtin_elementwise_minimumnum: {
+ Value *Op0 = EmitScalarExpr(E->getArg(0));
+ Value *Op1 = EmitScalarExpr(E->getArg(1));
+ Value *Result = Builder.CreateBinaryIntrinsic(
+ Intrinsic::minimumnum, Op0, Op1, nullptr, "elt.minimumnum");
+ return RValue::get(Result);
+ }
+
case Builtin::BI__builtin_reduce_max: {
auto GetIntrinsicID = [this](QualType QT) {
if (auto *VecTy = QT->getAs<VectorType>())
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 5e523fe887318..c74b67106ad74 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -3013,6 +3013,8 @@ Sema::CheckBuiltinFunctionCall(FunctionDecl *FDecl, unsigned BuiltinID,
case Builtin::BI__builtin_elementwise_maxnum:
case Builtin::BI__builtin_elementwise_minimum:
case Builtin::BI__builtin_elementwise_maximum:
+ case Builtin::BI__builtin_elementwise_minimumnum:
+ case Builtin::BI__builtin_elementwise_maximumnum:
case Builtin::BI__builtin_elementwise_atan2:
case Builtin::BI__builtin_elementwise_fmod:
case Builtin::BI__builtin_elementwise_pow:
diff --git a/clang/test/CodeGen/builtin-maximumnum-minimumnum.c b/clang/test/CodeGen/builtin-maximumnum-minimumnum.c
new file mode 100644
index 0000000000000..ea9d2e7a4ed38
--- /dev/null
+++ b/clang/test/CodeGen/builtin-maximumnum-minimumnum.c
@@ -0,0 +1,171 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -x c++ -std=c++20 -disable-llvm-passes -O3 -triple x86_64 %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK
+
+typedef _Float16 half8 __attribute__((ext_vector_type(8)));
+typedef __bf16 bf16x8 __attribute__((ext_vector_type(8)));
+typedef float float4 __attribute__((ext_vector_type(4)));
+typedef double double2 __attribute__((ext_vector_type(2)));
+typedef long double ldouble2 __attribute__((ext_vector_type(2)));
+
+// CHECK-LABEL: define dso_local noundef <8 x half> @_Z7pfmin16Dv8_DF16_S_(
+// CHECK-SAME: <8 x half> noundef [[A:%.*]], <8 x half> noundef [[B:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[A_ADDR:%.*]] = alloca <8 x half>, align 16
+// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <8 x half>, align 16
+// CHECK-NEXT: store <8 x half> [[A]], ptr [[A_ADDR]], align 16, !tbaa [[TBAA2:![0-9]+]]
+// CHECK-NEXT: store <8 x half> [[B]], ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP0:%.*]] = load <8 x half>, ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = load <8 x half>, ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[ELT_MINIMUMNUM:%.*]] = call <8 x half> @llvm.minimumnum.v8f16(<8 x half> [[TMP0]], <8 x half> [[TMP1]])
+// CHECK-NEXT: ret <8 x half> [[ELT_MINIMUMNUM]]
+//
+half8 pfmin16(half8 a, half8 b) {
+ return __builtin_elementwise_minimumnum(a, b);
+}
+// CHECK-LABEL: define dso_local noundef <8 x bfloat> @_Z8pfmin16bDv8_DF16bS_(
+// CHECK-SAME: <8 x bfloat> noundef [[A:%.*]], <8 x bfloat> noundef [[B:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[A_ADDR:%.*]] = alloca <8 x bfloat>, align 16
+// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <8 x bfloat>, align 16
+// CHECK-NEXT: store <8 x bfloat> [[A]], ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <8 x bfloat> [[B]], ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP0:%.*]] = load <8 x bfloat>, ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = load <8 x bfloat>, ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[ELT_MINIMUMNUM:%.*]] = call <8 x bfloat> @llvm.minimumnum.v8bf16(<8 x bfloat> [[TMP0]], <8 x bfloat> [[TMP1]])
+// CHECK-NEXT: ret <8 x bfloat> [[ELT_MINIMUMNUM]]
+//
+bf16x8 pfmin16b(bf16x8 a, bf16x8 b) {
+ return __builtin_elementwise_minimumnum(a, b);
+}
+// CHECK-LABEL: define dso_local noundef <4 x float> @_Z7pfmin32Dv4_fS_(
+// CHECK-SAME: <4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[A_ADDR:%.*]] = alloca <4 x float>, align 16
+// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <4 x float>, align 16
+// CHECK-NEXT: store <4 x float> [[A]], ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <4 x float> [[B]], ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[ELT_MINIMUMNUM:%.*]] = call <4 x float> @llvm.minimumnum.v4f32(<4 x float> [[TMP0]], <4 x float> [[TMP1]])
+// CHECK-NEXT: ret <4 x float> [[ELT_MINIMUMNUM]]
+//
+float4 pfmin32(float4 a, float4 b) {
+ return __builtin_elementwise_minimumnum(a, b);
+}
+// CHECK-LABEL: define dso_local noundef <2 x double> @_Z7pfmin64Dv2_dS_(
+// CHECK-SAME: <2 x double> noundef [[A:%.*]], <2 x double> noundef [[B:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[A_ADDR:%.*]] = alloca <2 x double>, align 16
+// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <2 x double>, align 16
+// CHECK-NEXT: store <2 x double> [[A]], ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <2 x double> [[B]], ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[ELT_MINIMUMNUM:%.*]] = call <2 x double> @llvm.minimumnum.v2f64(<2 x double> [[TMP0]], <2 x double> [[TMP1]])
+// CHECK-NEXT: ret <2 x double> [[ELT_MINIMUMNUM]]
+//
+double2 pfmin64(double2 a, double2 b) {
+ return __builtin_elementwise_minimumnum(a, b);
+}
+// CHECK-LABEL: define dso_local noundef <2 x x86_fp80> @_Z7pfmin80Dv2_eS_(
+// CHECK-SAME: ptr noundef byval(<2 x x86_fp80>) align 32 [[TMP0:%.*]], ptr noundef byval(<2 x x86_fp80>) align 32 [[TMP1:%.*]]) #[[ATTR2:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[A_ADDR:%.*]] = alloca <2 x x86_fp80>, align 32
+// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <2 x x86_fp80>, align 32
+// CHECK-NEXT: [[A:%.*]] = load <2 x x86_fp80>, ptr [[TMP0]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[B:%.*]] = load <2 x x86_fp80>, ptr [[TMP1]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <2 x x86_fp80> [[A]], ptr [[A_ADDR]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <2 x x86_fp80> [[B]], ptr [[B_ADDR]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP2:%.*]] = load <2 x x86_fp80>, ptr [[A_ADDR]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP3:%.*]] = load <2 x x86_fp80>, ptr [[B_ADDR]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[ELT_MINIMUMNUM:%.*]] = call <2 x x86_fp80> @llvm.minimumnum.v2f80(<2 x x86_fp80> [[TMP2]], <2 x x86_fp80> [[TMP3]])
+// CHECK-NEXT: ret <2 x x86_fp80> [[ELT_MINIMUMNUM]]
+//
+ldouble2 pfmin80(ldouble2 a, ldouble2 b) {
+ return __builtin_elementwise_minimumnum(a, b);
+}
+
+// CHECK-LABEL: define dso_local noundef <8 x half> @_Z7pfmax16Dv8_DF16_S_(
+// CHECK-SAME: <8 x half> noundef [[A:%.*]], <8 x half> noundef [[B:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[A_ADDR:%.*]] = alloca <8 x half>, align 16
+// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <8 x half>, align 16
+// CHECK-NEXT: store <8 x half> [[A]], ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <8 x half> [[B]], ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP0:%.*]] = load <8 x half>, ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = load <8 x half>, ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[ELT_MAXIMUMNUM:%.*]] = call <8 x half> @llvm.maximumnum.v8f16(<8 x half> [[TMP0]], <8 x half> [[TMP1]])
+// CHECK-NEXT: ret <8 x half> [[ELT_MAXIMUMNUM]]
+//
+half8 pfmax16(half8 a, half8 b) {
+ return __builtin_elementwise_maximumnum(a, b);
+}
+// CHECK-LABEL: define dso_local noundef <8 x bfloat> @_Z8pfmax16bDv8_DF16bS_(
+// CHECK-SAME: <8 x bfloat> noundef [[A:%.*]], <8 x bfloat> noundef [[B:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[A_ADDR:%.*]] = alloca <8 x bfloat>, align 16
+// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <8 x bfloat>, align 16
+// CHECK-NEXT: store <8 x bfloat> [[A]], ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <8 x bfloat> [[B]], ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP0:%.*]] = load <8 x bfloat>, ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = load <8 x bfloat>, ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[ELT_MAXIMUMNUM:%.*]] = call <8 x bfloat> @llvm.maximumnum.v8bf16(<8 x bfloat> [[TMP0]], <8 x bfloat> [[TMP1]])
+// CHECK-NEXT: ret <8 x bfloat> [[ELT_MAXIMUMNUM]]
+//
+bf16x8 pfmax16b(bf16x8 a, bf16x8 b) {
+ return __builtin_elementwise_maximumnum(a, b);
+}
+// CHECK-LABEL: define dso_local noundef <4 x float> @_Z7pfmax32Dv4_fS_(
+// CHECK-SAME: <4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[A_ADDR:%.*]] = alloca <4 x float>, align 16
+// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <4 x float>, align 16
+// CHECK-NEXT: store <4 x float> [[A]], ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <4 x float> [[B]], ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[ELT_MAXIMUMNUM:%.*]] = call <4 x float> @llvm.maximumnum.v4f32(<4 x float> [[TMP0]], <4 x float> [[TMP1]])
+// CHECK-NEXT: ret <4 x float> [[ELT_MAXIMUMNUM]]
+//
+float4 pfmax32(float4 a, float4 b) {
+ return __builtin_elementwise_maximumnum(a, b);
+}
+// CHECK-LABEL: define dso_local noundef <2 x double> @_Z7pfmax64Dv2_dS_(
+// CHECK-SAME: <2 x double> noundef [[A:%.*]], <2 x double> noundef [[B:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[A_ADDR:%.*]] = alloca <2 x double>, align 16
+// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <2 x double>, align 16
+// CHECK-NEXT: store <2 x double> [[A]], ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <2 x double> [[B]], ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[A_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[B_ADDR]], align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[ELT_MAXIMUMNUM:%.*]] = call <2 x double> @llvm.maximumnum.v2f64(<2 x double> [[TMP0]], <2 x double> [[TMP1]])
+// CHECK-NEXT: ret <2 x double> [[ELT_MAXIMUMNUM]]
+//
+double2 pfmax64(double2 a, double2 b) {
+ return __builtin_elementwise_maximumnum(a, b);
+}
+
+// CHECK-LABEL: define dso_local noundef <2 x x86_fp80> @_Z7pfmax80Dv2_eS_(
+// CHECK-SAME: ptr noundef byval(<2 x x86_fp80>) align 32 [[TMP0:%.*]], ptr noundef byval(<2 x x86_fp80>) align 32 [[TMP1:%.*]]) #[[ATTR2]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[A_ADDR:%.*]] = alloca <2 x x86_fp80>, align 32
+// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <2 x x86_fp80>, align 32
+// CHECK-NEXT: [[A:%.*]] = load <2 x x86_fp80>, ptr [[TMP0]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[B:%.*]] = load <2 x x86_fp80>, ptr [[TMP1]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <2 x x86_fp80> [[A]], ptr [[A_ADDR]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <2 x x86_fp80> [[B]], ptr [[B_ADDR]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP2:%.*]] = load <2 x x86_fp80>, ptr [[A_ADDR]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP3:%.*]] = load <2 x x86_fp80>, ptr [[B_ADDR]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[ELT_MINIMUMNUM:%.*]] = call <2 x x86_fp80> @llvm.minimumnum.v2f80(<2 x x86_fp80> [[TMP2]], <2 x x86_fp80> [[TMP3]])
+// CHECK-NEXT: ret <2 x x86_fp80> [[ELT_MINIMUMNUM]]
+//
+ldouble2 pfmax80(ldouble2 a, ldouble2 b) {
+ return __builtin_elementwise_minimumnum(a, b);
+}
+
+//.
+// CHECK: [[TBAA2]] = !{[[META3:![0-9]+]], [[META3]], i64 0}
+// CHECK: [[META3]] = !{!"omnipotent char", [[META4:![0-9]+]], i64 0}
+// CHECK: [[META4]] = !{!"Simple C++ TBAA"}
+//.
diff --git a/clang/test/Sema/builtins-elementwise-math.c b/clang/test/Sema/builtins-elementwise-math.c
index 01057b3f8d083..8548d3be8c44a 100644
--- a/clang/test/Sema/builtins-elementwise-math.c
+++ b/clang/test/Sema/builtins-elementwise-math.c
@@ -386,6 +386,96 @@ void test_builtin_elementwise_minimum(int i, short s, float f, double d, float4
// expected-error@-1 {{1st argument must be a scalar or vector of floating-point types (was '_Complex float')}}
}
+void test_builtin_elementwise_maximumnum(int i, short s, float f, double d, float4 fv, double4 dv, int3 iv, unsigned3 uv, int *p) {
+ i = __builtin_elementwise_maximumnum(p, d);
+ // expected-error@-1 {{1st argument must be a scalar or vector of floating-point types (was 'int *')}}
+
+ struct Foo foo = __builtin_elementwise_maximumnum(d, d);
+ // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'double'}}
+
+ i = __builtin_elementwise_maximumnum(i);
+ // expected-error@-1 {{too few arguments to function call, expected 2, have 1}}
+
+ i = __builtin_elementwise_maximumnum();
+ // expected-error@-1 {{too few arguments to function call, expected 2, have 0}}
+
+ i = __builtin_elementwise_maximumnum(i, i, i);
+ // expected-error@-1 {{too many arguments to function call, expected 2, have 3}}
+
+ i = __builtin_elementwise_maximumnum(fv, iv);
+ // expected-error@-1 {{arguments are of different types ('float4' (vector of 4 'float' values) vs 'int3' (vector of 3 'int' values))}}
+
+ i = __builtin_elementwise_maximumnum(uv, iv);
+ // expected-error@-1 {{1st argument must be a scalar or vector of floating-point types (was 'unsigned3' (vector of 3 'unsigned int' values))}}
+
+ dv = __builtin_elementwise_maximumnum(fv, dv);
+ // expected-error@-1 {{arguments are of different types ('float4' (vector of 4 'float' values) vs 'double4' (vector of 4 'double' values))}}
+
+ d = __builtin_elementwise_maximumnum(f, d);
+ // expected-error@-1 {{arguments are of different types ('float' vs 'double')}}
+
+ fv = __builtin_elementwise_maximumnum(fv, fv);
+
+ i = __builtin_elementwise_maximumnum(iv, iv);
+ // expected-error@-1 {{1st argument must be a scalar or vector of floating-point types (was 'int3' (vector of 3 'int' values))}}
+
+ i = __builtin_elementwise_maximumnum(i, i);
+ // expected-error@-1 {{1st argument must be a scalar or vector of floating-point types (was 'int')}}
+
+ int A[10];
+ A = __builtin_elementwise_maximumnum(A, A);
+ // expected-error@-1 {{1st argument must be a scalar or vector of floating-point types (was 'int *')}}
+
+ _Complex float c1, c2;
+ c1 = __builtin_elementwise_maximumnum(c1, c2);
+ // expected-error@-1 {{1st argument must be a scalar or vector of floating-point types (was '_Complex float')}}
+}
+
+void test_builtin_elementwise_minimumnum(int i, short s, float f, double d, float4 fv, double4 dv, int3 iv, unsigned3 uv, int *p) {
+ i = __builtin_elementwise_minimumnum(p, d);
+ // expected-error@-1 {{1st argument must be a scalar or vector of floating-point types (was 'int *')}}
+
+ struct Foo foo = __builtin_elementwise_minimumnum(d, d);
+ // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'double'}}
+
+ i = __builtin_elementwise_minimumnum(i);
+ // expected-error@-1 {{too few arguments to function call, expected 2, have 1}}
+
+ i = __builtin_elementwise_minimumnum();
+ // expected-error@-1 {{too few arguments to function call, expected 2, have 0}}
+
+ i = __builtin_elementwise_minimumnum(i, i, i);
+ // expected-error@-1 {{too many arguments to function call, expected 2, have 3}}
+
+ i = __builtin_elementwise_minimumnum(fv, iv);
+ // expected-error@-1 {{arguments are of different types ('float4' (vector of 4 'float' values) vs 'int3' (vector of 3 'int' values))}}
+
+ i = __builtin_elementwise_minimumnum(uv, iv);
+ // expected-error@-1 {{1st argument must be a scalar or vector of floating-point types (was 'unsigned3' (vector of 3 'unsigned int' values))}}
+
+ dv = __builtin_elementwise_minimumnum(fv, dv);
+ // expected-error@-1 {{arguments are of different types ('float4' (vector of 4 'float' values) vs 'double4' (vector of 4 'double' values))}}
+
+ d = __builtin_elementwise_minimumnum(f, d);
+ // expected-error@-1 {{arguments are of different types ('float' v...
[truncated]
|
T __builtin_elementwise_maximumnum(T x, T y) return x or y, whichever is larger. Follows IEEE 754-2019 floating point types | ||
semantics, see `LangRef | ||
<http://llvm.org/docs/LangRef.html#llvm-min-intrinsics-comparation>`_ | ||
for the comparison. | ||
T __builtin_elementwise_minimumnum(T x, T y) return x or y, whichever is smaller. Follows IEEE 754-2019 floating point types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused. :-)
We have __builtin_elementwise_min
, __builtin_elementwise_minnum
, __builtin_elementwise_minimum
, and now we're adding __builtin_elementwise_minimumnum
?
I think the docs need to be expanded a bit to help understand what the difference is between the four choices. Same for maximum.
Why do we need four different builtins to select the lesser of two values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh, I see now, these are the elementwise variants of the existing C math library functions. Carry on. :-D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh, I see now, these are the elementwise variants of the existing C math library functions. Carry on. :-D
yes, the difference is explained in the link http://llvm.org/docs/LangRef.html#llvm-min-intrinsics-comparation
@@ -4108,6 +4108,22 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID, | |||
return RValue::get(Result); | |||
} | |||
|
|||
case Builtin::BI__builtin_elementwise_maximumnum: { | |||
Value *Op0 = EmitScalarExpr(E->getArg(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ever ensure we actually HAVE 2 arguments? Both of these refer to the 1st arg, but the prototype accepts 0 or 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ever ensure we actually HAVE 2 arguments?
It is checked at
llvm-project/clang/lib/Sema/SemaChecking.cpp
Line 15756 in 4d48996
if (checkArgCount(TheCall, 2)) |
Both of these refer to the 1st arg, but the prototype accepts 0 or 1.
sorry I don't get it. Op0
is from the 1st arg and Op1
is from the second arg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I stopped chasing down the call-stack 1 too few :) I see it now in Sema::BuiltinVectorMath(
.
I was pointing out that we refer to [1]
(the second element), but the Prototype in Builtins.td is a ...
which allows 0 or 1 arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was pointing out that we refer to
[1]
(the second element), but the Prototype in Builtins.td is a...
which allows 0 or 1 arguments.
I see, thanks @erichkeane
@@ -4108,6 +4108,22 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID, | |||
return RValue::get(Result); | |||
} | |||
|
|||
case Builtin::BI__builtin_elementwise_maximumnum: { | |||
Value *Op0 = EmitScalarExpr(E->getArg(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I stopped chasing down the call-stack 1 too few :) I see it now in Sema::BuiltinVectorMath(
.
I was pointing out that we refer to [1]
(the second element), but the Prototype in Builtins.td is a ...
which allows 0 or 1 arguments.
Addresses #112164. minimumnum and maximumnum intrinsics were added in 5bf81e5.
The new built-ins can be used for implementing OpenCL math function fmax and fmin in #128506.