[Flang][mlir] - Translation of delayed privatization for deferred target-tasks #155348
@llvm/pr-subscribers-mlir-llvm @llvm/pr-subscribers-flang-driver
Author: Pranav Bhandarkar (bhandarkar-pranav)
Changes
This PR adds support for translation of the private clause on deferred target tasks. An offloading call for a deferred target task is non-blocking: the offloading host task continues its execution after issuing the offloading call. Therefore, the key problem we need to solve is to ensure that the data needed to initialize private variables in the target task persists even after the host task has completed.
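The lifetime problem described above can be illustrated with a small host-only C++ analogy. This is only a sketch under stated assumptions: `launch_deferred_sum` and `host_task` are hypothetical names, `std::async` stands in for the non-blocking offloading call, and copying to heap storage is one common way to make data outlive its creator, not necessarily what the pass in this PR does.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <vector>

// Hypothetical stand-in for a deferred target task: the work runs
// asynchronously and may execute after the launching function has returned.
std::future<int> launch_deferred_sum(const std::vector<int>& host_data) {
    // Copy the data needed for initialization into heap storage *before*
    // returning, so the task never reads the caller's (possibly dead) locals.
    auto persistent = std::make_shared<const std::vector<int>>(host_data);
    return std::async(std::launch::async, [persistent] {
        int sum = 0;
        for (int v : *persistent) sum += v;  // initialized from the heap copy
        return sum;
    });
}

// Hypothetical "host task": its local vector is destroyed when it returns,
// while the deferred work may still be pending.
std::future<int> host_task() {
    std::vector<int> local = {1, 2, 3, 4};
    return launch_deferred_sum(local);
}

// Usage: the result is still valid although `local` no longer exists.
int run_example() {
    int result = host_task().get();
    assert(result == 10);
    return result;
}
```

Capturing `local` by reference in the lambda instead would leave the task reading a destroyed object, which is the hazard the paragraph above describes.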
The pass uses a rewrite pattern applied with the greedy pattern rewrite driver, which in turn performs some constant folding and DCE. Because of this, a number of lit tests had to be updated. In GEPs, constants get folded into the indices and truncated to i32. In some tests, sequences of insertvalue and extractvalue instructions cancel out, so these checks needed to be updated too.
Patch is 79.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155348.diff
30 Files Affected:
diff --git a/flang/include/flang/Optimizer/Passes/Pipelines.h b/flang/include/flang/Optimizer/Passes/Pipelines.h
index a3f59ee8dd013..17d48f46e4b9b 100644
--- a/flang/include/flang/Optimizer/Passes/Pipelines.h
+++ b/flang/include/flang/Optimizer/Passes/Pipelines.h
@@ -22,6 +22,7 @@
#include "mlir/Conversion/SCFToControlFlow/SCFToControlFlow.h"
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/Dialect/LLVMIR/LLVMAttrs.h"
+#include "mlir/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
#include "mlir/Transforms/Passes.h"
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index ca8e820608688..6a11461cd8380 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -403,6 +403,12 @@ void createMLIRToLLVMPassPipeline(mlir::PassManager &pm,
// Add codegen pass pipeline.
fir::createDefaultFIRCodeGenPassPipeline(pm, config, inputFilename);
+
+ // Run a pass to prepare for translation of delayed privatization in the
+ // context of deferred target tasks.
+ addNestedPassConditionally<mlir::LLVM::LLVMFuncOp>(pm, disableFirToLlvmIr,[&]() {
+ return mlir::LLVM::createPrepareForOMPOffloadPrivatizationPass();
+ });
}
} // namespace fir
diff --git a/flang/test/Driver/tco-emit-final-mlir.fir b/flang/test/Driver/tco-emit-final-mlir.fir
index 75f8f153127af..177810cf41378 100644
--- a/flang/test/Driver/tco-emit-final-mlir.fir
+++ b/flang/test/Driver/tco-emit-final-mlir.fir
@@ -13,7 +13,7 @@
// CHECK: llvm.return
// CHECK-NOT: func.func
-func.func @_QPfoo() {
+func.func @_QPfoo() -> !fir.ref<i32> {
%1 = fir.alloca i32
- return
+ return %1 : !fir.ref<i32>
}
diff --git a/flang/test/Driver/tco-test-gen.fir b/flang/test/Driver/tco-test-gen.fir
index 38d4e50ecf3aa..15483f7ee3534 100644
--- a/flang/test/Driver/tco-test-gen.fir
+++ b/flang/test/Driver/tco-test-gen.fir
@@ -42,11 +42,10 @@ func.func @_QPtest(%arg0: !fir.ref<i32> {fir.bindc_name = "num"}, %arg1: !fir.re
// CHECK-SAME: %[[ARG2:.*]]: !llvm.ptr {fir.bindc_name = "ub", llvm.nocapture},
// CHECK-SAME: %[[ARG3:.*]]: !llvm.ptr {fir.bindc_name = "step", llvm.nocapture}) {
+// CMPLX: %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
+// CMPLX: %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
// CMPLX: %[[VAL_0:.*]] = llvm.mlir.constant(1 : i64) : i64
// CMPLX: %[[VAL_1:.*]] = llvm.alloca %[[VAL_0]] x i32 {bindc_name = "i"} : (i64) -> !llvm.ptr
-// CMPLX: %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
-// CMPLX: %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
-// CMPLX: %[[VAL_4:.*]] = llvm.mlir.constant(1 : i64) : i64
// SIMPLE: %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
// SIMPLE: %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
diff --git a/flang/test/Fir/alloc-32.fir b/flang/test/Fir/alloc-32.fir
index a3cbf200c24fc..f57f6ce6fcf5e 100644
--- a/flang/test/Fir/alloc-32.fir
+++ b/flang/test/Fir/alloc-32.fir
@@ -19,7 +19,7 @@ func.func @allocmem_scalar_nonchar() -> !fir.heap<i32> {
// CHECK-LABEL: define ptr @allocmem_scalar_dynchar(
// CHECK-SAME: i32 %[[len:.*]])
// CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 1, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 1
// CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
// CHECK: %[[sz:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
// CHECK: %[[trunc:.*]] = trunc i64 %[[sz]] to i32
diff --git a/flang/test/Fir/alloc.fir b/flang/test/Fir/alloc.fir
index 8da8b828c18b9..0d3ce323d0d7c 100644
--- a/flang/test/Fir/alloc.fir
+++ b/flang/test/Fir/alloc.fir
@@ -86,7 +86,7 @@ func.func @alloca_scalar_dynchar_kind(%l : i32) -> !fir.ref<!fir.char<2,?>> {
// CHECK-LABEL: define ptr @allocmem_scalar_dynchar(
// CHECK-SAME: i32 %[[len:.*]])
// CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 1, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 1
// CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
// CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
// CHECK: call ptr @malloc(i64 %[[size]])
@@ -98,7 +98,7 @@ func.func @allocmem_scalar_dynchar(%l : i32) -> !fir.heap<!fir.char<1,?>> {
// CHECK-LABEL: define ptr @allocmem_scalar_dynchar_kind(
// CHECK-SAME: i32 %[[len:.*]])
// CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 2, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 2
// CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
// CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
// CHECK: call ptr @malloc(i64 %[[size]])
@@ -185,7 +185,7 @@ func.func @alloca_dynarray_of_nonchar2(%e: index) -> !fir.ref<!fir.array<?x?xi32
// CHECK-LABEL: define ptr @allocmem_dynarray_of_nonchar(
// CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 12, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 12
// CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod1]], 0
// CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod1]], i64 1
// CHECK: call ptr @malloc(i64 %[[size]])
@@ -196,7 +196,7 @@ func.func @allocmem_dynarray_of_nonchar(%e: index) -> !fir.heap<!fir.array<3x?xi
// CHECK-LABEL: define ptr @allocmem_dynarray_of_nonchar2(
// CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 4, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 4
// CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
// CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod2]], 0
// CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod2]], i64 1
@@ -227,7 +227,7 @@ func.func @alloca_dynarray_of_char2(%e : index) -> !fir.ref<!fir.array<?x?x!fir.
// CHECK-LABEL: define ptr @allocmem_dynarray_of_char(
// CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 60, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 60
// CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod1]], 0
// CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod1]], i64 1
// CHECK: call ptr @malloc(i64 %[[size]])
@@ -238,7 +238,7 @@ func.func @allocmem_dynarray_of_char(%e : index) -> !fir.heap<!fir.array<3x?x!fi
// CHECK-LABEL: define ptr @allocmem_dynarray_of_char2(
// CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 20, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 20
// CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
// CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod2]], 0
// CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
@@ -286,7 +286,7 @@ func.func @allocmem_dynarray_of_dynchar(%l: i32, %e : index) -> !fir.heap<!fir.a
// CHECK-LABEL: define ptr @allocmem_dynarray_of_dynchar2(
// CHECK-SAME: i32 %[[len:.*]], i64 %[[extent:.*]])
// CHECK: %[[a:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[prod1:.*]] = mul i64 2, %[[a]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[a]], 2
// CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
// CHECK: %[[prod3:.*]] = mul i64 %[[prod2]], %[[extent]]
// CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod3]], 0
@@ -366,12 +366,13 @@ func.func @allocmem_array_with_holes_dynchar(%arg0: index, %arg1: index) -> !fir
// CHECK: %[[VAL_0:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
// CHECK: %[[VAL_3:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, i64 1
// CHECK: %[[VAL_2:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
-
+func.func private @foo(%0: !fir.ref<!fir.class<none>>, %1: !fir.ref<!fir.class<!fir.array<?xnone>>>, %2: !fir.ref<!fir.box<none>>, %3: !fir.ref<!fir.box<!fir.array<?xnone>>>)
func.func @alloca_unlimited_polymorphic_box() {
%0 = fir.alloca !fir.class<none>
%1 = fir.alloca !fir.class<!fir.array<?xnone>>
%2 = fir.alloca !fir.box<none>
%3 = fir.alloca !fir.box<!fir.array<?xnone>>
+ fir.call @foo(%0, %1, %2, %3) : (!fir.ref<!fir.class<none>>, !fir.ref<!fir.class<!fir.array<?xnone>>>, !fir.ref<!fir.box<none>>, !fir.ref<!fir.box<!fir.array<?xnone>>>) -> ()
return
}
// Note: allocmem of fir.box are not possible (fir::HeapType::verify does not
diff --git a/flang/test/Fir/arrexp.fir b/flang/test/Fir/arrexp.fir
index e8ec8ac79e0c2..2eb717228d998 100644
--- a/flang/test/Fir/arrexp.fir
+++ b/flang/test/Fir/arrexp.fir
@@ -143,9 +143,9 @@ func.func @f6(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: f32) {
%c9 = arith.constant 9 : index
%c10 = arith.constant 10 : index
- // CHECK: %[[EXT_GEP:.*]] = getelementptr {{.*}} %[[A]], i32 0, i32 7, i64 0, i32 1
+ // CHECK: %[[EXT_GEP:.*]] = getelementptr {{.*}} %[[A]], i32 0, i32 7, i32 0, i32 1
// CHECK: %[[EXTENT:.*]] = load i64, ptr %[[EXT_GEP]]
- // CHECK: %[[SIZE:.*]] = mul i64 4, %[[EXTENT]]
+ // CHECK: %[[SIZE:.*]] = mul i64 %[[EXTENT]], 4
// CHECK: %[[CMP:.*]] = icmp sgt i64 %[[SIZE]], 0
// CHECK: %[[SZ:.*]] = select i1 %[[CMP]], i64 %[[SIZE]], i64 1
// CHECK: %[[MALLOC:.*]] = call ptr @malloc(i64 %[[SZ]])
diff --git a/flang/test/Fir/basic-program.fir b/flang/test/Fir/basic-program.fir
index c9fe53bf093a1..6bad03dded24d 100644
--- a/flang/test/Fir/basic-program.fir
+++ b/flang/test/Fir/basic-program.fir
@@ -158,4 +158,6 @@ func.func @_QQmain() {
// PASSES-NEXT: LowerNontemporalPass
// PASSES-NEXT: FIRToLLVMLowering
// PASSES-NEXT: ReconcileUnrealizedCasts
+// PASSES-NEXT: 'llvm.func' Pipeline
+// PASSES-NEXT: PrepareForOMPOffloadPrivatizationPass
// PASSES-NEXT: LLVMIRLoweringPass
diff --git a/flang/test/Fir/box.fir b/flang/test/Fir/box.fir
index c0cf3d8375983..760fbd4792122 100644
--- a/flang/test/Fir/box.fir
+++ b/flang/test/Fir/box.fir
@@ -57,7 +57,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
// CHECK-SAME: ptr {{[^%]*}}%[[res:.*]], ptr {{[^%]*}}%[[arg0:.*]], i64 %[[arg1:.*]])
func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.char<1,?>> {
// CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
- // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+ // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
// CHECK: insertvalue {{.*}} undef, i64 %[[size]], 1
// CHECK: insertvalue {{.*}} i32 20240719, 2
// CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
@@ -89,7 +89,7 @@ func.func @b2(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,5>>>, %arg1 : index) ->
func.func @b3(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,?>>>, %arg1 : index, %arg2 : index) -> !fir.box<!fir.array<?x!fir.char<1,?>>> {
%1 = fir.shape %arg2 : (index) -> !fir.shape<1>
// CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
- // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+ // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
// CHECK: insertvalue {{.*}} i64 %[[size]], 1
// CHECK: insertvalue {{.*}} i32 20240719, 2
// CHECK: insertvalue {{.*}} i64 %[[arg2]], 7, 0, 1
@@ -108,7 +108,7 @@ func.func @b4(%arg0 : !fir.ref<!fir.array<7x!fir.char<1,?>>>, %arg1 : index) ->
%c_7 = arith.constant 7 : index
%1 = fir.shape %c_7 : (index) -> !fir.shape<1>
// CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
- // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+ // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
// CHECK: insertvalue {{.*}} i64 %[[size]], 1
// CHECK: insertvalue {{.*}} i32 20240719, 2
// CHECK: insertvalue {{.*}} i64 7, 7, 0, 1
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 97d9b38ed6f40..d4c36a4f5b213 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -82,12 +82,8 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
// CHECK: store [1 x i8] c" ", ptr %[[VAL_18]], align 1
// CHECK: call void @llvm.init.trampoline(ptr %[[VAL_20]], ptr @_QFtest_proc_dummy_charPgen_message, ptr %[[VAL_2]])
// CHECK: %[[VAL_23:.*]] = call ptr @llvm.adjust.trampoline(ptr %[[VAL_20]])
-// CHECK: %[[VAL_25:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_23]], 0
-// CHECK: %[[VAL_26:.*]] = insertvalue { ptr, i64 } %[[VAL_25]], i64 10, 1
// CHECK: %[[VAL_27:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK: %[[VAL_28:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 0
-// CHECK: %[[VAL_29:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 1
-// CHECK: %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_28]], i64 %[[VAL_29]])
+// CHECK: %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_23]], i64 10)
// CHECK: %[[VAL_32:.*]] = call i1 @_FortranAioOutputAscii(ptr %{{.*}}, ptr %[[VAL_0]], i64 40)
// CHECK: call void @llvm.stackrestore.p0(ptr %[[VAL_27]])
@@ -115,14 +111,10 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
// CHECK-LABEL: define { ptr, i64 } @_QPget_message(ptr
// CHECK-SAME: %[[VAL_0:.*]], i64 %[[VAL_1:.*]], ptr %[[VAL_2:.*]], i64
// CHECK-SAME: %[[VAL_3:.*]])
-// CHECK: %[[VAL_4:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_2]], 0
-// CHECK: %[[VAL_5:.*]] = insertvalue { ptr, i64 } %[[VAL_4]], i64 %[[VAL_3]], 1
-// CHECK: %[[VAL_7:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 0
-// CHECK: %[[VAL_8:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 1
// CHECK: %[[VAL_9:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK: %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_8]], align 1
-// CHECK: %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_7]](ptr %[[VAL_10]], i64 %[[VAL_8]])
-// CHECK: %[[VAL_13:.*]] = add i64 %[[VAL_8]], 12
+// CHECK: %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_3]], align 1
+// CHECK: %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_2]](ptr %[[VAL_10]], i64 %[[VAL_3]])
+// CHECK: %[[VAL_13:.*]] = add i64 %[[VAL_3]], 12
// CHECK: %[[VAL_14:.*]] = alloca i8, i64 %[[VAL_13]], align 1
// CHECK: call void @llvm.memmove.p0.p0.i64(ptr %[[VAL_14]], ptr {{.*}}, i64 12, i1 false)
// CHECK: %[[VAL_18:.*]] = phi i64
diff --git a/flang/test/Fir/embox.fir b/flang/test/Fir/embox.fir
index 0f304cff2c79e..11f7457b6873c 100644
--- a/flang/test/Fir/embox.fir
+++ b/flang/test/Fir/embox.fir
@@ -11,7 +11,7 @@ func.func @_QPtest_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
func.func @_QPtest_slice() {
// CHECK: %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
// CHECK: %[[a2:.*]] = alloca [20 x i32], i64 1, align 4
-// CHECK: %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i64 0, i64 0
+// CHECK: %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i32 0, i64 0
// CHECK: %[[a4:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
// CHECK: { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
// CHECK: [i64 1, i64 5, i64 8]] }, ptr %[[a3]], 0
@@ -38,7 +38,7 @@ func.func @_QPtest_dt_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
func.func @_QPtest_dt_slice() {
// CHECK: %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
// CHECK: %[[a3:.*]] = alloca [20 x %_QFtest_dt_sliceTt], i64 1, align 8
-// CHECK: %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i64 0, i64 0, i32 0
+// CHECK: %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i32 0, i64 0, i32 0
// CHECK: %[[a5:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
// CHECK-SAME: { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
// CHECK-SAME: [i64 1, i64 5, i64 16
@@ -73,7 +73,7 @@ func.func @emboxSubstring(%arg0: !fir.ref<!fir.array<2x3x!fir.char<1,4>>>) {
%0 = fir.shape %c2, %c3 : (index, index) -> !fir.shape<2>
%1 = fir.slice %c1, %c2, %c1, %c1, %c3, %c1 substr %c1_i64, %c2_i64 : (index, index, index, index, index, index, i64, i64) -> !fir.slice<2>
%2 = fir.embox %arg0(%0) [%1] : (!fir.ref<!fir.array<2x3x!fir.char<1,4>>>, !fir.shape<2>, !fir.slice<2>) -> !fir.box<!fir.array<?x?x!fir.char<1,?>>>
- // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i64 0, i64 0, i64 0, i64 1
+ // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i32 0, i64 0, i64 0, i32 1
// CHECK: insertvalue {[[descriptorType:.*]]} { ptr undef, i64 2, i32 20240719, i8 2, i8 40, i8 0, i8 0
// CHECK-SAME: [2 x [3 x i64]] [{{\[}}3 x i64] [i64 1, i64 2, i64 4], [3 x i64] [i64 1, i64 3, i64 8]] }
// CHECK-SAME: ptr %[[addr]], 0
diff --git a/flang/test/Fir/omp-reduction-embox-codegen.fir b/flang/test/Fir/omp-reduction-embox-codegen.fir
index 1645e1a407ad4..e517b1352ff5c 100644
--- a/flang/test/Fir/omp-reduction-embox-codegen.fir
+++ b/flang/test/Fir/omp-reduction-embox-codegen.fir
@@ -23,14 +23,14 @@ omp.declare_reduction @test_reduction : !fir.ref<!fir.box<i32>> init {
omp.yield(%0 : !fir.ref<!fir.box<i32>>)
}
-func.func @_QQmain() attributes {fir.bindc_name = "reduce"} {
+func.func @_QQmain() -> !fir.ref<!fir.box<i32>> attributes {fir.bindc_name = "reduce"} {
%4 = fir.alloca !fir.box<i32>
omp.parallel reduction(byref @test_reduction %4 -> %arg0 : !fir.ref<!fir.box<i32>>) {
omp.terminator
}
- return
+ return %4: !fir.ref<!fir.box<i32>>
}
// basically we are testing that there isn't a crash
-// CHECK-LABEL: define void @_QQmain
+// CHECK-LABEL: define ptr @_QQmain
// CHECK-NEXT: alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1, align 8
diff --git a/flang/test/Fir/optional.fir b/flang/test/Fir/optional.fir
index bded8b5332a30..66ff69f083467 100644
--- a/flang/test/Fir/optional.fir
+++ b/flang/test/Fir/optional.fir
@@ -37,8 +37,7 @@ func.func @bar2() -> i1 {
// CHECK-LABEL: @foo3
func.func @foo3(%arg0: !fir.boxchar<1>) -> i1 {
- // CHECK: %[[extract:.*]] = extractvalue { ptr, i64 } %{{.*}}, 0
- // CHECK: %[[ptr:.*]] = ptrtoint ptr %[[extract]] to i64
+ // CHECK: %[[ptr:.*]] = ptrtoint ptr %0 to i64
// CHECK: icmp ne i64 %[[ptr]], 0
%0 = fir.is_present %arg0 : (!fir.boxchar<1>) -> i1
return %0 : i1
diff --git a/flang/test/Fir/pdt.fir b/flang/test/Fir/pdt.fir
index a200cd7e7cc03..411927aae6bdf 100644
--- a/flang/test/Fir/pdt.fir
+++ b/flang/test/Fir/pdt.fir
@@ -96,13 +96,13 @@ func.func @_QTt1P.f2.offset(%0 : i32, %1 : i32) -> i32 {
func.func private @bar(!fir.ref<!fir.char<1,?>>)
-// CHECK-LABEL: define void @_QPfoo(i32 %0, i32 %1)
-func.func @_QPfoo(%arg0 : i32, %arg1 : i32) {
+// CHECK-LABEL: define ptr @_QPfoo(i32 %0, i32 %1)
+func.func @_QPfoo(%arg0 : i32, %arg1 : i32) -> !fir.ref<!fir.type<_QTt1>> {
// CHECK: %[[size:.*]] = call i64 @_QTt1P.mem.size(i32 %0, i32 %1)
// CHECK: %[[alloc:.*]] = alloca i8, i64 %[[size]]
%0 = fir.alloca !fir.type<_QTt1(p1:i32,p2:i32){f1:!fir.char<1,?>,f2:!fir.char<1,?>}>(%arg0, %arg1 : i32, i32)
//%2 = fir.coordinate_of %0, f2 : (!fir.ref<!fir.type<_QTt1>>) -> !fir.ref<!fir.char<1,?>>
%2 = fir.zero_bits !fir.ref<!fir.char<1,?>>
fir.call @bar(%2) : (!fir.ref<!fir.char<1,?>>) -> ()
- return
+ return %0 : !fir.ref<!fir.type<_QTt1>>
}
diff --git a/flang/test/Fir/rebox.fir b/flang/test/Fir/rebox.fir
index 0c9f6d9bb94ad..d858adfb7c45d 100644
--- a/flang/test/Fir/rebox.fir
+++ b/flang/test/Fir/rebox.fir
@@ -36,7 +36,7 @@ func.func @test_rebox_1(%arg0: !fir.box<!fir.array<?x?xf32>>) {
// CHECK: %[[VOIDBASE0:.*]] = getelementptr i8, ptr %[[INBASE]], i64 %[[OFFSET_0]]
// CHECK: %[[OFFSET_1:.*]] = mul i64 2, %[[INSTRIDE_1]]
// CHECK: %[[VOIDBASE1:.*]] = getelementptr i8, ptr %[[VOIDBASE0]], i64 %[[OFFSET_1]]
- // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 3, %[[INSTRIDE_1]]
+ // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 %[[INSTRIDE_1]], 3
// CHECK: %[[OUTBOX1:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %{{.*}}, i64 %[[OUTSTRIDE0]], 7, 0, 2
// CHECK: %[[OUTBOX2:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX1]], ptr %[[VOIDBASE1]], 0
// CHECK: store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX2]], ptr %[[OUTBOX_ALLOC]], align 8
@@ -63,7 +63,7 @@ func.func @test_rebox_2(%arg0: !fir.box<!fir.array<?x?x!fir.char<1,?>>>) {
// CHECK: %[[OUTBOX:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }
// CHECK: %[[LEN_GEP:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }, ptr %[[INBOX]], i32 0, i32 1
// CHECK: %[[LEN:...
[truncated]
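Two of the recurring test changes in the diff above (`mul i64 1, %x` becoming `mul i64 %x, 1`, and insertvalue/extractvalue pairs disappearing) can be mimicked with a toy model. The sketch below is plain C++ and does not use any MLIR API; `Operand`, `Mul`, `Insert`, `canonicalize`, and `fold_extract` are hypothetical names chosen for illustration, and the rules are simplified versions of what the updated CHECK lines suggest.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Toy operand: either an integer constant or a named SSA-like value.
struct Operand {
    std::optional<long> constant;  // set for constants
    std::string name;              // set for non-constants
    bool is_constant() const { return constant.has_value(); }
    std::string str() const {
        return is_constant() ? std::to_string(*constant) : name;
    }
};

Operand make_const(long v) { return Operand{v, ""}; }
Operand make_value(std::string n) { return Operand{std::nullopt, std::move(n)}; }

struct Mul { Operand lhs, rhs; };

// Commutative canonicalization: a constant on the left moves to the right.
Mul canonicalize(Mul m) {
    if (m.lhs.is_constant() && !m.rhs.is_constant()) std::swap(m.lhs, m.rhs);
    return m;
}

std::string print(const Mul& m) {
    return "mul i64 " + m.lhs.str() + ", " + m.rhs.str();
}

// Toy insertvalue: records which value was written at which index.
struct Insert { std::string value; int index; };

// Extracting the index that was just inserted folds to the inserted value;
// any other index is left alone in this simplified model.
std::optional<std::string> fold_extract(const Insert& ins, int index) {
    if (ins.index == index) return ins.value;
    return std::nullopt;
}
```

For example, `print(canonicalize(Mul{make_const(4), make_value("%extent")}))` yields `mul i64 %extent, 4`, matching the shape of the updated CHECK lines, and `fold_extract(Insert{"%ptr", 0}, 0)` returns `%ptr` directly, so neither the insert nor the extract needs to be checked.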
@llvm/pr-subscribers-mlir-openmp
Author: Pranav Bhandarkar (bhandarkar-pranav)
Changes
This PR adds support for translation of the private clause on deferred target tasks. An offloading call for a deferred target task is non-blocking: the offloading host task continues its execution after issuing the offloading call. Therefore, the key problem we need to solve is to ensure that the data needed to initialize private variables in the target task persists even after the host task has completed.
// CHECK: %[[VAL_23:.*]] = call ptr @llvm.adjust.trampoline(ptr %[[VAL_20]])
-// CHECK: %[[VAL_25:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_23]], 0
-// CHECK: %[[VAL_26:.*]] = insertvalue { ptr, i64 } %[[VAL_25]], i64 10, 1
// CHECK: %[[VAL_27:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK: %[[VAL_28:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 0
-// CHECK: %[[VAL_29:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 1
-// CHECK: %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_28]], i64 %[[VAL_29]])
+// CHECK: %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_23]], i64 10)
// CHECK: %[[VAL_32:.*]] = call i1 @_FortranAioOutputAscii(ptr %{{.*}}, ptr %[[VAL_0]], i64 40)
// CHECK: call void @llvm.stackrestore.p0(ptr %[[VAL_27]])
@@ -115,14 +111,10 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
// CHECK-LABEL: define { ptr, i64 } @_QPget_message(ptr
// CHECK-SAME: %[[VAL_0:.*]], i64 %[[VAL_1:.*]], ptr %[[VAL_2:.*]], i64
// CHECK-SAME: %[[VAL_3:.*]])
-// CHECK: %[[VAL_4:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_2]], 0
-// CHECK: %[[VAL_5:.*]] = insertvalue { ptr, i64 } %[[VAL_4]], i64 %[[VAL_3]], 1
-// CHECK: %[[VAL_7:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 0
-// CHECK: %[[VAL_8:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 1
// CHECK: %[[VAL_9:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK: %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_8]], align 1
-// CHECK: %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_7]](ptr %[[VAL_10]], i64 %[[VAL_8]])
-// CHECK: %[[VAL_13:.*]] = add i64 %[[VAL_8]], 12
+// CHECK: %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_3]], align 1
+// CHECK: %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_2]](ptr %[[VAL_10]], i64 %[[VAL_3]])
+// CHECK: %[[VAL_13:.*]] = add i64 %[[VAL_3]], 12
// CHECK: %[[VAL_14:.*]] = alloca i8, i64 %[[VAL_13]], align 1
// CHECK: call void @llvm.memmove.p0.p0.i64(ptr %[[VAL_14]], ptr {{.*}}, i64 12, i1 false)
// CHECK: %[[VAL_18:.*]] = phi i64
diff --git a/flang/test/Fir/embox.fir b/flang/test/Fir/embox.fir
index 0f304cff2c79e..11f7457b6873c 100644
--- a/flang/test/Fir/embox.fir
+++ b/flang/test/Fir/embox.fir
@@ -11,7 +11,7 @@ func.func @_QPtest_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
func.func @_QPtest_slice() {
// CHECK: %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
// CHECK: %[[a2:.*]] = alloca [20 x i32], i64 1, align 4
-// CHECK: %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i64 0, i64 0
+// CHECK: %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i32 0, i64 0
// CHECK: %[[a4:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
// CHECK: { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
// CHECK: [i64 1, i64 5, i64 8]] }, ptr %[[a3]], 0
@@ -38,7 +38,7 @@ func.func @_QPtest_dt_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
func.func @_QPtest_dt_slice() {
// CHECK: %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
// CHECK: %[[a3:.*]] = alloca [20 x %_QFtest_dt_sliceTt], i64 1, align 8
-// CHECK: %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i64 0, i64 0, i32 0
+// CHECK: %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i32 0, i64 0, i32 0
// CHECK: %[[a5:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
// CHECK-SAME: { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
// CHECK-SAME: [i64 1, i64 5, i64 16
@@ -73,7 +73,7 @@ func.func @emboxSubstring(%arg0: !fir.ref<!fir.array<2x3x!fir.char<1,4>>>) {
%0 = fir.shape %c2, %c3 : (index, index) -> !fir.shape<2>
%1 = fir.slice %c1, %c2, %c1, %c1, %c3, %c1 substr %c1_i64, %c2_i64 : (index, index, index, index, index, index, i64, i64) -> !fir.slice<2>
%2 = fir.embox %arg0(%0) [%1] : (!fir.ref<!fir.array<2x3x!fir.char<1,4>>>, !fir.shape<2>, !fir.slice<2>) -> !fir.box<!fir.array<?x?x!fir.char<1,?>>>
- // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i64 0, i64 0, i64 0, i64 1
+ // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i32 0, i64 0, i64 0, i32 1
// CHECK: insertvalue {[[descriptorType:.*]]} { ptr undef, i64 2, i32 20240719, i8 2, i8 40, i8 0, i8 0
// CHECK-SAME: [2 x [3 x i64]] [{{\[}}3 x i64] [i64 1, i64 2, i64 4], [3 x i64] [i64 1, i64 3, i64 8]] }
// CHECK-SAME: ptr %[[addr]], 0
diff --git a/flang/test/Fir/omp-reduction-embox-codegen.fir b/flang/test/Fir/omp-reduction-embox-codegen.fir
index 1645e1a407ad4..e517b1352ff5c 100644
--- a/flang/test/Fir/omp-reduction-embox-codegen.fir
+++ b/flang/test/Fir/omp-reduction-embox-codegen.fir
@@ -23,14 +23,14 @@ omp.declare_reduction @test_reduction : !fir.ref<!fir.box<i32>> init {
omp.yield(%0 : !fir.ref<!fir.box<i32>>)
}
-func.func @_QQmain() attributes {fir.bindc_name = "reduce"} {
+func.func @_QQmain() -> !fir.ref<!fir.box<i32>> attributes {fir.bindc_name = "reduce"} {
%4 = fir.alloca !fir.box<i32>
omp.parallel reduction(byref @test_reduction %4 -> %arg0 : !fir.ref<!fir.box<i32>>) {
omp.terminator
}
- return
+ return %4: !fir.ref<!fir.box<i32>>
}
// basically we are testing that there isn't a crash
-// CHECK-LABEL: define void @_QQmain
+// CHECK-LABEL: define ptr @_QQmain
// CHECK-NEXT: alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1, align 8
diff --git a/flang/test/Fir/optional.fir b/flang/test/Fir/optional.fir
index bded8b5332a30..66ff69f083467 100644
--- a/flang/test/Fir/optional.fir
+++ b/flang/test/Fir/optional.fir
@@ -37,8 +37,7 @@ func.func @bar2() -> i1 {
// CHECK-LABEL: @foo3
func.func @foo3(%arg0: !fir.boxchar<1>) -> i1 {
- // CHECK: %[[extract:.*]] = extractvalue { ptr, i64 } %{{.*}}, 0
- // CHECK: %[[ptr:.*]] = ptrtoint ptr %[[extract]] to i64
+ // CHECK: %[[ptr:.*]] = ptrtoint ptr %0 to i64
// CHECK: icmp ne i64 %[[ptr]], 0
%0 = fir.is_present %arg0 : (!fir.boxchar<1>) -> i1
return %0 : i1
diff --git a/flang/test/Fir/pdt.fir b/flang/test/Fir/pdt.fir
index a200cd7e7cc03..411927aae6bdf 100644
--- a/flang/test/Fir/pdt.fir
+++ b/flang/test/Fir/pdt.fir
@@ -96,13 +96,13 @@ func.func @_QTt1P.f2.offset(%0 : i32, %1 : i32) -> i32 {
func.func private @bar(!fir.ref<!fir.char<1,?>>)
-// CHECK-LABEL: define void @_QPfoo(i32 %0, i32 %1)
-func.func @_QPfoo(%arg0 : i32, %arg1 : i32) {
+// CHECK-LABEL: define ptr @_QPfoo(i32 %0, i32 %1)
+func.func @_QPfoo(%arg0 : i32, %arg1 : i32) -> !fir.ref<!fir.type<_QTt1>> {
// CHECK: %[[size:.*]] = call i64 @_QTt1P.mem.size(i32 %0, i32 %1)
// CHECK: %[[alloc:.*]] = alloca i8, i64 %[[size]]
%0 = fir.alloca !fir.type<_QTt1(p1:i32,p2:i32){f1:!fir.char<1,?>,f2:!fir.char<1,?>}>(%arg0, %arg1 : i32, i32)
//%2 = fir.coordinate_of %0, f2 : (!fir.ref<!fir.type<_QTt1>>) -> !fir.ref<!fir.char<1,?>>
%2 = fir.zero_bits !fir.ref<!fir.char<1,?>>
fir.call @bar(%2) : (!fir.ref<!fir.char<1,?>>) -> ()
- return
+ return %0 : !fir.ref<!fir.type<_QTt1>>
}
diff --git a/flang/test/Fir/rebox.fir b/flang/test/Fir/rebox.fir
index 0c9f6d9bb94ad..d858adfb7c45d 100644
--- a/flang/test/Fir/rebox.fir
+++ b/flang/test/Fir/rebox.fir
@@ -36,7 +36,7 @@ func.func @test_rebox_1(%arg0: !fir.box<!fir.array<?x?xf32>>) {
// CHECK: %[[VOIDBASE0:.*]] = getelementptr i8, ptr %[[INBASE]], i64 %[[OFFSET_0]]
// CHECK: %[[OFFSET_1:.*]] = mul i64 2, %[[INSTRIDE_1]]
// CHECK: %[[VOIDBASE1:.*]] = getelementptr i8, ptr %[[VOIDBASE0]], i64 %[[OFFSET_1]]
- // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 3, %[[INSTRIDE_1]]
+ // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 %[[INSTRIDE_1]], 3
// CHECK: %[[OUTBOX1:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %{{.*}}, i64 %[[OUTSTRIDE0]], 7, 0, 2
// CHECK: %[[OUTBOX2:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX1]], ptr %[[VOIDBASE1]], 0
// CHECK: store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX2]], ptr %[[OUTBOX_ALLOC]], align 8
@@ -63,7 +63,7 @@ func.func @test_rebox_2(%arg0: !fir.box<!fir.array<?x?x!fir.char<1,?>>>) {
// CHECK: %[[OUTBOX:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }
// CHECK: %[[LEN_GEP:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }, ptr %[[INBOX]], i32 0, i32 1
// CHECK: %[[LEN:...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Why did you decide to do this in a pass instead of handling it in the MLIR -> LLVM conversion, as is done for omp task?
I did anticipate this question, especially because MLIR -> LLVM translation is where I had first started out, with the intention of extending your work on A couple of reasons make it too late to do this during MLIR -> LLVM IR translation. Too late not in the sense of impossible, but arguably harder to get right and to maintain afterwards. Essentially, what we need to do is
Now, to allocate heap memory for the private variable, we'd have two options
This requires additional bookkeeping and coordination between |
…get-tasks
This patch adds support for translating the private clause on deferred target tasks, that is, `omp.target` operations with the `nowait` clause. An offloading call for a deferred target task is non-blocking: the offloading host task continues its execution after issuing the offloading call. The key problem, therefore, is to ensure that the data needed to initialize private variables in the target task persists even after the host task has completed. We do this in a new pass called PrepareForOMPOffloadPrivatizationPass. For a privatized variable that needs its host counterpart for initialization (such as the shape of the data from the descriptor when an allocatable is privatized, or the value of the data when an allocatable is firstprivatized), the pass does the following:
- It allocates memory on the heap.
- It initializes this memory by copying the contents of the host variable to the newly allocated location on the heap.
- It updates all the `omp.map.info` operations that pointed to the host variable so that they point to the copy located on the heap.
The pass uses a rewrite pattern applied with the greedy pattern matcher, which in turn does some constant folding and DCE. Because of this, a number of lit tests had to be updated. In GEPs, constants get folded into indices and truncated to i32. In some tests, sequences of insertvalue and extractvalue instructions cancel out, so these needed to be updated too.
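The lifetime problem and the three steps in the commit message can be sketched in plain C++, outside of MLIR. This is only an analogy under stated assumptions: `Descriptor`, `launchDeferred`, and `hostTask` are hypothetical names, `std::async` with the deferred policy stands in for a `nowait` target task, and `std::shared_ptr` ownership is a convenience of the sketch rather than something the pass is known to do.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <vector>

// Hypothetical stand-in for the host data (for example a descriptor) that the
// deferred task needs in order to initialize its private copy.
struct Descriptor {
  std::vector<int> data;
};

// Launch a deferred "target task" that sums the host data.
std::future<int> launchDeferred(const Descriptor &hostVar) {
  // Steps 1 and 2: allocate heap storage and copy the host variable into it.
  auto heapCopy = std::make_shared<const Descriptor>(hostVar);
  // Step 3: the deferred task refers to the heap copy, never to hostVar.
  return std::async(std::launch::deferred, [heapCopy] {
    int sum = 0;
    for (int v : heapCopy->data)
      sum += v;
    return sum;
  });
}

// The "host task": its local variable is gone once this function returns,
// which happens before the deferred task actually runs.
std::future<int> hostTask() {
  Descriptor local{{1, 2, 3, 4}};
  return launchDeferred(local);
}
```

Calling `hostTask().get()` runs the deferred work only after `local` has been destroyed. Capturing `local` by reference instead of copying it to the heap would leave the task with a dangling reference, which is the situation the pass is meant to avoid.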
…reedy pattern matcher
ff8afbd to c859bbc
Ahh I see what you mean. This is different because as well as being (first)private, these variables may also be mapped, which adds another layer of complexity. I haven't followed much about mapping so I will take your word for it and leave it for experts in offloading to give their opinions.
auto privVar = std::get<0>(privVarSymPair);
auto privSym = std::get<1>(privVarSymPair);
ultra-nit: it would be clearer if you spelt out the types here
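As a rough illustration of the suggestion, the sketch below spells out the element types when unpacking a tuple. `Value` and `SymbolRef` are hypothetical stand-ins, not the actual MLIR types used in the patch, and `describe` exists only to make the example testable.

```cpp
#include <string>
#include <tuple>

// Hypothetical stand-ins for the value and symbol types held in the pair.
struct Value {
  int id;
};
struct SymbolRef {
  std::string name;
};

std::string describe(const std::tuple<Value, SymbolRef> &privVarSymPair) {
  // Explicit types tell the reader what each tuple element is without
  // having to look up the declaration of the tuple.
  const Value &privVar = std::get<0>(privVarSymPair);
  const SymbolRef &privSym = std::get<1>(privVarSymPair);
  return privSym.name + "#" + std::to_string(privVar.id);
}
```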
auto privSym = std::get<1>(privVarSymPair);

omp::PrivateClauseOp privatizer = findPrivatizer(targetOp, privSym);
if (!privatizer.needsMap()) {
For me this would be easier to understand using privatizer.readsFromMold(), but this might be obvious to someone who works on target offloading regularly so feel free to leave it as it is if you prefer.
// Allocate heap memory that corresponds to the type of memory
// pointed to by varPtr
// TODO: For boxchars this likely wont be a pointer.
Yes you can see in the code for tasks that boxchars are a hack because you can't really have a !fir.ref<!fir.boxchar<>> in the FIR type system. This is handled for tasks so you can see what I did there.
Thank you, I'll take a look.
// Copy the value of the local variable into the heap-allocated location.
mlir::Location loc = chainOfOps.front()->getLoc();
mlir::Type varType = getElemType(varPtr);
auto loadVal = rewriter.create<LLVM::LoadOp>(loc, varType, varPtr);
What about more complex types e.g. arrays, derived types?
For firstprivate you can use the copy region in the privatizer. For plain private you just need to use an init region to initialise non-trivial types but don't need to copy. This initialisation and copying must happen synchronously.
patterns.add<OMPTargetPrepareDelayedPrivatizationPattern>(&context);

if (mlir::failed(
        applyPatternsGreedily(func, std::move(patterns),
Could the inconvenient fallout from the canonicalisation be avoided by manually walking through func to find target ops? I don't think we need fancy pattern rewriters here because the pattern doesn't need to be applied recursively. e.g.
func.walk([&](omp::TargetOp targetOp) { handleTargetOp(targetOp); });
This is a good suggestion. TBH, I had started off with this initially, but then I second-guessed myself, believing that a formal/fancy pattern rewriter is what reviewers would prefer. Of course, that was before I even realized I'd have to deal with lit-test annoyances due to canonicalization.
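The difference between the two approaches can be sketched with a toy operation tree. This is not the MLIR API: `Op`, `walk`, and `handleTargets` are hypothetical names used only to show that a plain walk visits each matching op once and touches nothing else, so no folding or dead-code elimination happens as a side effect.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy operation tree; a hypothetical stand-in, not mlir::Operation.
struct Op {
  std::string name;
  std::vector<Op> regionOps;
};

// Pre-order walk that calls `fn` on every op whose name matches `name`.
void walk(Op &root, const std::string &name,
          const std::function<void(Op &)> &fn) {
  if (root.name == name)
    fn(root);
  for (Op &child : root.regionOps)
    walk(child, name, fn);
}

// Handle every target op exactly once and report how many were visited.
// Ops with other names are left untouched.
int handleTargets(Op &func) {
  int handled = 0;
  walk(func, "omp.target", [&](Op &) { ++handled; });
  return handled;
}
```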
// CHECK-SAME: var_ptr_ptr(%[[VAL_26]] : !llvm.ptr) bounds(%[[VAL_17]]) -> !llvm.ptr {name = ""}
// CHECK: %[[VAL_28:.*]] = omp.map.info var_ptr(%[[HEAP]] : !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)>)
// CHECK-SAME: map_clauses(always, to) capture(ByRef) members(%[[VAL_27]] : [0] : !llvm.ptr) -> !llvm.ptr
// CHECK: omp.target nowait map_entries(%[[VAL_9]] -> %[[VAL_29:.*]], %[[VAL_28]] -> %[[VAL_30:.*]], %[[VAL_27]] -> %[[VAL_31:.*]] : !llvm.ptr, !llvm.ptr, !llvm.ptr)
When this gets outlined, is the structure holding the multiple arguments created on the stack?
`private` opernads that do not require a map, this value is -1 (which is omitted
from the assembly foramt printing).
Please avoid unrelated changes.
Typo fixes can be committed without review
//===- OpenMPOffloadPrivatizationPrepare.cpp - Prepare for OpenMP Offload
// Privatization ---------===//
[nit] line-wrap
//===----------------------------------------------------------------------===// | ||
|
||
#define DEBUG_TYPE "omp-prepare-for-offload-privatization" | ||
#define PDBGS() (llvm::dbgs() << "[" << DEBUG_TYPE << "]: ") |
Consider LDBG() (DebugLog.h)
// In this case, we allocate memory for the privatized variable on the heap
// and copy the original variable into this new heap allocation. We fix up
// any omp::MapInfoOp instances that may be mapping the private variable.
mlir::LogicalResult
- mlir::LogicalResult
+ LogicalResult
All the mlir:: prefixes are unnecessary because of using namespace mlir;
@@ -624,6 +624,7 @@ LogicalResult mlir::MlirOptMain(llvm::raw_ostream &outputStream,
// We use the thread-pool this context is creating, and avoid
// creating any thread when disabled.
MLIRContext threadPoolCtx;
[nit] Avoid unrelated changes
//===- OpenMPOffloadPrivatizationPrepare.h - Prepare for OpenMP Offload
// Privatization -*- C++ -*-===//
[nit] line-wrap
I think there is no need for the file description
//
//===----------------------------------------------------------------------===//

#ifndef MLIR_DIALECT_LLVMIR_TRANSFORMS_PREPAREFOROMPOFFLOADPRIVATIZATIONPASS_H
https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/LLVMIR/Transforms/Passes.h already exists. Isn't it sufficient to include that header? It should already include all pass declarations
// After the copy these omp::MapInfoOp instances will refer to heapMem
// instead.
Operation *varPtrDefiningOp = varPtr.getDefiningOp();
std::set<Operation *> users;
Prefer DenseSet<Operation*>
} else
  rewriter.setInsertionPoint(
      cloneAndMarkForDeletion(varPtrPtrdefOp));
- } else
-   rewriter.setInsertionPoint(
-       cloneAndMarkForDeletion(varPtrPtrdefOp));
+ } else {
+   rewriter.setInsertionPoint(
+       cloneAndMarkForDeletion(varPtrPtrdefOp));
+ }
If the if part has braces, the else part should too
LLVM_ATTRIBUTE_UNUSED auto storePtr =
    rewriter.create<LLVM::StoreOp>(loc, dataMalloc.getResult(),
                                   newVarPtrPtrOp->getResult(0));
- LLVM_ATTRIBUTE_UNUSED auto storePtr =
-     rewriter.create<LLVM::StoreOp>(loc, dataMalloc.getResult(),
-                                    newVarPtrPtrOp->getResult(0));
+ (void)rewriter.create<LLVM::StoreOp>(loc, dataMalloc.getResult(),
+                                      newVarPtrPtrOp->getResult(0));
to discard a [[nodiscard]] warning
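A minimal standalone sketch of the idiom, assuming a hypothetical `createStore` function marked `[[nodiscard]]` (it stands in for the builder call; `Handle`, `created`, and `emitStore` are likewise made up for illustration):

```cpp
// Hypothetical result type and creation counter for the example.
struct Handle {
  int id;
};
int created = 0;

// Hypothetical builder-like function whose result must not be silently dropped.
[[nodiscard]] Handle createStore(int value) {
  ++created;
  return Handle{value};
}

int emitStore() {
  // Casting to void states that the result is intentionally unused and
  // suppresses the warning without introducing a dummy variable.
  (void)createStore(42);
  return created;
}
```

Compared with naming an unused variable and annotating it, the cast keeps the call on a single statement and leaves no name behind that a later reader might expect to be used.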