Use ATTACH maps for array-sections/subscripts on pointers. #1

abhinavgaba · 2025-07-16T14:02:39Z

This is the initial clang change to support using ATTACH map-type for pointer-attachment.

This builds upon the following:

[Offload] Introduce ATTACH map-type support for pointer attachment. llvm/llvm-project#149036
[Clang][OpenMP] Capture mapped pointers on target by reference. llvm/llvm-project#145454

For example, for the following:

  int *p;
  #pragma omp target enter data map(p[1:10])

The following maps are now emitted by clang:

  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH

Previously, the two possible maps emitted by clang were:

  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.
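
The shape of the (A) maps can be sketched as a small table builder. This is only a model for illustration, not clang's codegen: `MapEntry`, `buildAttachStyleMaps`, and the flag bit values are hypothetical names and numbers, and the real map-type values are defined by the offload runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative flag bits only (assumption); the real values live in the
// offload runtime and are not reproduced here.
enum MapFlag : std::uint64_t {
  TO = 1u << 0,
  FROM = 1u << 1,
  PTR_AND_OBJ = 1u << 2,
  ATTACH = 1u << 3,
};

// One row of the map table: (base pointer, begin address, size, map-type bits).
struct MapEntry {
  const void *Base;
  const void *Begin;
  std::size_t Size;
  std::uint64_t Flags;
};

// Builds the two entries of scheme (A) for map(p[Lower:Length]) on an int *p.
// Only addresses are computed; the pointee is never read.
std::vector<MapEntry> buildAttachStyleMaps(int *&P, std::size_t Lower,
                                           std::size_t Length) {
  assert(Length > 0 && "array section must be non-empty");
  return {
      // Pointee data: maps the section itself and leaves the pointer alone.
      {&P[0], &P[Lower], Length * sizeof(P[Lower]), TO | FROM},
      // Attachment request: base is the pointer's own address, size is the
      // size of the pointer, and no data transfer bits are set.
      {&P, &P[Lower], sizeof(P), ATTACH},
  };
}
```

Calling `buildAttachStyleMaps(p, 1, 10)` yields the two rows listed under (A): one entry for the data and one entry that only requests attachment.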

With this change, we are using ATTACH-style maps, like (A), for cases where the expression has a base-pointer. For example:

  int *p, **pp;
  S *ps, **pps;
  ... map(p[0])
  ... map(p[10:20])
  ... map(*p)
  ... map(([20])p)
  ... map(ps->a)
  ... map(pps->p->a)
  ... map(pp[0][0])
  ... map(*(pp + 10)[0])

We also group the maps for clauses that share a base decl, ordered by increasing complexity of their base-pointers. For example, for something like:

  S **spp;
  map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
  map(spp[0]),                // attach-ptr: spp
  map(spp),                   // attach-ptr: N/A

We first map spp, then spp[0] then spp[0][0] and spp[0][0].a.
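
The ordering can be pictured as a stable sort on the depth of each operand's attach pointer. The sketch below is an assumption-laden model, not the code in this PR: `MapItem`, `AttachPtrDepth`, and `orderByAttachPtr` are hypothetical, and the depth is supplied by hand instead of being derived from the expression.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of one map clause operand.
struct MapItem {
  std::string Expr;      // e.g. "spp[0][0].a"
  std::string AttachPtr; // e.g. "spp[0]"; empty when there is no attach pointer
  int AttachPtrDepth;    // 0 = no attach pointer, 1 = spp, 2 = spp[0], ...
};

// Orders operands so that simpler attach pointers come first. stable_sort
// keeps operands with the same attach pointer in their clause order.
std::vector<MapItem> orderByAttachPtr(std::vector<MapItem> Items) {
  std::stable_sort(Items.begin(), Items.end(),
                   [](const MapItem &A, const MapItem &B) {
                     return A.AttachPtrDepth < B.AttachPtrDepth;
                   });
  return Items;
}
```

Feeding it the four operands in clause order (`spp[0][0]`, `spp[0][0].a`, `spp[0]`, `spp`) gives `spp`, `spp[0]`, `spp[0][0]`, `spp[0][0].a`, which is the order described above. Operands that share an attach pointer end up adjacent, which is what makes grouping by attach pointer possible.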

This allows us to also group "struct" allocation based on their attach pointers.

Cases that need handling:

When a class member like p is the base-pointer of a map in a member function of the same class, p is not privatized. Instead, we still create an implicit map of this[0:1] and access p through that, which is incorrect.

 #include <cstdio>

 struct S {
   int *p;
   void f1() {
     #pragma omp target data map(p[0:1])
       printf("%p %p\n", (void *)&p, (void *)p);
   }
 };

Attach-style maps for declare mappers. That should be a separate PR.
The use_device_addr clause does not work properly: its component list is not set up fully (it has just one component), so we cannot find the proper attach-ptr. For use_device_addr, we should match existing maps whose attach-ptr matches the attach-ptr of the use_device_addr operand.
use_device_ptr handling has some issues too and needs debugging.
Other issues that haven't been found yet.

Some tests still haven't been updated. These include:

  Clang :: OpenMP/copy-gaps-1.cpp
  Clang :: OpenMP/copy-gaps-6.cpp
  Clang :: OpenMP/map_struct_ordering.cpp
  Clang :: OpenMP/target_data_use_device_addr_codegen.cpp
  Clang :: OpenMP/target_data_use_device_ptr_codegen.cpp
  Clang :: OpenMP/target_enter_data_codegen.cpp
  Clang :: OpenMP/target_enter_data_depend_codegen.cpp
  Clang :: OpenMP/target_exit_data_codegen.cpp
  Clang :: OpenMP/target_exit_data_depend_codegen.cpp
  Clang :: OpenMP/target_map_codegen_18c.cpp
  Clang :: OpenMP/target_map_codegen_18d.cpp
  Clang :: OpenMP/target_map_codegen_28.cpp
  Clang :: OpenMP/target_map_codegen_29.cpp
  Clang :: OpenMP/target_map_codegen_31.cpp
  Clang :: OpenMP/target_map_codegen_hold.cpp
  Clang :: OpenMP/target_map_deref_array_codegen.cpp
  Clang :: OpenMP/target_map_member_expr_codegen.cpp
  Clang :: OpenMP/target_update_codegen.cpp
  Clang :: OpenMP/target_update_depend_codegen.cpp

abhinavgaba · 2025-07-16T14:03:51Z

offload/libomptarget/interface.cpp

The libomptarget code will disappear from this PR once llvm#149036 is merged.

abhinavgaba · 2025-07-23T13:31:19Z

clang/lib/CodeGen/CGOpenMPRuntime.cpp

@@ -7096,8 +7129,8 @@ class MappableExprsHandler {
      const ValueDecl *Mapper = nullptr, bool ForDeviceAddr = false,
      const ValueDecl *BaseDecl = nullptr, const Expr *MapExpr = nullptr,
      ArrayRef<OMPClauseMappableExprCommon::MappableExprComponentListRef>
-          OverlappedElements = {},
-      bool AreBothBasePtrAndPteeMapped = false) const {


AreBothBasePtrAndPteeMapped was used to decide whether to use PTR_AND_OBJ maps for something like map(p, p[0]). We no longer do that, since we now map them independently and attach them separately.

Extend support in LLDB for WebAssembly. This PR adds a new Process plugin (ProcessWasm) that extends ProcessGDBRemote for WebAssembly targets. It adds support for WebAssembly's memory model with separate address spaces, and the ability to fetch the call stack from the WebAssembly runtime. I have tested this change with the WebAssembly Micro Runtime (WAMR, https://github.com/bytecodealliance/wasm-micro-runtime) which implements a GDB debug stub and supports the qWasmCallStack packet. ``` (lldb) process connect --plugin wasm connect://localhost:4567 Process 1 stopped * thread #1, name = 'nobody', stop reason = trace frame #0: 0x40000000000001ad wasm32_args.wasm`main: -> 0x40000000000001ad <+3>: global.get 0 0x40000000000001b3 <+9>: i32.const 16 0x40000000000001b5 <+11>: i32.sub 0x40000000000001b6 <+12>: local.set 0 (lldb) b add Breakpoint 1: where = wasm32_args.wasm`add + 28 at test.c:4:12, address = 0x400000000000019c (lldb) c Process 1 resuming Process 1 stopped * thread #1, name = 'nobody', stop reason = breakpoint 1.1 frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12 1 int 2 add(int a, int b) 3 { -> 4 return a + b; 5 } 6 7 int (lldb) bt * thread #1, name = 'nobody', stop reason = breakpoint 1.1 * frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12 frame #1: 0x40000000000001e5 wasm32_args.wasm`main at test.c:12:12 frame #2: 0x40000000000001fe wasm32_args.wasm ``` This PR is based on an unmerged patch from Paolo Severini: https://reviews.llvm.org/D78801. I intentionally stuck to the foundations to keep this PR small. I have more PRs in the pipeline to support the other features/packets. My motivation for supporting Wasm is to support debugging Swift compiled to WebAssembly: https://www.swift.org/documentation/articles/wasm-getting-started.html

…erver (llvm#148774) Summary: There was a deadlock was introduced by [PR llvm#146441](llvm#146441) which changed `CurrentThreadIsPrivateStateThread()` to `CurrentThreadPosesAsPrivateStateThread()`. This change caused the execution path in [`ExecutionContextRef::SetTargetPtr()`](https://github.com/llvm/llvm-project/blob/10b5558b61baab59c7d3dff37ffdf0861c0cc67a/lldb/source/Target/ExecutionContext.cpp#L513) to now enter a code block that was previously skipped, triggering [`GetSelectedFrame()`](https://github.com/llvm/llvm-project/blob/10b5558b61baab59c7d3dff37ffdf0861c0cc67a/lldb/source/Target/ExecutionContext.cpp#L522) which leads to a deadlock. Thread 1 gets m_modules_mutex in [`ModuleList::AppendImpl`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Core/ModuleList.cpp#L218), Thread 3 gets m_language_runtimes_mutex in [`GetLanguageRuntime`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Process.cpp#L1501), but then Thread 1 waits for m_language_runtimes_mutex in [`GetLanguageRuntime`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Process.cpp#L1501) while Thread 3 waits for m_modules_mutex in [`ScanForGNUstepObjCLibraryCandidate`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Plugins/LanguageRuntime/ObjC/GNUstepObjCRuntime/GNUstepObjCRuntime.cpp#L57). This fixes the deadlock by adding a scoped block around the mutex lock before the call to the notifier, and moved the notifier call outside of the mutex-guarded section. 
The notifier call [`NotifyModuleAdded`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Target.cpp#L1810) should be thread-safe, since the module should be added to the `ModuleList` before the mutex is released, and the notifier doesn't modify the module list further, and the call is operates on local state and the `Target` instance. ### Deadlocked Thread backtraces: ``` * thread #3, name = 'dbg.evt-handler', stop reason = signal SIGSTOP * frame #0: 0x00007f2f1e2973dc libc.so.6`futex_wait(private=0, expected=2, futex_word=0x0000563786bd5f40) at futex-internal.h:146:13 /*... a bunch of mutex related bt ... */ liblldb.so.21.0git`std::lock_guard<std::recursive_mutex>::lock_guard(this=0x00007f2f0f1927b0, __m=0x0000563786bd5f40) at std_mutex.h:229:19 frame llvm#8: 0x00007f2f27946eb7 liblldb.so.21.0git`ScanForGNUstepObjCLibraryCandidate(modules=0x0000563786bd5f28, TT=0x0000563786bd5eb8) at GNUstepObjCRuntime.cpp:60:41 frame llvm#9: 0x00007f2f27946c80 liblldb.so.21.0git`lldb_private::GNUstepObjCRuntime::CreateInstance(process=0x0000563785e1d360, language=eLanguageTypeObjC) at GNUstepObjCRuntime.cpp:87:8 frame llvm#10: 0x00007f2f2746fca5 liblldb.so.21.0git`lldb_private::LanguageRuntime::FindPlugin(process=0x0000563785e1d360, language=eLanguageTypeObjC) at LanguageRuntime.cpp:210:36 frame llvm#11: 0x00007f2f2742c9e3 liblldb.so.21.0git`lldb_private::Process::GetLanguageRuntime(this=0x0000563785e1d360, language=eLanguageTypeObjC) at Process.cpp:1516:9 ... 
frame llvm#21: 0x00007f2f2750b5cc liblldb.so.21.0git`lldb_private::Thread::GetSelectedFrame(this=0x0000563785e064d0, select_most_relevant=DoNoSelectMostRelevantFrame) at Thread.cpp:274:48 frame llvm#22: 0x00007f2f273f9957 liblldb.so.21.0git`lldb_private::ExecutionContextRef::SetTargetPtr(this=0x00007f2f0f193778, target=0x0000563786bd5be0, adopt_selected=true) at ExecutionContext.cpp:525:32 frame llvm#23: 0x00007f2f273f9714 liblldb.so.21.0git`lldb_private::ExecutionContextRef::ExecutionContextRef(this=0x00007f2f0f193778, target=0x0000563786bd5be0, adopt_selected=true) at ExecutionContext.cpp:413:3 frame llvm#24: 0x00007f2f270e80af liblldb.so.21.0git`lldb_private::Debugger::GetSelectedExecutionContext(this=0x0000563785d83bc0) at Debugger.cpp:1225:23 frame llvm#25: 0x00007f2f271bb7fd liblldb.so.21.0git`lldb_private::Statusline::Redraw(this=0x0000563785d83f30, update=true) at Statusline.cpp:136:41 ... * thread #1, name = 'lldb', stop reason = signal SIGSTOP * frame #0: 0x00007f2f1e2973dc libc.so.6`futex_wait(private=0, expected=2, futex_word=0x0000563785e1dd98) at futex-internal.h:146:13 /*... a bunch of mutex related bt ... */ liblldb.so.21.0git`std::lock_guard<std::recursive_mutex>::lock_guard(this=0x00007ffe62be0488, __m=0x0000563785e1dd98) at std_mutex.h:229:19 frame llvm#8: 0x00007f2f2742c8d1 liblldb.so.21.0git`lldb_private::Process::GetLanguageRuntime(this=0x0000563785e1d360, language=eLanguageTypeC_plus_plus) at Process.cpp:1510:41 frame llvm#9: 0x00007f2f2743c46f liblldb.so.21.0git`lldb_private::Process::ModulesDidLoad(this=0x0000563785e1d360, module_list=0x00007ffe62be06a0) at Process.cpp:6082:36 ... 
frame llvm#13: 0x00007f2f2715cf03 liblldb.so.21.0git`lldb_private::ModuleList::AppendImpl(this=0x0000563786bd5f28, module_sp=ptr = 0x563785cec560, use_notifier=true) at ModuleList.cpp:246:19 frame llvm#14: 0x00007f2f2715cf4c liblldb.so.21.0git`lldb_private::ModuleList::Append(this=0x0000563786bd5f28, module_sp=ptr = 0x563785cec560, notify=true) at ModuleList.cpp:251:3 ... frame llvm#19: 0x00007f2f274349b3 liblldb.so.21.0git`lldb_private::Process::ConnectRemote(this=0x0000563785e1d360, remote_url=(Data = "connect://localhost:1234", Length = 24)) at Process.cpp:3250:9 frame llvm#20: 0x00007f2f27411e0e liblldb.so.21.0git`lldb_private::Platform::DoConnectProcess(this=0x0000563785c59990, connect_url=(Data = "connect://localhost:1234", Length = 24), plugin_name=(Data = "gdb-remote", Length = 10), debugger=0x0000563785d83bc0, stream=0x00007ffe62be3128, target=0x0000563786bd5be0, error=0x00007ffe62be1ca0) at Platform.cpp:1926:23 ``` ## Test Plan: Built a hello world a.out Run server in one terminal: ``` ~/llvm/build/Debug/bin/lldb-server g :1234 a.out ``` Run client in another terminal ``` ~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b hello.cc:3" ``` Before: Client hangs indefinitely ``` ~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b main" (lldb) gdb-remote 1234 ^C^C ``` After: ``` ~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b hello.cc:3" (lldb) gdb-remote 1234 Process 837068 stopped * thread #1, name = 'a.out', stop reason = signal SIGSTOP frame #0: 0x00007ffff7fe4a60 ld-linux-x86-64.so.2`_start: -> 0x7ffff7fe4a60 <+0>: movq %rsp, %rdi 0x7ffff7fe4a63 <+3>: callq 0x7ffff7fe5780 ; _dl_start at rtld.c:522:1 ld-linux-x86-64.so.2`_dl_start_user: 0x7ffff7fe4a68 <+0>: movq %rax, %r12 0x7ffff7fe4a6b <+3>: movl 0x18067(%rip), %eax ; _dl_skip_args (lldb) b hello.cc:3 Breakpoint 1: where = a.out`main + 15 at hello.cc:4:13, address = 0x00005555555551bf (lldb) c Process 837068 resuming Process 837068 stopped * thread #1, name = 'a.out', stop reason = 
breakpoint 1.1 frame #0: 0x00005555555551bf a.out`main at hello.cc:4:13 1 #include <iostream> 2 3 int main() { -> 4 std::cout << "Hello World" << std::endl; 5 return 0; 6 } ```

…lvm#152156)

With this new A320 in-order core, we follow adding the FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520 (llvm#132246), which reaps the same code generation benefits of preferring fixed over scalable when the cost is equal.

So when we have:

```
void foo(float* a, float* b, float* dst, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

When compiling without the feature enabled, we get:

```
...
    ld1b    { z0.b }, p0/z, [x0, x10]
    ld1b    { z2.b }, p0/z, [x1, x10]
    add     x12, x0, x10
    ldr     z1, [x12, #1, mul vl]
    add     x12, x1, x10
    ldr     z3, [x12, #1, mul vl]
    fadd    z0.s, z2.s, z0.s
    add     x12, x2, x10
    fadd    z1.s, z3.s, z1.s
    dech    x11
    st1b    { z0.b }, p0, [x2, x10]
    incb    x10, all, mul #2
    str     z1, [x12, #1, mul vl]
...
```

When compiling with, we get:

```
...
    ldp     q0, q1, [x12, #-16]
    ldp     q2, q3, [x11, #-16]
    subs    x13, x13, #8
    fadd    v0.4s, v2.4s, v0.4s
    fadd    v1.4s, v3.4s, v1.4s
    add     x11, x11, #32
    add     x12, x12, #32
    stp     q0, q1, [x10, #-16]
    add     x10, x10, #32
...
```

M68k's SETCC instruction (`scc`) distinctly fills the destination byte with all 1s. If boolean contents are set to `ZeroOrOneBooleanContent`, LLVM can mistakenly think the destination holds `0x01` instead of `0xff` and emit broken code as a result. This change corrects the boolean content type to `ZeroOrNegativeOneBooleanContent`.

For example, this IR:

```llvm
define dso_local signext range(i8 0, 2) i8 @testBool(i32 noundef %a) local_unnamed_addr #0 {
entry:
  %cmp = icmp eq i32 %a, 4660
  %. = zext i1 %cmp to i8
  ret i8 %.
}
```

would previously build as:

```asm
testBool:                               ; @testBool
    cmpi.l  #4660, (4,%sp)
    seq     %d0
    and.l   #255, %d0
    rts
```

Notice the `zext` is erroneously not clearing the low bits, and thus the register returns with 255 instead of 1. This patch fixes the issue:

```asm
testBool:                               ; @testBool
    cmpi.l  #4660, (4,%sp)
    seq     %d0
    and.l   #1, %d0
    rts
```

Most of the tests containing `scc` suffered from the same value error as described above, so those tests have been updated to match the new output (which also logically corrects them).
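
The effect of the two masks can be checked with a small model. This is a plain C++ sketch and not LLVM code; the function names are made up for illustration, and the only facts taken from the description are that `scc` writes `0xff` for true and that the fixed output masks with 1 where the broken output masked with 255.

```cpp
#include <cassert>
#include <cstdint>

// Models M68k `scc`: the destination byte becomes all ones or all zeros.
std::uint8_t sccByte(bool Cond) { return Cond ? 0xFF : 0x00; }

// Mask emitted after the fix (and.l #1): keeps only bit 0 of the byte.
std::uint8_t zextMaskingLowBit(std::uint8_t SccResult) {
  return SccResult & 0x01;
}

// Mask emitted before the fix (and.l #255): a no-op on a single byte, so a
// true condition leaks through as 255.
std::uint8_t zextAssumingZeroOrOne(std::uint8_t SccResult) {
  return SccResult & 0xFF;
}
```

For a true comparison, the first mask yields 1 and the second yields 255, which violates the `range(i8 0, 2)` promised on the return value in the IR above.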

## Problem When the new setting ``` set target.parallel-module-load true ``` was added, lldb began fetching modules from the devices from multiple threads simultaneously. This caused crashes of lldb when debugging on android devices. The top of the stack in the crash look something like this: ``` #0 0x0000555aaf2b27fe llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/llvm/bin/lldb-dap+0xb87fe) #1 0x0000555aaf2b0a99 llvm::sys::RunSignalHandlers() (/opt/llvm/bin/lldb-dap+0xb6a99) #2 0x0000555aaf2b2fda SignalHandler(int, siginfo_t*, void*) (/opt/llvm/bin/lldb-dap+0xb8fda) #3 0x00007f9c02444560 __restore_rt /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:13:0 llvm#4 0x00007f9c04ea7707 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) (usr/bin/../lib/liblldb.so.15+0x22a7707) llvm#5 0x00007f9c04ea5b41 lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5b41) llvm#6 0x00007f9c04ea5c1e lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5c1e) llvm#7 0x00007f9c052916ff lldb_private::platform_android::AdbClient::SyncService::Stat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) (usr/bin/../lib/liblldb.so.15+0x26916ff) llvm#8 0x00007f9c0528b9dc lldb_private::platform_android::PlatformAndroid::GetFile(lldb_private::FileSpec const&, lldb_private::FileSpec const&) (usr/bin/../lib/liblldb.so.15+0x268b9dc) ``` Our workaround was to set `set target.parallel-module-load ` to `false` to avoid the crash. ## Background PlatformAndroid creates two different classes with one stateful adb connection shared between the two -- one through AdbClient and another through AdbClient::SyncService. The connection management and state is complex, and seems to be responsible for the segfault we are seeing. 
The AdbClient code resets these connections at times, and re-establishes connections if they are not active. Similarly, PlatformAndroid caches its SyncService, which uses an AdbClient class, but the SyncService puts its connection into a different 'sync' state that is incompatible with a standard connection. ## Changes in this diff * This diff refactors the code to (hopefully) have clearer ownership of the connection, clearer separation of AdbClient and SyncService by making a new class for clearer separations of concerns, called AdbSyncService. * New unit tests are added * Additional logs were added (see llvm#145382 (comment) for details)

…namic (llvm#153420) Canonicalizing the following IR: ``` func.func @mul_zero_dynamic_nofold(%arg0: tensor<?x17xf32>) -> tensor<?x17xf32> { %0 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x1xf32>}> : () -> tensor<1x1xf32> %1 = "tosa.const"() <{values = dense<0> : tensor<1xi8>}> : () -> tensor<1xi8> %2 = tosa.mul %arg0, %0, %1 : (tensor<?x17xf32>, tensor<1x1xf32>, tensor<1xi8>) -> tensor<?x17xf32> return %2 : tensor<?x17xf32> } ``` resulted in a crash ``` #0 0x000056513187e8db backtrace (./build-release/bin/mlir-opt+0x9d698db) #1 0x0000565131b17737 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:838:8 #2 0x0000565131b187f3 PrintStackTraceSignalHandler(void*) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:918:1 #3 0x0000565131b18c30 llvm::sys::RunSignalHandlers() /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Signals.cpp:105:18 llvm#4 0x0000565131b18c30 SignalHandler(int, siginfo_t*, void*) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:409:3 llvm#5 0x00007f2e4165b050 (/lib/x86_64-linux-gnu/libc.so.6+0x3c050) llvm#6 0x00007f2e416a9eec __pthread_kill_implementation ./nptl/pthread_kill.c:44:76 llvm#7 0x00007f2e4165afb2 raise ./signal/../sysdeps/posix/raise.c:27:6 llvm#8 0x00007f2e41645472 abort ./stdlib/abort.c:81:7 llvm#9 0x00007f2e41645395 _nl_load_domain ./intl/loadmsgcat.c:1177:9 llvm#10 0x00007f2e41653ec2 (/lib/x86_64-linux-gnu/libc.so.6+0x34ec2) llvm#11 0x00005651443ec4ba mlir::DenseIntOrFPElementsAttr::getRaw(mlir::ShapedType, llvm::ArrayRef<char>) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/IR/BuiltinAttributes.cpp:1361:3 llvm#12 0x00005651443f1209 mlir::DenseElementsAttr::resizeSplat(mlir::ShapedType) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/IR/BuiltinAttributes.cpp:0:10 llvm#13 
0x000056513f76f2b6 mlir::tosa::MulOp::fold(mlir::tosa::MulOpGenericAdaptor<llvm::ArrayRef<mlir::Attribute>>) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/Dialect/Tosa/IR/TosaCanonicalizations.cpp:0:0 ``` from the folder for `tosa::mul` since the zero value was being reshaped to `?x17` size which isn't supported. AFAIK, `tosa.const` requires all dimensions to be static. So in this case, the fix is to not to fold the op.

Reported from llvm#153393 (comment)

During DAGCombine, an intermediate extract_subvector sequence was generated:

```
t8: v9i16 = extract_subvector t3, Constant:i64<9>
t24: v8i16 = extract_subvector t8, Constant:i64<0>
```

One of the DAGCombine rules, which turns `(extract_subvector (extract_subvector X, C), 0)` into `(extract_subvector X, C)`, kicked in and turned that into `v8i16 = extract_subvector t3, Constant:i64<9>`. But it forgot to check whether the extracted index is a multiple of the minimum vector length of the result type, hence the crash. This patch fixes this by adding that check.
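
The missing condition is plain modular arithmetic. The helper below is a hypothetical stand-in written for this note, not the function in the DAGCombiner; it takes the inner extract index and the minimum element count of the result type as bare integers.

```cpp
#include <cassert>

// Hypothetical check: may (extract_subvector (extract_subvector X, C), 0)
// be rewritten as (extract_subvector X, C) when the result type has
// ResultMinElts elements? Only if C is a multiple of that element count.
bool canFoldNestedExtract(unsigned InnerIndex, unsigned ResultMinElts) {
  assert(ResultMinElts != 0 && "result vector must have at least one element");
  return InnerIndex % ResultMinElts == 0;
}
```

In the reported case the result type is `v8i16` and the index is 9. Since 9 is not a multiple of 8, the fold must be rejected, whereas an index of 0 or 16 would be accepted.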

…lvm#150192) a961210 reverted a change to use a binary search on the string name table because it was too slow. This replaces it with a static string hash table based on the known set of libcall names. Microbenchmarking shows this is similarly fast to using DenseMap. It's possibly slightly slower than using StringSet, though these aren't an exact comparison. This also saves on the one time use construction of the map, so it could be better in practice. This search isn't simple set check, since it does find the range of possible matches with the same name. There's also an additional check for whether the current target supports the name. The runtime constructed set doesn't require this, since it only adds the symbols live for the target. Followed algorithm from this post http://0x80.pl/notesen/2023-04-30-lookup-in-strings.html I'm also thinking the 2 special case global symbols should just be added to RuntimeLibcalls. There are also other global references emitted in the backend that aren't tracked; we probably should just use this as a centralized database for all compiler selected symbols.

llvm#153700) Pointer auth protection of the block descriptor pointer is only supported in some constrained environments so we do actually need it to be configurable. We had made it non configurable in the first PR to protect block metadata because we believed that was an option but subsequently realised it does need to remain configurable. This PR revives the flags that permit this.

Does not yet fully propagate this down into the TargetLowering uses, many of which are relying on null checks on the returned value.

Fix up for llvm#153548, which is from llvm#153150.

…lvm#153687) What most code wants to know is the direction and we have to decode the opcode to figure that out. Instead pass the direction around as a bool and convert to opcode when we create the merge instruction.

…lvm#153051) proof: https://alive2.llvm.org/ce/z/WVt4-F

This commit is a re-do of e4a8969, which got reverted, with the same goal: dramatically speed up clang-tidy by avoiding doing work in system headers (which is wasteful as warnings are later discarded). This proposal was already discussed here with favorable feedback: llvm#132725

The novelty of this patch is:

- It's less aggressive: it does not fiddle with AST traversal. This solves the issue with the previous patch, which impacted the ability to inspect parents of a given node.
- Instead, what we optimize for is exiting early in each `Traverse*` function of `MatchASTVisitor` if the node is in a system header, thus avoiding calling the `match()` function with its corresponding callback (when there is a match).
- It does not cause any failing tests.
- It does not move `MatchFinderOptions` - instead we add a user-defined default constructor which solves the same problem.
- It introduces a function `shouldSkipNode` which can be extended for adding more conditions. For example there's a PR open about skipping modules in clang-tidy where this could come in handy: llvm#145630

As a benchmark, I ran clang-tidy with all checks activated, on a single .cpp file which #includes all the standard C++ headers, then measured the time as well as the warnings found.

On trunk:

```
Suppressed 75413 warnings (75413 in non-user code).

real 0m12.418s
user 0m12.270s
sys 0m0.129s
```

With this patch:

```
Suppressed 11448 warnings (11448 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.

real 0m1.666s
user 0m1.538s
sys 0m0.129s
```

With the original patch that got reverted:

```
Suppressed 11428 warnings (11428 in non-user code).

real 0m1.193s
user 0m1.096s
sys 0m0.096s
```

We therefore get a dramatic reduction in number of warnings and runtime, with no change in functionality.

The remaining warnings are due to `PPCallbacks` - implementing a similar system-header exclusion mechanism there can lead to almost no warnings left in system headers. This does not bring the runtime down as much, though, so it's probably not worth the effort.

Fixes llvm#52959

Co-authored-by: Carlos Gálvez <[email protected]>

…positive for inherited members (llvm#153941)

```cpp
struct Base {
  int m;
};

template <class T>
struct Derived : Base {
  Derived() { m = 0; }
};
```

would previously generate the following output:

```
<source>:7:15: warning: 'm' should be initialized in a member initializer of the constructor [cppcoreguidelines-prefer-member-initializer]
    7 |   Derived() { m = 0; }
      |               ^~~~~~
      |   : m(0)
```

This patch fixes this false positive.

Note that before this patch the checker won't give false positive for

```cpp
struct Derived : Base {
  Derived() { m = 0; }
};
```

and the constructor's AST is

```
`-CXXConstructorDecl 0x557df03d1fb0 <line:7:3, col:22> col:3 Derived 'void ()' implicit-inline
  |-CXXCtorInitializer 'Base'
  | `-CXXConstructExpr 0x557df03d2748 <col:3> 'Base' 'void () noexcept'
  `-CompoundStmt 0x557df03d2898 <col:13, col:22>
    `-BinaryOperator 0x557df03d2878 <col:15, col:19> 'int' lvalue '='
      |-MemberExpr 0x557df03d2828 <col:15> 'int' lvalue ->m 0x557df03d1c40
      | `-ImplicitCastExpr 0x557df03d2808 <col:15> 'Base *' <UncheckedDerivedToBase (Base)>
      |   `-CXXThisExpr 0x557df03d27f8 <col:15> 'Derived *' implicit this
      `-IntegerLiteral 0x557df03d2858 <col:19> 'int' 0
```

so `isAssignmentToMemberOf` would return empty due to https://github.com/llvm/llvm-project/blob/f0967fca04c880e9aabd5be043a85127faabb4c6/clang-tools-extra/clang-tidy/cppcoreguidelines/PreferMemberInitializerCheck.cpp#L118-L119

Fixes llvm#104400

…#154005)

This fixes a small typo in the toy tutorial. A code block was not correctly terminated, causing it to run into the subsequent block.

…m#152944) Simple fix for this particular html tag. A more complete solution should be implemented. 1. Add all html tags to table so they are recognized. Some input on what is desirable/safe would be appreciated 2. Change the lex strategy to deal with this in a different manner Fixes llvm#32680 --------- Co-authored-by: Brock Denson <[email protected]>

This patch adds some hdrgen yaml for ioctl(). Otherwise the function never actually ends up being available in a full build. This is the last thing that is needed to enable turning on LIBCXX_ENABLE_RANDOM_DEVICE.

Fixes llvm#131273 Adds a check to avoid division when max value of denominator is zero.

Proof: https://alive2.llvm.org/ce/z/a5Yjb8

…vm#153839) This patch makes the current behavior explicit to prepare for adding VTs for v[567]f16. Right now these types are EVTs and hence don't fall under getPreferredVectorAction and are simply widened to the next legal power-of-two vector type. For SSE2 this is v8f16. Without the preparatory patch however, the behavior would change after adding these types. getPreferredVectorAction would try to split them because this is the current behavior for any f16 vector type that is not legal. There is a lot more detail at llvm#152150 in particular how splitting these new types leads to an inconsistency between NumRegistersForVT and getTypeAction. The patch ensures that after the new types are added they would continue to be widened rather than split. Once the patch to enable v[567]f16 lands, it will be an NFC for x86.

llvm#153525) Fixes llvm#153443

Fixes llvm#153448

…lvm#153924) Fixes llvm#153891

Also set it to SIEB_Always for WebKit style. Closes llvm#85525. Closes llvm#93635.

…lvm#154028) Most of the time we don't need instruction opcode. There is no need to carry it around all the time, we can easily get it by other means. Rename affected variables accordingly. Part of an effort to simplify DecoderEmitter code.

All relevant places should already explicitly materialize broadcasts. Remove dead code from VPTransformState::get

…m#152690) llvm#146226 with fixing asinpi MPFR number function and make it work when mpfr < `4.2.0`

…cit method call (llvm#153524) Retry landing llvm#153373 ## Major changes from previous attempt - remove the test in CAPI because no existing tests in CAPI deal with sanitizer exemptions - update `mlir/docs/Dialects/GPU.md` to reflect the new behavior: load GPU binary in global ctors, instead of loading them at call site. - skip the test on Aarch64 since we have an issue with initialization there --------- Co-authored-by: Mehdi Amini <[email protected]>

…153870) This helps better distinguish warnings that could be disabled via `.clang-tidy` config (like `clang-diagnostic-literal-conversion`) from errors that could not be suppressed at all (like `clang-diagnostic-error`) because it's a hard compiler error.

…lvm#149036)

This patch introduces libomptarget support for the ATTACH map-type, which can be used to implement OpenMP conditional compliant pointer attachment, based on whether the pointer/pointee is newly mapped on a given construct.

For example, for the following:

```c
int *p;
#pragma omp target enter data map(p[1:10])
```

The following maps can be emitted by clang:

```
(A)
&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
&p, &p[1], sizeof(p), ATTACH
```

Without this map-type, these two possible maps could be emitted by clang:

```
(B)
&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

(C)
&p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ
```

(B) does not perform any pointer attachment, while (C) also maps the pointer p, which are both incorrect.

In terms of implementation, maps with the ATTACH map-type are handled after all other maps have been processed, as it requires knowledge of which new allocations happened as part of the construct. As per OpenMP 5.0, an attachment should happen only when either the pointer or the pointee was newly mapped while handling the construct.

Maps with ATTACH map-type-bit do not increase/decrease the ref-count.

With OpenMP 6.1, `attach(always/never)` can be used to force/prevent attachment. For `attach(always)`, the compiler will insert the ALWAYS map-type, which would let libomptarget bypass the check about one of the pointer/pointee being new. With `attach(never)`, the ATTACH map will not be emitted at all.

The size argument of the ATTACH map-type can specify values greater than `sizeof(void*)` which can be used to support pointer attachment on Fortran descriptors. Note that this also requires shadow-pointer tracking to also support them. That has not been implemented in this patch.

This was worked upon in coordination with Ravi Narayanaswamy, who has since retired. Happy retirement, Ravi!

---------

Co-authored-by: Alex Duran <[email protected]>
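
The attachment condition described in that commit message reduces to a three-input predicate. The sketch below is a reading of the prose only; `AttachQuery` and `shouldAttach` are hypothetical names, and libomptarget's actual bookkeeping (how "newly mapped" is tracked per construct) is not modeled.

```cpp
#include <cassert>

// Hypothetical inputs, assumed to be known once all non-ATTACH maps of the
// construct have been processed.
struct AttachQuery {
  bool PointerNewlyMapped; // the pointer was allocated on this construct
  bool PointeeNewlyMapped; // the pointee was allocated on this construct
  bool HasAlways;          // ALWAYS bit set, i.e. attach(always) was requested
};

// Attach when forced by ALWAYS, or when either side is new on this construct.
// attach(never) has no case here because no ATTACH entry is emitted for it.
bool shouldAttach(const AttachQuery &Q) {
  return Q.HasAlways || Q.PointerNewlyMapped || Q.PointeeNewlyMapped;
}
```

If neither the pointer nor the pointee is new and ALWAYS is absent, the predicate is false and the existing device pointer value is left untouched.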

…penCL spec (llvm#153784)

The name is misleading, as setting Fragment to nullptr does not necessarily make it undefined - common and equated symbols have a nullptr fragment as well.

) This unifies naming scheme of macros to address review comment intel/llvm#19779 (comment) math constant value macros are not changed, e.g. `#define AU0 -9.86494292470009928597e-03`

) Adds `constexpr` support for `pmuludq` and `pmuldq` intrinsics. Closes llvm#153002. Part of llvm#30794.

…on-using-attach-maptype

… findattachptr to a common place to access from SemaOpenmp.

abhinavgaba commented Jul 16, 2025

offload/libomptarget/interface.cpp Outdated

The libomptarget code will disappear from this PR once llvm#149036 is merged.

abhinavgaba mentioned this pull request Jul 17, 2025

[Offload] Introduce ATTACH map-type support for pointer attachment. llvm/llvm-project#149036

Merged

abhinavgaba changed the title ~~[WIP] Use ATTACH maps for array-sections/subscripts on pointers.~~ Use ATTACH maps for array-sections/subscripts on pointers. Jul 22, 2025

This was referenced Jul 22, 2025

[OpenMP] Mapping of 'middle' structures chained through '->' does not work llvm/llvm-project#141042

Open

[Clang][OpenMP] Capture mapped pointers on target by reference. llvm/llvm-project#145454

Open

abhinavgaba commented Jul 23, 2025


rampitec and others added 18 commits August 14, 2025 15:54

[AMDGPU] Remove wave64 functions (llvm#153690)

a629119

gfx1250 only supports wave32.

[flang][cuda] Add interface for __saturatef (llvm#153705)

602f308

[gn build] Port d56fa96

47bc6ac

Update copy-gaps tests.

502dbb4

[flang][cuda] Add interfaces for __float2int_rX and __float2unit_rX (llvm#153691)

ffe4870

[AMDGPU] Enable kernarg preload on gfx1250 (llvm#153686)

8bce10a

[flang][cuda] Add interfaces for __int2float_rX (llvm#153708)

3bc4d66

Revert "[CGData] Lazy loading support for stable function map (llvm#151660)"

07d3a73

This reverts commit 76dd742.

[Support] Add mapped_file_region::sync(), equivalent to msync (llvm#153632)

7e46f5d

[AMDGPU] Fix the comment wrt SSrc_* RCs. NFC. (llvm#153711)

7ec2096

[RISCV][NFC] Make the pointer in the test case for llvm#153709 non-null

0f64ec8

The snippet was originally from llvm-reduce but we probably shouldn't use a null pointer in the actual test case. NFC.

[gn build] Port 769a905

0226e94

RuntimeLibcalls: Return StringRef for libcall names (llvm#153209)

cb1228f

Does not yet fully propagate this down into the TargetLowering uses, many of which are relying on null checks on the returned value.

[AMDGPU] Delete amdgpu-unify-metadata in optdriver.cpp (llvm#153717)

f2a6fcd

Fix up for llvm#153548, which is from llvm#153150.

andjo403 and others added 30 commits August 17, 2025 09:53

[SimplifyCfg] Handle trunc nuw i1 condition in Equality comparison. (llvm#153051)

5ae8a9b

proof: https://alive2.llvm.org/ce/z/WVt4-F

[clang-tidy][NFC] Remove py2 conditions from clang-tidy scripts (llvm#154005)

66a2d1b

[mlir][doc] fixup code block (llvm#153977)

a66d8f6

This fixes a small typo in the toy tutorial. A code block was not correctly terminated, causing it to run into the subsequent block.

[libc] Setup hdrgen for ioctl (llvm#153976)

71925a9

This patch adds some hdrgen yaml for ioctl(). Otherwise the function never actually ends up being available in a full build. This is the last thing that is needed to enable turning on LIBCXX_ENABLE_RANDOM_DEVICE.

[mlir][InferIntRangeCommon] Fix Division by Zero Crash (llvm#151637)

e1aa415

Fixes llvm#131273 Adds a check to avoid division when max value of denominator is zero.

[LVI] Add support for trunc nuw range. (llvm#154021)

0561ff6

Proof: https://alive2.llvm.org/ce/z/a5Yjb8

[clang-format] Don't annotate class property specifiers as StartOfName (llvm#153525)

9a692e0

Fixes llvm#153443

[clang-format] Allow breaking before bit-field colons (llvm#153529)

5e57a10

Fixes llvm#153448

[clang-format] Fix a bug in breaking before FunctionDeclarationName (llvm#153924)

a21d17f

Fixes llvm#153891

[clang-format] Add SpaceInEmptyBraces option (llvm#153765)

6cfedea

Also set it to SIEB_Always for WebKit style. Closes llvm#85525. Closes llvm#93635.

[TableGen] Use structured binding in one place (NFC)

6947fb4

[VPlan] Remove dead code from GetBroadCastInstr (NFCI).

5892a2b

All relevant places should already explicitly materialize broadcasts. Remove dead code from VPTransformState::get

Reland "[libc][math][c23] Implement C23 math function asinpif16" (llvm#152690)

40833ee

llvm#146226 with fixing asinpi MPFR number function and make it work when mpfr < `4.2.0`

[libclc] Fix out-of-bound value for workitem functions according to OpenCL spec (llvm#153784)

bce14c6

MCSymbol: Remove setUndefined

34c7b7c

The name is misleading, as setting Fragment to nullptr does not necessarily make it undefined - common and equated symbols have a nullptr fragment as well.

[NFC][libclc] add missing __CLC_ prefix all internal macros (llvm#153523)

76bb987

This unifies naming scheme of macros to address review comment intel/llvm#19779 (comment) math constant value macros are not changed, e.g. `#define AU0 -9.86494292470009928597e-03`

[Headers][X86] Allow pmuludq/pmuldq to be used in constexpr (llvm#153293)

d42a1d4

Adds `constexpr` support for `pmuludq` and `pmuldq` intrinsics. Closes llvm#153002. Part of llvm#30794.

Fix member-of field update, update some more tests.

9b1336c

Merge remote-tracking branch 'upstream/main' into map-ptr-array-section-using-attach-maptype

b84885a

Fix use_device_ptr codegen and test.

e5d22be

Re-add missing non-contiguous hanlding code, update target update tests.

a4be53c

Update use_device_ptr/addr handling, add an error for p[0] case, move findattachptr to a common place to access from SemaOpenmp.

166e90e
