[Offload] Make olLaunchKernel test thread safe #149497

RossBrunton · 2025-07-18T11:29:51Z

This sprinkles a few mutexes around the plugin interface so that the
olLaunchKernel CTS test now passes when run on multiple threads.

Part of this also involved changing the interface for device synchronise
so that it can optionally not free the underlying queue (which
introduced a race condition in liboffload).

llvmbot · 2025-07-18T11:30:22Z

@llvm/pr-subscribers-offload

@llvm/pr-subscribers-backend-amdgpu

Author: Ross Brunton (RossBrunton)

Changes

This sprinkles a few mutexes around the plugin interface so that the
olLaunchKernel CTS test now passes when run on multiple threads.

Part of this also involved changing the interface for device synchronise
so that it can optionally not free the underlying queue (which
introduced a race condition in liboffload).

Full diff: https://github.com/llvm/llvm-project/pull/149497.diff

9 Files Affected:

(modified) offload/include/Shared/APITypes.h (+4)
(modified) offload/liboffload/src/OffloadImpl.cpp (+1-7)
(modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+11-3)
(modified) offload/plugins-nextgen/common/include/PluginInterface.h (+6-2)
(modified) offload/plugins-nextgen/common/src/PluginInterface.cpp (+5-2)
(modified) offload/plugins-nextgen/cuda/src/rtl.cpp (+11-4)
(modified) offload/plugins-nextgen/host/src/rtl.cpp (+2-1)
(modified) offload/unittests/OffloadAPI/common/Fixtures.hpp (+18)
(modified) offload/unittests/OffloadAPI/kernel/olLaunchKernel.cpp (+23)

diff --git a/offload/include/Shared/APITypes.h b/offload/include/Shared/APITypes.h
index 978b53d5d69b9..a988edce481e6 100644
--- a/offload/include/Shared/APITypes.h
+++ b/offload/include/Shared/APITypes.h
@@ -21,6 +21,7 @@
 
 #include <cstddef>
 #include <cstdint>
+#include <mutex>
 
 extern "C" {
 
@@ -75,6 +76,9 @@ struct __tgt_async_info {
   /// should be freed after finalization.
   llvm::SmallVector<void *, 2> AssociatedAllocations;
 
+  /// Mutex to guard access to AssociatedAllocations
+  std::mutex AllocationsMutex;
+
   /// The kernel launch environment used to issue a kernel. Stored here to
   /// ensure it is a valid location while the transfer to the device is
   /// happening.
diff --git a/offload/liboffload/src/OffloadImpl.cpp b/offload/liboffload/src/OffloadImpl.cpp
index ffc9016bca0a3..d0dced8be7a61 100644
--- a/offload/liboffload/src/OffloadImpl.cpp
+++ b/offload/liboffload/src/OffloadImpl.cpp
@@ -487,16 +487,10 @@ Error olWaitQueue_impl(ol_queue_handle_t Queue) {
   // Host plugin doesn't have a queue set so it's not safe to call synchronize
   // on it, but we have nothing to synchronize in that situation anyway.
   if (Queue->AsyncInfo->Queue) {
-    if (auto Err = Queue->Device->Device->synchronize(Queue->AsyncInfo))
+    if (auto Err = Queue->Device->Device->synchronize(Queue->AsyncInfo, false))
       return Err;
   }
 
-  // Recreate the stream resource so the queue can be reused
-  // TODO: Would be easier for the synchronization to (optionally) not release
-  // it to begin with.
-  if (auto Res = Queue->Device->Device->initAsyncInfo(&Queue->AsyncInfo))
-    return Res;
-
   return Error::success();
 }
 
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index b2fd950c9d500..509f6c03e21fe 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -2227,6 +2227,7 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
   /// Get the stream of the asynchronous info structure or get a new one.
   Error getStream(AsyncInfoWrapperTy &AsyncInfoWrapper,
                   AMDGPUStreamTy *&Stream) {
+    std::lock_guard<std::mutex> StreamLock{StreamMutex};
     // Get the stream (if any) from the async info.
     Stream = AsyncInfoWrapper.getQueueAs<AMDGPUStreamTy *>();
     if (!Stream) {
@@ -2291,7 +2292,8 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
   }
 
   /// Synchronize current thread with the pending operations on the async info.
-  Error synchronizeImpl(__tgt_async_info &AsyncInfo) override {
+  Error synchronizeImpl(__tgt_async_info &AsyncInfo,
+                        bool RemoveQueue) override {
     AMDGPUStreamTy *Stream =
         reinterpret_cast<AMDGPUStreamTy *>(AsyncInfo.Queue);
     assert(Stream && "Invalid stream");
@@ -2302,8 +2304,11 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
     // Once the stream is synchronized, return it to stream pool and reset
     // AsyncInfo. This is to make sure the synchronization only works for its
     // own tasks.
-    AsyncInfo.Queue = nullptr;
-    return AMDGPUStreamManager.returnResource(Stream);
+    if (RemoveQueue) {
+      AsyncInfo.Queue = nullptr;
+      return AMDGPUStreamManager.returnResource(Stream);
+    }
+    return Plugin::success();
   }
 
   /// Query for the completion of the pending operations on the async info.
@@ -3013,6 +3018,9 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
   /// True is the system is configured with XNACK-Enabled.
   /// False otherwise.
   bool IsXnackEnabled = false;
+
+  /// Mutex to guard getting/setting the stream
+  std::mutex StreamMutex;
 };
 
 Error AMDGPUDeviceImageTy::loadExecutable(const AMDGPUDeviceTy &Device) {
diff --git a/offload/plugins-nextgen/common/include/PluginInterface.h b/offload/plugins-nextgen/common/include/PluginInterface.h
index 162b149ab483e..5fd34a9236f83 100644
--- a/offload/plugins-nextgen/common/include/PluginInterface.h
+++ b/offload/plugins-nextgen/common/include/PluginInterface.h
@@ -104,6 +104,7 @@ struct AsyncInfoWrapperTy {
   /// Register \p Ptr as an associated allocation that is freed after
   /// finalization.
   void freeAllocationAfterSynchronization(void *Ptr) {
+    std::lock_guard<std::mutex> AllocationGuard{AsyncInfoPtr->AllocationsMutex};
     AsyncInfoPtr->AssociatedAllocations.push_back(Ptr);
   }
 
@@ -772,8 +773,9 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
 
   /// Synchronize the current thread with the pending operations on the
   /// __tgt_async_info structure.
-  Error synchronize(__tgt_async_info *AsyncInfo);
-  virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0;
+  Error synchronize(__tgt_async_info *AsyncInfo, bool RemoveQueue = true);
+  virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo,
+                                bool RemoveQueue) = 0;
 
   /// Invokes any global constructors on the device if present and is required
   /// by the target.
@@ -1501,6 +1503,8 @@ template <typename ResourceRef> class GenericDeviceResourceManagerTy {
   /// Deinitialize the resource pool and delete all resources. This function
   /// must be called before the destructor.
   virtual Error deinit() {
+    const std::lock_guard<std::mutex> Lock(Mutex);
+
     if (NextAvailable)
       DP("Missing %d resources to be returned\n", NextAvailable);
 
diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index 81b9d423e13d8..4844f88229fb2 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -1329,12 +1329,15 @@ Error PinnedAllocationMapTy::unlockUnmappedHostBuffer(void *HstPtr) {
   return eraseEntry(*Entry);
 }
 
-Error GenericDeviceTy::synchronize(__tgt_async_info *AsyncInfo) {
+Error GenericDeviceTy::synchronize(__tgt_async_info *AsyncInfo,
+                                   bool RemoveQueue) {
+  std::lock_guard<std::mutex> AllocationGuard{AsyncInfo->AllocationsMutex};
+
   if (!AsyncInfo || !AsyncInfo->Queue)
     return Plugin::error(ErrorCode::INVALID_ARGUMENT,
                          "invalid async info queue");
 
-  if (auto Err = synchronizeImpl(*AsyncInfo))
+  if (auto Err = synchronizeImpl(*AsyncInfo, RemoveQueue))
     return Err;
 
   for (auto *Ptr : AsyncInfo->AssociatedAllocations)
diff --git a/offload/plugins-nextgen/cuda/src/rtl.cpp b/offload/plugins-nextgen/cuda/src/rtl.cpp
index b787376eb1770..f637379c5b29d 100644
--- a/offload/plugins-nextgen/cuda/src/rtl.cpp
+++ b/offload/plugins-nextgen/cuda/src/rtl.cpp
@@ -522,6 +522,7 @@ struct CUDADeviceTy : public GenericDeviceTy {
 
   /// Get the stream of the asynchronous info structure or get a new one.
   Error getStream(AsyncInfoWrapperTy &AsyncInfoWrapper, CUstream &Stream) {
+    std::lock_guard<std::mutex> StreamLock{StreamMutex};
     // Get the stream (if any) from the async info.
     Stream = AsyncInfoWrapper.getQueueAs<CUstream>();
     if (!Stream) {
@@ -642,7 +643,8 @@ struct CUDADeviceTy : public GenericDeviceTy {
   }
 
   /// Synchronize current thread with the pending operations on the async info.
-  Error synchronizeImpl(__tgt_async_info &AsyncInfo) override {
+  Error synchronizeImpl(__tgt_async_info &AsyncInfo,
+                        bool RemoveQueue) override {
     CUstream Stream = reinterpret_cast<CUstream>(AsyncInfo.Queue);
     CUresult Res;
     Res = cuStreamSynchronize(Stream);
@@ -650,9 +652,11 @@ struct CUDADeviceTy : public GenericDeviceTy {
     // Once the stream is synchronized, return it to stream pool and reset
     // AsyncInfo. This is to make sure the synchronization only works for its
     // own tasks.
-    AsyncInfo.Queue = nullptr;
-    if (auto Err = CUDAStreamManager.returnResource(Stream))
-      return Err;
+    if (RemoveQueue) {
+      AsyncInfo.Queue = nullptr;
+      if (auto Err = CUDAStreamManager.returnResource(Stream))
+        return Err;
+    }
 
     return Plugin::check(Res, "error in cuStreamSynchronize: %s");
   }
@@ -1281,6 +1285,9 @@ struct CUDADeviceTy : public GenericDeviceTy {
   /// The maximum number of warps that can be resident on all the SMs
   /// simultaneously.
   uint32_t HardwareParallelism = 0;
+
+  /// Mutex to guard getting/setting the stream
+  std::mutex StreamMutex;
 };
 
 Error CUDAKernelTy::launchImpl(GenericDeviceTy &GenericDevice,
diff --git a/offload/plugins-nextgen/host/src/rtl.cpp b/offload/plugins-nextgen/host/src/rtl.cpp
index d950572265b4c..725a37c280248 100644
--- a/offload/plugins-nextgen/host/src/rtl.cpp
+++ b/offload/plugins-nextgen/host/src/rtl.cpp
@@ -297,7 +297,8 @@ struct GenELF64DeviceTy : public GenericDeviceTy {
 
   /// All functions are already synchronous. No need to do anything on this
   /// synchronization function.
-  Error synchronizeImpl(__tgt_async_info &AsyncInfo) override {
+  Error synchronizeImpl(__tgt_async_info &AsyncInfo,
+                        bool RemoveQueue) override {
     return Plugin::success();
   }
 
diff --git a/offload/unittests/OffloadAPI/common/Fixtures.hpp b/offload/unittests/OffloadAPI/common/Fixtures.hpp
index 546921164f691..4fe57bd80d704 100644
--- a/offload/unittests/OffloadAPI/common/Fixtures.hpp
+++ b/offload/unittests/OffloadAPI/common/Fixtures.hpp
@@ -9,6 +9,7 @@
 #include <OffloadAPI.h>
 #include <OffloadPrint.hpp>
 #include <gtest/gtest.h>
+#include <thread>
 
 #include "Environment.hpp"
 
@@ -57,6 +58,23 @@ inline std::string SanitizeString(const std::string &Str) {
   return NewStr;
 }
 
+template <typename Fn> inline void threadify(Fn body) {
+  std::vector<std::thread> Threads;
+  for (size_t I = 0; I < 20; I++) {
+    Threads.emplace_back(
+        [&body](size_t I) {
+          std::string ScopeMsg{"Thread #"};
+          ScopeMsg.append(std::to_string(I));
+          SCOPED_TRACE(ScopeMsg);
+          body(I);
+        },
+        I);
+  }
+  for (auto &T : Threads) {
+    T.join();
+  }
+}
+
 struct OffloadTest : ::testing::Test {
   ol_device_handle_t Host = TestEnvironment::getHostDevice();
 };
diff --git a/offload/unittests/OffloadAPI/kernel/olLaunchKernel.cpp b/offload/unittests/OffloadAPI/kernel/olLaunchKernel.cpp
index e7e608f2a64d4..3e128d1e84645 100644
--- a/offload/unittests/OffloadAPI/kernel/olLaunchKernel.cpp
+++ b/offload/unittests/OffloadAPI/kernel/olLaunchKernel.cpp
@@ -104,6 +104,29 @@ TEST_P(olLaunchKernelFooTest, Success) {
   ASSERT_SUCCESS(olMemFree(Mem));
 }
 
+TEST_P(olLaunchKernelFooTest, SuccessThreaded) {
+  threadify([&](size_t) {
+    void *Mem;
+    ASSERT_SUCCESS(olMemAlloc(Device, OL_ALLOC_TYPE_MANAGED,
+                              LaunchArgs.GroupSize.x * sizeof(uint32_t), &Mem));
+    struct {
+      void *Mem;
+    } Args{Mem};
+
+    ASSERT_SUCCESS(olLaunchKernel(Queue, Device, Kernel, &Args, sizeof(Args),
+                                  &LaunchArgs, nullptr));
+
+    ASSERT_SUCCESS(olWaitQueue(Queue));
+
+    uint32_t *Data = (uint32_t *)Mem;
+    for (uint32_t i = 0; i < 64; i++) {
+      ASSERT_EQ(Data[i], i);
+    }
+
+    ASSERT_SUCCESS(olMemFree(Mem));
+  });
+}
+
 TEST_P(olLaunchKernelNoArgsTest, Success) {
   ASSERT_SUCCESS(
       olLaunchKernel(Queue, Device, Kernel, nullptr, 0, &LaunchArgs, nullptr));

RossBrunton · 2025-07-18T11:31:45Z

offload/unittests/OffloadAPI/kernel/olLaunchKernel.cpp

@@ -104,6 +104,29 @@ TEST_P(olLaunchKernelFooTest, Success) {
  ASSERT_SUCCESS(olMemFree(Mem));
 }

+TEST_P(olLaunchKernelFooTest, SuccessThreaded) {


I'd love to be able to add an OFFLOAD_TEST_THREADED_P macro so that you'd get threaded and non-threaded tests "for free" without copy-pasting the test body. But I can't think of a good way of actually implementing that with gtest. Does anyone have any ideas?

jhuber6 · 2025-07-18T13:32:18Z

offload/plugins-nextgen/amdgpu/src/rtl.cpp

@@ -2227,6 +2227,7 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
  /// Get the stream of the asynchronous info structure or get a new one.
  Error getStream(AsyncInfoWrapperTy &AsyncInfoWrapper,
                  AMDGPUStreamTy *&Stream) {
+    std::lock_guard<std::mutex> StreamLock{StreamMutex};


Do we only need this when we create a new one?

Multiple threads can call getStream, see that the stream doesn't exist and create a new one. This can result in multiple streams being created in error.

I'm not sure about this function scope lock. Sure, getStream can be called by multiple threads but I don't think it should be the responsibility of getStream for thread safety. I suppose AMDGPUStreamManager.getResource needs to be the one to do it.

jhuber6 · 2025-07-18T13:33:05Z

offload/plugins-nextgen/amdgpu/src/rtl.cpp

@@ -2302,8 +2304,11 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
    // Once the stream is synchronized, return it to stream pool and reset
    // AsyncInfo. This is to make sure the synchronization only works for its
    // own tasks.
-    AsyncInfo.Queue = nullptr;
-    return AMDGPUStreamManager.returnResource(Stream);
+    if (RemoveQueue) {


Why do we now need a conditional for this? It's supposed to consume it.

Liboffload contains this:

    Error olWaitQueue_impl(ol_queue_handle_t Queue) {
      // Host plugin doesn't have a queue set so it's not safe to call synchronize
      // on it, but we have nothing to synchronize in that situation anyway.
      if (Queue->AsyncInfo->Queue) {
        if (auto Err = Queue->Device->Device->synchronize(Queue->AsyncInfo, false))
          return Err;
      }

      // Recreate the stream resource so the queue can be reused
      // TODO: Would be easier for the synchronization to (optionally) not release
      // it to begin with.
      if (auto Res = Queue->Device->Device->initAsyncInfo(&Queue->AsyncInfo))
        return Res;

      return Error::success();
    }

This has to be done atomically so that, for example, we don't try to synchronise an absent queue. I could add a mutex to ol_queue_impl_t, but I figured it'd be better to just implement what the comment says. Specifically, we avoid dropping the AsyncInfo just to immediately recreate it right after.

I thought the whole point of the resource managers we used was to make acquiring / releasing resources cheap. @kevinsala was the one to implement this originally so I'll see if he knows the proper approach here.

Sure, but we still have the race condition. Tweaking the interface here allows us to have the lock cover a smaller section of code.
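To make the interface change under discussion concrete, here is a minimal standalone sketch of a synchronize call that can optionally keep the queue attached. `SketchAsyncInfo`, `SketchDevice`, and the `int *` stream handle are hypothetical stand-ins, not the plugin types; only the `RemoveQueue` flag name is taken from the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for __tgt_async_info; the int* plays the role of
// the opaque stream handle.
struct SketchAsyncInfo {
  int *Queue = nullptr;
};

// Hypothetical stand-in for a device that owns a pool of reusable streams.
struct SketchDevice {
  std::vector<int *> StreamPool; // streams that have been handed back

  // Wait for the queue's work, then hand the stream back to the pool only
  // if the caller asked for that.
  void synchronize(SketchAsyncInfo &Info, bool RemoveQueue = true) {
    assert(Info.Queue && "Invalid stream");
    // A real plugin would block here until the stream's work has finished.
    if (RemoveQueue) {
      StreamPool.push_back(Info.Queue);
      Info.Queue = nullptr;
    }
  }
};
```

With `RemoveQueue` set to `false`, the queue pointer is never observed as null between the synchronize and a later re-initialisation, which is the window the olWaitQueue_impl race depends on.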

kevinsala · 2025-07-21T15:23:56Z

offload/liboffload/src/OffloadImpl.cpp

@@ -487,16 +487,10 @@ Error olWaitQueue_impl(ol_queue_handle_t Queue) {
  // Host plugin doesn't have a queue set so it's not safe to call synchronize
  // on it, but we have nothing to synchronize in that situation anyway.
  if (Queue->AsyncInfo->Queue) {
-    if (auto Err = Queue->Device->Device->synchronize(Queue->AsyncInfo))
+    if (auto Err = Queue->Device->Device->synchronize(Queue->AsyncInfo, false))


Please add a comment indicating what the false argument does.

This code assumes other threads will not release the queue from that async info, right?

kevinsala · 2025-07-21T17:07:28Z

offload/plugins-nextgen/amdgpu/src/rtl.cpp

@@ -2227,6 +2227,7 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
  /// Get the stream of the asynchronous info structure or get a new one.
  Error getStream(AsyncInfoWrapperTy &AsyncInfoWrapper,
                  AMDGPUStreamTy *&Stream) {
+    std::lock_guard<std::mutex> StreamLock{StreamMutex};


I have several comments about this function:

The resource managers should already be thread safe, they are acquiring an std::mutex when retrieving/releasing resources. E.g., GenericDeviceResourceManagerTy::getResourcesImpl.

The scope of this mutex seems too coarse-grained for the objective of this PR. My understanding is that you want to protect the set/unset of the queue in an async info object. But the StreamMutex here is placed in the device object. Thus, you are limiting, apparently unnecessarily, the concurrency of threads that process different async infos from the same device. Wouldn't it make more sense to move it to the async info instead?

If the previous point is correct, can't you use the same AllocationsMutex instead (after renaming it)?
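As a rough illustration of the suggestion above, the following standalone sketch guards the check-then-create step with a mutex stored in the async info rather than in the device. All names (`SketchStreamPool`, `SketchAsyncInfo`, `getOrCreateStream`, `countStreamsCreated`) are hypothetical, and the pool is only assumed to lock internally in the way the comment describes for the resource managers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a stream pool that takes its own lock when
// handing out a resource.
struct SketchStreamPool {
  std::mutex Mutex;
  std::vector<std::unique_ptr<int>> Streams;

  int *acquire() {
    std::lock_guard<std::mutex> Lock(Mutex);
    Streams.push_back(std::make_unique<int>(0));
    return Streams.back().get();
  }

  std::size_t created() {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Streams.size();
  }
};

// Hypothetical stand-in for __tgt_async_info: the mutex lives next to the
// queue pointer it protects, not in the device.
struct SketchAsyncInfo {
  int *Queue = nullptr;
  std::mutex QueueMutex;
};

// Check-then-create under the per-async-info lock, so only one thread can
// observe a null queue and create the stream.
int *getOrCreateStream(SketchAsyncInfo &Info, SketchStreamPool &Pool) {
  std::lock_guard<std::mutex> Lock(Info.QueueMutex);
  if (!Info.Queue)
    Info.Queue = Pool.acquire();
  assert(Info.Queue && "pool returned a null stream");
  return Info.Queue;
}

// Races NumThreads threads on one async info and reports how many streams
// the pool had to create.
std::size_t countStreamsCreated(unsigned NumThreads) {
  SketchAsyncInfo Info;
  SketchStreamPool Pool;
  std::vector<std::thread> Threads;
  for (unsigned I = 0; I < NumThreads; ++I)
    Threads.emplace_back([&] { getOrCreateStream(Info, Pool); });
  for (auto &T : Threads)
    T.join();
  return Pool.created();
}
```

Because the lock belongs to the async info, threads working on different async infos of the same device would not contend with each other in this sketch.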

kevinsala · 2025-07-21T19:24:16Z

offload/plugins-nextgen/amdgpu/src/rtl.cpp

@@ -2302,8 +2304,11 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
    // Once the stream is synchronized, return it to stream pool and reset
    // AsyncInfo. This is to make sure the synchronization only works for its
    // own tasks.


Please update the comment as appropriate.

kevinsala · 2025-07-21T19:39:35Z

offload/plugins-nextgen/common/include/PluginInterface.h

@@ -772,8 +773,9 @@ struct GenericDeviceTy : public DeviceAllocatorTy {

  /// Synchronize the current thread with the pending operations on the
  /// __tgt_async_info structure.
-  Error synchronize(__tgt_async_info *AsyncInfo);
-  virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0;
+  Error synchronize(__tgt_async_info *AsyncInfo, bool RemoveQueue = true);


Probably ReleaseQueue is more descriptive of what it's doing.

kevinsala · 2025-07-21T20:04:25Z

offload/plugins-nextgen/common/src/PluginInterface.cpp

-Error GenericDeviceTy::synchronize(__tgt_async_info *AsyncInfo) {
+Error GenericDeviceTy::synchronize(__tgt_async_info *AsyncInfo,
+                                   bool RemoveQueue) {
+  std::lock_guard<std::mutex> AllocationGuard{AsyncInfo->AllocationsMutex};


Please use the syntax std::lock_guard<std::mutex> AllocationGuard(...); (i.e., with (...)), as in other occurrences in the plugins.

I understand you need the lock covering the synchronize + delete of allocations to avoid deleting allocations that correspond to other kernel launches not yet synchronized (issued by other threads), right? In other words, to avoid this case:

    Thread1 ----------------------------------------------------------------- Time ---->
                Add allocation --> Launch kernel
    Thread2 ----------------------------------------------------------------- Time ---->
                Synchronize --> Delete allocations

kevinsala · 2025-07-21T20:37:34Z

offload/plugins-nextgen/common/src/PluginInterface.cpp

  if (!AsyncInfo || !AsyncInfo->Queue)
    return Plugin::error(ErrorCode::INVALID_ARGUMENT,
                         "invalid async info queue");

-  if (auto Err = synchronizeImpl(*AsyncInfo))
+  if (auto Err = synchronizeImpl(*AsyncInfo, RemoveQueue))
    return Err;

  for (auto *Ptr : AsyncInfo->AssociatedAllocations)


I'm worried about all this code (lines 1343 to 1346) being inside the lock. Delete operations may take significant time. What about creating an llvm::SmallVector Ptrs (with a reasonable static size) on the stack, transferring all the allocation pointers from AssociatedAllocations to the temporary vector Ptrs, and then, outside the lock, deleting the allocations that are present in Ptrs?

Something like this pseudocode:

    void synchronize(...) {
      SmallVector<void *, 10> Ptrs;
      {
        std::lock_guard<...> AllocationGuard(...);
        synchronizeImpl(AsyncInfo, ...);
        Ptrs = move_elements(AsyncInfo->AssociatedAllocations);
      }
      for (Ptr : Ptrs)
        dataDelete(Ptr, ...);
    }
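A compilable version of that idea might look like the sketch below. It is only an illustration under assumptions: `std::vector` and `std::free` stand in for `llvm::SmallVector` and the device's delete call, and `SketchAsyncInfo` and `synchronizeAndFree` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

// Hypothetical stand-in for __tgt_async_info.
struct SketchAsyncInfo {
  std::vector<void *> AssociatedAllocations;
  std::mutex AllocationsMutex;
};

// Takes ownership of the pending allocations while holding the lock, then
// frees them after the lock has been dropped. Returns how many were freed.
std::size_t synchronizeAndFree(SketchAsyncInfo &Info) {
  std::vector<void *> Ptrs;
  {
    std::lock_guard<std::mutex> AllocationGuard(Info.AllocationsMutex);
    // The device synchronization would happen here, still under the lock.
    Ptrs.swap(Info.AssociatedAllocations);
    // Nothing is left behind for a later call to free a second time.
    assert(Info.AssociatedAllocations.empty());
  }
  // Potentially slow frees run without blocking other threads.
  for (void *Ptr : Ptrs)
    std::free(Ptr);
  return Ptrs.size();
}
```

Swapping with an empty local vector keeps the critical section short regardless of how many allocations are pending.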

kevinsala · 2025-07-21T20:38:23Z

offload/plugins-nextgen/cuda/src/rtl.cpp

-    if (auto Err = CUDAStreamManager.returnResource(Stream))
-      return Err;
+    if (RemoveQueue) {
+      AsyncInfo.Queue = nullptr;


When does the queue get unset/released for liboffload queues?

kevinsala · 2025-07-21T20:42:29Z

offload/include/Shared/APITypes.h

@@ -75,6 +76,9 @@ struct __tgt_async_info {
  /// should be freed after finalization.
  llvm::SmallVector<void *, 2> AssociatedAllocations;

+  /// Mutex to guard access to AssociatedAllocations
+  std::mutex AllocationsMutex;


Would it make sense to construct __tgt_async_info with or without mutex logic? I understand this mutex is only required for the liboffload use-case, not for libomptarget. Having the mutex here doesn't seem like a problem, but maybe we could have a constant boolean field indicating if operations with this async info require mutex protection or not.
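One way the constant boolean field could work is sketched below, using `std::unique_lock` with `std::defer_lock` so the mutex is only taken when the object was constructed as shared. The names `SketchAsyncInfo`, `NeedsLocking`, `addAllocation`, and `addFromThreads` are hypothetical and not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for __tgt_async_info with an opt-in locking flag.
struct SketchAsyncInfo {
  explicit SketchAsyncInfo(bool NeedsLocking) : NeedsLocking(NeedsLocking) {}

  const bool NeedsLocking; // fixed at construction time
  std::mutex Mutex;
  std::vector<void *> AssociatedAllocations;
};

// Records an allocation, taking the mutex only when the async info was
// created for concurrent use.
void addAllocation(SketchAsyncInfo &Info, void *Ptr) {
  std::unique_lock<std::mutex> Lock(Info.Mutex, std::defer_lock);
  if (Info.NeedsLocking)
    Lock.lock();
  Info.AssociatedAllocations.push_back(Ptr);
}

// Adds PerThread entries from each of NumThreads threads and returns the
// resulting number of recorded allocations.
std::size_t addFromThreads(SketchAsyncInfo &Info, unsigned NumThreads,
                           unsigned PerThread) {
  assert(Info.NeedsLocking && "concurrent use requires locking");
  std::vector<std::thread> Threads;
  for (unsigned I = 0; I < NumThreads; ++I)
    Threads.emplace_back([&Info, PerThread] {
      for (unsigned J = 0; J < PerThread; ++J)
        addAllocation(Info, nullptr);
    });
  for (auto &T : Threads)
    T.join();
  return Info.AssociatedAllocations.size();
}
```

Whether skipping an uncontended lock is worth the extra branch is a separate question; the sketch only shows that the flag can be honoured without duplicating the code path.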

llvmbot added backend:AMDGPU offload labels Jul 18, 2025

RossBrunton requested review from callumfare and jhuber6 and removed request for callumfare July 18, 2025 11:32

Remove unneeded lock

624d6ec

RossBrunton marked this pull request as draft July 18, 2025 12:39

Fix convertable

d4894b3

RossBrunton marked this pull request as ready for review July 18, 2025 13:10

jhuber6 reviewed Jul 18, 2025

kevinsala reviewed Jul 21, 2025
