Problem Description
Hi,
To reproduce, run:
#include <hip/hip_runtime.h>
#include <iostream>

// Report a failed HIP call without aborting. The call is evaluated exactly once.
#define HIP_WARN(call) \
    do { \
        hipError_t err_ = (call); \
        if (err_ != hipSuccess) \
            std::cerr << "HIP Error: " << hipGetErrorString(err_) \
                      << ", at line " << __LINE__ << std::endl; \
        (void)hipDeviceSynchronize(); \
    } while (0)

int main() {
    int devCount = 0;
    HIP_WARN(hipGetDeviceCount(&devCount));
    std::cout << "Number of devices: " << devCount << "\n";

    // Query the per-CU limits reported for device 0.
    int block_per_sm = 0;
    int thread_per_sm = 0;
    HIP_WARN(hipDeviceGetAttribute(&block_per_sm, hipDeviceAttributeMaxBlocksPerMultiProcessor, 0));
    HIP_WARN(hipDeviceGetAttribute(&thread_per_sm, hipDeviceAttributeMaxThreadsPerMultiProcessor, 0));
    std::cout << "Max blocks per CU: " << block_per_sm << "\n";
    std::cout << "Max threads per CU: " << thread_per_sm << "\n";
    return 0;
}
hipDeviceAttributeMaxBlocksPerMultiProcessor returns 2 on this system. However, when I estimate the maximum number of active workgroups from inside a kernel (see https://gist.github.com/Snektron/1fb62a39ee0d7b572c3441f0a53d310c), it seems clear that for workgroup sizes smaller than 1024 (for example 64, 128, 256, or 512), more than 2 workgroups can be scheduled per CU.
The computation deviceProps.maxBlocksPerMultiProcessor = int(info.maxThreadsPerCU_ / info.maxWorkGroupSize_); in https://github.com/ROCm/clr/blob/b8ba4ccf9c53f6558a5e369e3c1c05de97a0c28f/hipamd/src/hip_device.cpp#L496C77-L496C94 seems wrong: it divides by the maximum workgroup size, so the result appears to hold only for workgroups of that maximum size.
What do you think?
Operating System
Ubuntu 24.04 LTS (Noble Numbat)
CPU
AMD EPYC 73F3 16-Core Processor
GPU
AMD Instinct MI210
ROCm Version
ROCm 6.2.4
ROCm Component
HIP
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response