[Issue] Unable to profile HIP application with RDP #3727

Open
jeromew opened this issue Jan 22, 2025 · 5 comments

Comments

@jeromew

jeromew commented Jan 22, 2025


  • Windows 11
  • Radeon RX 6700 XT
  • latest drivers
  • HIP SDK 6.2

I am sorry if this is not the right place to ask, but my problem involves a HIP application, so maybe there is some HIP knowledge that I am missing.

I have been trying to profile a HIP application with Radeon Developer Panel (RDP), and I always get the following log:

(19:02:58.566) INFO [RGP Trace Source - PID: 15348] Client connected [6446 HIP]
(19:02:58.628) INFO [RGP Trace Source - PID: 15348] Client reached init state [6446 HIP]
(19:02:58.707) INFO [RGP Trace Source - PID: 15348] Successfully enabled tracing [6446 HIP]
(19:02:58.707) INFO [RGP Trace Source - PID: 15348] Initialized new client [6446 HIP]
(19:02:58.768) INFO [DDToolConn] Successfully initialized driver (connection id: 6446).
(19:02:58.897) INFO [RGP Trace Source - PID: 15348] Successfully queried SPM counters [6446 HIP]
(19:02:58.897) INFO [RGP Trace Source - PID: 15348] Successfully updated SPM counters [6446 HIP]
(19:02:59.148) INFO [RGP Trace Source - PID: 15348] Successfully began trace [6446 HIP]
(19:02:59.741) INFO [RGP Trace Source - PID: 15348] Client disconnected [6446 HIP]
(19:02:59.752) ERROR [RGP Trace Source - PID: 15348] Failed to capture trace [6446 HIP]
(19:03:00.739) INFO [RGP Trace Source - PID: 15348] Finished disconnecting client [6446 HIP]

The profiling seems to start but RDP fails to capture the trace.
This result is observed even with a very simple application like

#include <hip/hip_runtime.h>

#include <iostream>

#define HIP_CHECK(expression)                  \
{                                              \
    const hipError_t status = expression;      \
    if(status != hipSuccess){                  \
        std::cerr << "HIP error "              \
                  << status << ": "            \
                  << hipGetErrorString(status) \
                  << " at " << __FILE__ << ":" \
                  << __LINE__ << std::endl;    \
    }                                          \
}


__device__ unsigned int get_thread_idx()
{
    return threadIdx.x;
}

__host__ void print_hello_host()
{
    std::cout << "Hello world from host!" << std::endl;
}

__device__ __host__ void print_hello()
{
    printf("Hello world from device or host!\n");
}

__global__ void helloworld_kernel()
{
    unsigned int thread_idx = get_thread_idx();
    unsigned int block_idx = blockIdx.x;

    print_hello();

    printf("Hello world from device kernel block %u thread %u!\n", block_idx, thread_idx);
}

int main()
{
    print_hello_host();

    print_hello();

    helloworld_kernel<<<dim3(2), // 3D grid specifying number of blocks to launch: (2, 1, 1)
                        dim3(2), // 3D grid specifying number of threads to launch: (2, 1, 1)
                        0, // number of bytes of additional shared memory to allocate
                        hipStreamDefault // stream where the kernel should execute: default stream
                        >>>();

    HIP_CHECK(hipDeviceSynchronize());
}

Is there a specific flag or environment variable that needs to be set before calling hipcc in order to capture the profiling trace?
Is there something I could do to understand what is happening when the capture fails?

@harkgill-amd

Hi @jeromew, an internal ticket has been created to investigate this issue.

@jeromew
Author

jeromew commented Jan 23, 2025

I can add that I tested profiling on my setup with vkcube.exe, a Vulkan demo application from the Vulkan SDK, and the profiling capture works.

So I am inclined to think this is related to HIP / compute, even though the documentation at https://gpuopen.com/manuals/rdp_manual/rdp_manual-index/ states:

Compute APIs, RDNA hardware, and operating systems

Supported APIs
    OpenCL
    HIP

Supported RDNA hardware
    AMD Radeon RX 7000 series
    AMD Radeon RX 6000 series
    AMD Radeon RX 5000 series
    AMD Ryzen Processors with Radeon Graphics

Supported Operating Systems
    Windows® 10
    Windows® 11

So the Windows 11 / Radeon RX 6700 XT / HIP combination should be supported.

@schung-amd

Hi @jeromew, thanks for reporting this! I was able to reproduce the issue with the HIP SDK matrix transpose sample on a 7900XTX. I believe this should be supported and we're looking into it.

@jeromew
Author

jeromew commented Feb 4, 2025

@schung-amd FYI, I managed to get some data out of the profiling pipeline.

It seems that if you request more dispatches in RDP than your application actually issues, the capture fails. When I requested only 1 dispatch, I started getting a profile (but not always).
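One way to act on this observation, sketched here under assumptions, is to make sure the application issues at least as many dispatches as the count requested in RDP. The helper `run_dispatches` below is a hypothetical name invented for illustration, not part of HIP or RDP. In a HIP program the callable would wrap the kernel launch (for example `helloworld_kernel<<<...>>>()`); it is written as plain C++ so it compiles without the HIP SDK. Whether padding the dispatch count makes captures reliable has not been confirmed in this thread.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: invokes `dispatch` exactly `count` times and returns
// how many invocations were made. In a HIP application, `dispatch` would
// launch the kernel once per call.
std::size_t run_dispatches(std::size_t count, const std::function<void()>& dispatch)
{
    std::size_t issued = 0;
    for(std::size_t i = 0; i < count; ++i)
    {
        dispatch();
        ++issued;
    }
    return issued;
}
```

For example, with RDP set to capture 4 dispatches, calling `run_dispatches(4, launch)` before `hipDeviceSynchronize()` would give the trace enough dispatches to collect, where `launch` is a lambda that performs the kernel launch.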

The next thing is that when the application exits, RDP does not seem to terminate the profiling properly if it had not finished earlier. Adding a 20-second Sleep at the end of the application seems to allow RDP to finish the profiling gracefully.
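A minimal sketch of that workaround, assuming a portable `std::this_thread::sleep_for` in place of the Windows `Sleep` call mentioned above. The helper name `wait_for_profiler` is hypothetical, and the 20-second value is only the one that happened to work on this setup, not a documented requirement.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: blocks the calling thread for `grace` so that an
// attached profiler has time to finish its capture before the process exits.
// Returns the time actually spent waiting, in milliseconds.
std::chrono::milliseconds wait_for_profiler(std::chrono::milliseconds grace)
{
    const auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(grace);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
}
```

In the hello-world program above, this would be called as `wait_for_profiler(std::chrono::seconds(20));` right after `HIP_CHECK(hipDeviceSynchronize());`, as the last statement of `main()`.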

These constraints are not well documented and are not mentioned in the FAQ.

@schung-amd

Interesting, thanks for looking further into this! I'll see if the internal team is aware of this and if we should fix this in code or documentation.
