Skip to content

fix(gfx): copy subExecParam from device before reading timing data#323

Open
fcui-amd wants to merge 1 commit into
developfrom
users/fcui/gfx-subexecparam-host-deref-segfault
Open

fix(gfx): copy subExecParam from device before reading timing data#323
fcui-amd wants to merge 1 commit into
developfrom
users/fcui/gfx-subexecparam-host-deref-segfault

Conversation

@fcui-amd

Copy link
Copy Markdown

Motivation

The GFX executor crashes with a SIGSEGV when collecting per-Transfer timing
data on GPUs that do not expose VRAM to the host (e.g. PCIe cards without
large/resizable BAR).

Technical Details

Copy the subExecutor parameters back to host with hipMemcpy(..., DeviceToHost)
before reading timing/CU data, in both affected paths. No behavior change on
hardware where direct access previously worked.

Test Plan

Test Result

Submission Checklist

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a crash in the GPU GFX executor when collecting per-Transfer/per-iteration timing and CU data on systems where VRAM is not host-accessible (e.g., no large/resizable BAR) by copying SubExecParam back from device before reading fields on the host.

Changes:

  • Copy rss.subExecParamGpuPtr from device to host before collecting per-iteration CU IDs in multi-stream mode.
  • Copy exeInfo.subExecParamGpu from device to host before computing per-Transfer timing/CU IDs in single-stream mode.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/header/TransferBench.hpp Outdated
Comment thread src/header/TransferBench.hpp Outdated
GFX executor read startCycle/stopCycle/xccId/hwId directly from
subExecParamGpu (device memory allocated via hipMalloc). This assumed
VRAM is host-accessible, which fails on GPUs without large/resizable
BAR (e.g. PCIe cards), causing SIGSEGV. Now copy params back to host
via hipMemcpy before reading in both single-launch and multistream
paths.

Signed-off-by: Flora Cui <flora.cui@amd.com>
@fcui-amd fcui-amd force-pushed the users/fcui/gfx-subexecparam-host-deref-segfault branch from 674a070 to 45d6c97 Compare June 13, 2026 05:29
@fcui-amd

Copy link
Copy Markdown
Author

ping?

@fcui-amd fcui-amd requested a review from AtlantaPepsi June 18, 2026 08:06
@gilbertlee-amd

Copy link
Copy Markdown
Collaborator

This was somewhat intentional, to try to avoid explicit SDMA traffic during the main loop.

The solution for non-AMD hardware was actually to use [managed memory instead] for the GPU parameter buffer.

MemType memType = MEM_MANAGED; // NVIDIA hardware requires managed memory to access from host

I think this PR needs to be more targeted, and only be applied to GPUs that require this functionality.
Does isLargeBar report 0 for your device?

@fcui-amd

Copy link
Copy Markdown
Author

It's amd gpu with large bar disabled.

Comment on lines +5386 to +5391
// Copy subExecutor parameters back to host before reading them.
// subExecParamGpuPtr lives in device memory and is not guaranteed to be host-accessible
// (e.g. GPUs without large/resizable BAR), so dereferencing it directly can segfault.
std::vector<SubExecParam> subExecParamHost(numSubExecs);
ERR_CHECK(hipMemcpy(subExecParamHost.data(), rss.subExecParamGpuPtr,
numSubExecs * sizeof(SubExecParam), hipMemcpyDefault));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be gated only for GPUs that do not have largeBar support, so that it doesn't change the behavior of GPUs that do have largeBar support.

Comment on lines +5461 to +5467
// Copy subExecutor parameters back to host before reading timing data.
// subExecParamGpu lives in device memory and is not guaranteed to be host-accessible
// (e.g. GPUs without large/resizable BAR), so dereferencing it directly can segfault.
std::vector<SubExecParam> subExecParamHost(exeInfo.totalSubExecs);
ERR_CHECK(hipMemcpy(subExecParamHost.data(), exeInfo.subExecParamGpu,
exeInfo.totalSubExecs * sizeof(SubExecParam), hipMemcpyDefault));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. Gate this copy based on whether or not this GPU has largeBar support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants