
Fix #29: avoid index out of range when --query-compute-apps returns m…#42

Open
deepujain wants to merge 1 commit into eBay:master from deepujain:issue-29


@deepujain
Contributor

Fixes #29. When using --query-compute-apps, nvidia-smi returns one CSV row per process (not per GPU). If there are more processes than GPUs (e.g. several processes on one card), the beat panicked with index out of range [4] with length 4 because events were allocated as events[gpuCount] and indexed by row number. This change builds the events slice with append so the number of events matches the number of rows (one event per process).
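
To illustrate the failure mode described above, here is a minimal, dependency-free sketch. It is not the code from nvidia/gpu.go: the `event` type stands in for `common.MapStr`, and `fillByIndex` is a hypothetical helper that only mimics the old allocate-then-index pattern. It shows that spare capacity does not help, because indexing is bounds-checked against the slice length.

```go
package main

import "fmt"

// event is a stand-in for common.MapStr (assumption: a plain map is enough
// to show the slice behaviour).
type event map[string]interface{}

// fillByIndex is a hypothetical helper that mimics the old pattern: a slice
// of length gpuCount, written to by row number. It returns the recovered
// panic message, or "" if every row fit.
func fillByIndex(gpuCount int, rows []string) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	// Length is gpuCount; the extra capacity does not make index gpuCount valid.
	events := make([]event, gpuCount, 2*gpuCount)
	for i, row := range rows {
		events[i] = event{"row": row}
	}
	return ""
}

func main() {
	// Five process rows on a machine with four GPUs (made-up data).
	rows := []string{"p0", "p1", "p2", "p3", "p4"}
	fmt.Println(fillByIndex(4, rows))
}
```

With four GPUs and five rows, the fifth write targets index 4 of a length-4 slice, which is the situation reported in #29.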

Changes made

  • nvidia/gpu.go
    In run(), replaced the fixed-size events := make([]common.MapStr, gpuCount, 2*gpuCount) with events := make([]common.MapStr, 0, gpuCount*2), and used events = append(events, event) instead of events[gpuIndex] = event.
  • Added a short comment that --query-gpu is one row per GPU and --query-compute-apps is one row per process.
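
The append-based approach listed above can be sketched as follows. This is an illustration under assumptions, not the actual run() from nvidia/gpu.go: `event` stands in for `common.MapStr`, `buildEvents` is a hypothetical helper, and the sample output is made up. The point is that `gpuCount` becomes only a capacity hint, so the slice length always equals the number of rows.

```go
package main

import (
	"fmt"
	"strings"
)

// event is a stand-in for common.MapStr (assumption for this sketch).
type event map[string]interface{}

// buildEvents is a hypothetical helper: it creates one event per non-empty
// output row. gpuCount only sizes the initial capacity; append grows the
// slice when there are more rows than that.
func buildEvents(output string, gpuCount int) []event {
	events := make([]event, 0, gpuCount*2)
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		events = append(events, event{"raw": line})
	}
	return events
}

func main() {
	// Made-up output: five process rows on a machine with four GPUs.
	out := "100, procA\n101, procB\n102, procC\n103, procD\n104, procE\n"
	events := buildEvents(out, 4)
	fmt.Println(len(events)) // one event per row
}
```

Because nothing is indexed by row number, the same loop works for --query-gpu (one row per GPU) and --query-compute-apps (one row per process).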

Request to eBay maintainer
Please review and merge when convenient. Thank you.

…ns more rows than GPUs

For --query-compute-apps, nvidia-smi returns one row per process (multiple
processes can run on one GPU). The code assumed one row per GPU and allocated
events[gpuCount], causing panic when processes > gpuCount. Use append so
events slice grows with the actual number of rows.


Development

Successfully merging this pull request may close these issues.

Can nvidiagpubeat be made to also export the process running on each card?
