Fix vllm gpu arch detection #120
Conversation
📝 Walkthrough

Adds vLLM AOT compile cache detection and metadata, expands cache format support to include "aot_compile", augments GPU probing to detect the actual runtime backend/toolkit/PTX version, normalizes architecture comparisons, updates preflight validation for three cache formats, and substantially expands the vLLM cache documentation and workflow guidance.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User
    participant Detector as Cache Detection (mcv/pkg/cache/vllm.go)
    participant GPUProbe as GPU Probe (detectActualGPUInfo / nvml / rocm / amd)
    participant Summary as Summary Builder (vllm.go)
    participant Preflight as Preflight Validation (mcv/pkg/preflightcheck)
    User->>Detector: DetectVLLMCache()
    Detector->>Detector: scan torch_compile_cache/ and torch_aot_compile/
    alt aot_compile found
        Detector->>Detector: detectAOTCompileCache() → collect AOT entries grouped by hash
        Detector-->>User: VLLMCacheMetadata{cacheFormat:"aot_compile", aot_entries}
    else binary/triton found
        Detector->>Summary: buildBinaryCacheSummary()
        Summary->>GPUProbe: detectActualGPUInfo()
        GPUProbe-->>Summary: backend, arch, warpSize, ptxVersion, cuda/rocm versions
        Summary-->>Detector: annotated target summaries
        Detector-->>User: VLLMCacheMetadata[] (binary/triton)
    end
    User->>Preflight: CompareVLLMCacheManifestToGPU(manifest)
    Preflight->>Preflight: switch on manifest.CacheFormat
    alt aot_compile
        Preflight->>Preflight: compareAOTCompileCacheEntriesToGPU() (presence/size checks)
    else binary
        Preflight->>Preflight: compareBinaryCacheEntriesToGPU() (backend + warp-size check)
    else triton
        Preflight->>Preflight: CompareTritonEntriesToGPU() (normalized arch compare)
    end
    Preflight-->>User: validation result
```
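The preflight dispatch in the diagram can be sketched as a switch on the cache format. This is a minimal sketch: the CacheFormat values and compare-function roles come from the diagram, while the Manifest type and return strings are assumptions for illustration.

```go
package main

import "fmt"

// Manifest is a hypothetical stand-in for the real manifest type;
// only the CacheFormat field is taken from the diagram above.
type Manifest struct {
	CacheFormat string
}

// validate dispatches to a format-specific check, mirroring the
// switch in CompareVLLMCacheManifestToGPU shown in the diagram.
func validate(m Manifest) (string, error) {
	switch m.CacheFormat {
	case "aot_compile":
		return "aot: presence/size checks", nil
	case "binary":
		return "binary: backend + warp-size check", nil
	case "triton":
		return "triton: normalized arch compare", nil
	default:
		return "", fmt.Errorf("unknown cache format %q", m.CacheFormat)
	}
}

func main() {
	for _, f := range []string{"aot_compile", "binary", "triton"} {
		result, err := validate(Manifest{CacheFormat: f})
		fmt.Println(f, "->", result, err)
	}
}
```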
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🧹 Nitpick comments (5)
mcv/pkg/preflightcheck/vllm.go (2)
119-124: Use logging instead of fmt.Printf for consistency.

Same issue as above - use `logging.Debugf` for consistency with the rest of the codebase.

♻️ Suggested fix

```diff
 if backendMatches && warpMatches {
 	matched = true
 	// For detailed arch compatibility, rely on Summary label check
-	fmt.Printf("Binary cache entry matches GPU: backend=%s, warpSize=%d\n",
-		backend, expectedWarpSize)
+	logging.Debugf("Binary cache entry matches GPU: backend=%s, warpSize=%d",
+		backend, expectedWarpSize)
 	break
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/pkg/preflightcheck/vllm.go` around lines 119 - 124, The code uses fmt.Printf in the GPU-matching branch (inside the block checking backendMatches and warpMatches where matched is set) which breaks logging consistency; replace the fmt.Printf call that prints "Binary cache entry matches GPU: backend=%s, warpSize=%d\n" with logging.Debugf using the same formatted message and variables (backend, expectedWarpSize), keeping the matched=true and break behavior unchanged and ensuring any fmt import is removed if no longer used.
76-80: Use logging instead of fmt.Printf for consistency.

The rest of the codebase uses the `logging` package. Using `fmt.Printf` here is inconsistent and won't respect log level settings.

♻️ Suggested fix

```diff
 // Log the AOT cache entries for debugging
 for _, entry := range entries {
-	fmt.Printf("AOT compile cache: hash=%s, rank=%s, size=%d bytes\n",
-		entry.Hash, entry.Rank, entry.FileSize)
+	logging.Debugf("AOT compile cache: hash=%s, rank=%s, size=%d bytes",
+		entry.Hash, entry.Rank, entry.FileSize)
 }
```

Note: You'll need to add the logging import.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/pkg/preflightcheck/vllm.go` around lines 76 - 80, Replace the fmt.Printf usage in the AOT cache logging loop with the project's logging package: locate the loop that iterates over entries (for _, entry := range entries) and change the print to use logging.&lt;level&gt; (e.g., logging.Debug or logging.Infof depending on desired verbosity) and include entry.Hash, entry.Rank and entry.FileSize in the log message; also add the logging import to the file. Ensure the message format and log level are consistent with other uses of logging in this package.

mcv/pkg/cache/vllm.go (2)
520-523: Move GPU detection outside the inner loop to avoid redundant calls.

`detectActualGPUInfo()` is called once per `VLLMCacheMetadata` entry, but it performs expensive operations (config initialization, device registry access, device startup). Since GPU info doesn't change between iterations, move the call outside both loops.

♻️ Suggested fix

```diff
 func buildBinaryCacheSummary(metadata []VLLMCacheMetadata) (*Summary, error) {
 	targetMap := make(map[string]SummaryTargetInfo)
+
+	// Detect actual GPU from the system once (not per entry)
+	// NOTE: We detect the actual system GPU rather than trusting VLLM_TARGET_DEVICE
+	// because caches may be copied from other systems
+	detectedBackend, detectedArch, detectedWarpSize, detectedPTX := detectActualGPUInfo()

 	for _, meta := range metadata {
 		if meta.CacheFormat != BinaryCacheFormat {
 			continue
 		}

-		// Detect actual GPU from the system once per metadata entry
-		// NOTE: We detect the actual system GPU rather than trusting VLLM_TARGET_DEVICE
-		// because caches may be copied from other systems
-		detectedBackend, detectedArch, detectedWarpSize, detectedPTX := detectActualGPUInfo()
-
 		for i := range meta.BinaryCacheEntries {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/pkg/cache/vllm.go` around lines 520 - 523, The code currently calls detectActualGPUInfo() inside the loop that iterates VLLMCacheMetadata, causing expensive duplicate GPU detection; move the call so it runs once before entering the outer/inner loops (call detectActualGPUInfo() once and store detectedBackend, detectedArch, detectedWarpSize, detectedPTX in local variables) and then use those stored values inside the loop(s) instead of calling detectActualGPUInfo() repeatedly; update any references in the loop to use the precomputed variables and remove the inner detectActualGPUInfo() invocation.
350-350: Move regex compilation outside the loop.

The `rankDirRegex` is recompiled on every iteration of the outer loop. Move it to a package-level variable or outside the loop.

♻️ Suggested fix

```diff
+var aotRankDirRegex = regexp.MustCompile(`^rank_\d+_\d+$`)
+
 func detectAOTCompileCache(aotPath string) ([]AOTCompileCacheMetadata, error) {
 	var aotCaches []AOTCompileCacheMetadata
 	// ...
 	for _, hashEntry := range entries {
 		// ...
-		rankDirRegex := regexp.MustCompile(`^rank_\d+_\d+$`)
 		for _, rankEntry := range rankEntries {
 			// ...
-			if !rankDirRegex.MatchString(rankEntry.Name()) {
+			if !aotRankDirRegex.MatchString(rankEntry.Name()) {
 				continue
 			}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/pkg/cache/vllm.go` at line 350, The regex rankDirRegex is being compiled inside a loop causing repeated allocations; move its compilation out of the loop (e.g., declare a package-level var or compute once before the loop) and reuse that compiled *regexp. Ensure you replace the inline regexp.MustCompile(`^rank_\d+_\d+$`) inside the loop with the shared variable (referencing rankDirRegex) used by the functions/methods in vllm.go that iterate directories.mcv/docs/vllm-binary-cache.md (1)
747-747: Add language specifier to fenced code block.

The static analysis flagged this code block as missing a language specifier.
♻️ Suggested fix
````diff
-```
+```text
 ┌─────────────────────────┐
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/docs/vllm-binary-cache.md` at line 747, The fenced code block containing the ASCII art is missing a language specifier; update the opening fence from ``` to ```text (or another appropriate language) for the ASCII art block in vllm-binary-cache.md so static analysis recognizes it — locate the ASCII-art fenced block (the triple-backtick surrounding "┌─────────────────────────┐") and change the opening fence to include "text".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mcv/docs/vllm-binary-cache.md`:
- Around line 331-344: The docs describe the "arch" field as a numeric compute
capability (e.g., "75") but nvml.go now outputs the CUDA SM string with an "sm_"
prefix (e.g., "sm_75"); update the documentation text and all examples to state
that arch contains the SM string (format "sm_<nnn>"), change illustrative
examples from "75" to "sm_75" (and similar for other GPUs), and mention that
this format matches nvml.go's output and NVIDIA convention so readers know to
expect the "sm_" prefix when reading or comparing the arch field.
- Line 691: The docs contain the literal header string "ERRO" which triggers
codespell; if the log format permits, update the header string "ERRO Preflight
check failed: no compatible GPU found" (and the other occurrence around the 707
example) to "ERROR Preflight check failed: no compatible GPU found"; if the
header must remain as-is for fidelity, add "ERRO" to the codespell ignore list
(or adjust the example formatting) so codespell no longer flags the literal
"ERRO" token in mcv/docs/vllm-binary-cache.md.
In `@mcv/pkg/accelerator/devices/nvml.go`:
- Line 225: Normalize GPU architecture strings at comparison sites instead of
changing stored cache values: update CompareTritonCacheToGPU to compare
ConvertArchToString(cacheData.Target.Arch) against a normalized form of gpu.Arch
(strip any "sm_" prefix or non-digit chars), and apply the same normalization in
preflight checks where entry.Arch == gpuInfo.Arch (in
mcv/pkg/preflightcheck/triton.go) and target.Arch == gpu.Arch (in
mcv/pkg/preflightcheck/utils.go); implement a small helper (e.g.,
NormalizeArchString or NormalizeGPUArch) and call it in those comparisons so
legacy cache values like "75" match new detector output "sm_75".
In `@mcv/pkg/cache/vllm.go`:
- Around line 538-557: When backend == "unknown" in the fallback block, also
populate the arch variable from cache metadata so downstream comparisons in
CompareCacheSummaryLabelToGPU don't all fail; retrieve arch from the binary
cache metadata (e.g., binaryCache.Arch) or read environment keys like
VLLM_PAGED_ATTN_ARCH (or any existing cache arch fields) and set arch
accordingly before setting warpSize and backend, falling back to a sensible
default (not "unknown") if none is available.
---
Nitpick comments:
In `@mcv/docs/vllm-binary-cache.md`:
- Line 747: The fenced code block containing the ASCII art is missing a language
specifier; update the opening fence from ``` to ```text (or another appropriate
language) for the ASCII art block in vllm-binary-cache.md so static analysis
recognizes it — locate the ASCII-art fenced block (the triple-backtick
surrounding "┌─────────────────────────┐") and change the opening fence to
include "text".
In `@mcv/pkg/cache/vllm.go`:
- Around line 520-523: The code currently calls detectActualGPUInfo() inside the
loop that iterates VLLMCacheMetadata, causing expensive duplicate GPU detection;
move the call so it runs once before entering the outer/inner loops (call
detectActualGPUInfo() once and store detectedBackend, detectedArch,
detectedWarpSize, detectedPTX in local variables) and then use those stored
values inside the loop(s) instead of calling detectActualGPUInfo() repeatedly;
update any references in the loop to use the precomputed variables and remove
the inner detectActualGPUInfo() invocation.
- Line 350: The regex rankDirRegex is being compiled inside a loop causing
repeated allocations; move its compilation out of the loop (e.g., declare a
package-level var or compute once before the loop) and reuse that compiled
*regexp. Ensure you replace the inline regexp.MustCompile(`^rank_\d+_\d+$`)
inside the loop with the shared variable (referencing rankDirRegex) used by the
functions/methods in vllm.go that iterate directories.
In `@mcv/pkg/preflightcheck/vllm.go`:
- Around line 119-124: The code uses fmt.Printf in the GPU-matching branch
(inside the block checking backendMatches and warpMatches where matched is set)
which breaks logging consistency; replace the fmt.Printf call that prints
"Binary cache entry matches GPU: backend=%s, warpSize=%d\n" with logging.Debugf
using the same formatted message and variables (backend, expectedWarpSize),
keeping the matched=true and break behavior unchanged and ensuring any fmt
import is removed if no longer used.
- Around line 76-80: Replace the fmt.Printf usage in the AOT cache logging loop
with the project's logging package: locate the loop that iterates over entries
(for _, entry := range entries) and change the print to use logging.<level>
(e.g., logging.Debug or logging.Infof depending on desired verbosity) and
include entry.Hash, entry.Rank and entry.FileSize in the log message; also add
the logging import to the file. Ensure the message format and log level are
consistent with other uses of logging in this package.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 2876a5ef-e373-433f-a07f-fdc9efcc2653
📒 Files selected for processing (5)
- mcv/docs/vllm-binary-cache.md
- mcv/pkg/accelerator/devices/nvml.go
- mcv/pkg/cache/types.go
- mcv/pkg/cache/vllm.go
- mcv/pkg/preflightcheck/vllm.go
775ea3e to 3b60594 — Compare
Actionable comments posted: 3
🧹 Nitpick comments (3)
mcv/pkg/preflightcheck/vllm.go (2)
76-80: Use logging framework instead of `fmt.Printf` for consistency.

The codebase uses `logging.Infof`/`logging.Debugf` (logrus) for output. Using `fmt.Printf` bypasses log levels and formatting.

♻️ Proposed fix

```diff
 // Log the AOT cache entries for debugging
 for _, entry := range entries {
-	fmt.Printf("AOT compile cache: hash=%s, rank=%s, size=%d bytes\n",
+	logging.Debugf("AOT compile cache: hash=%s, rank=%s, size=%d bytes",
 		entry.Hash, entry.Rank, entry.FileSize)
 }
```

This requires adding the logging import if not already present.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/pkg/preflightcheck/vllm.go` around lines 76 - 80, Replace the fmt.Printf calls in the AOT cache logging loop with the project's logging package (use logging.Infof or logging.Debugf as appropriate) so log levels and formatting are consistent; update the loop that iterates over entries (the for _, entry := range entries block) to call logging.Debugf("AOT compile cache: hash=%s, rank=%s, size=%d bytes", entry.Hash, entry.Rank, entry.FileSize) and add the logging import if it's not already present.
119-124: Use logging framework here as well.

Same issue as above—use `logging.Debugf` for consistency.

♻️ Proposed fix

```diff
 if backendMatches && warpMatches {
 	matched = true
 	// For detailed arch compatibility, rely on Summary label check
-	fmt.Printf("Binary cache entry matches GPU: backend=%s, warpSize=%d\n",
+	logging.Debugf("Binary cache entry matches GPU: backend=%s, warpSize=%d",
 		backend, expectedWarpSize)
 	break
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/pkg/preflightcheck/vllm.go` around lines 119 - 124, Replace the fmt.Printf call in the vllm preflight loop (where backendMatches and warpMatches are checked and matched is set) with the logging framework: call logging.Debugf to emit the same message and values (backend, expectedWarpSize), and add or ensure the logging package is imported; update the print site in the code block that currently prints "Binary cache entry matches GPU: backend=%s, warpSize=%d" so it uses logging.Debugf for consistency with other debug output.

mcv/docs/vllm-binary-cache.md (1)
747-747: Add language specifier to fenced code block.

Static analysis flags this block as missing a language specifier. Since it's an ASCII diagram, use `text` or `plaintext`.

♻️ Proposed fix

````diff
-```
+```text
 ┌─────────────────────────┐
 │ 1. Start vLLM           │
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/docs/vllm-binary-cache.md` at line 747, The fenced code block containing the ASCII diagram (the block starting with triple backticks and lines like "┌─────────────────────────┐" and "│ 1. Start vLLM │") is missing a language specifier; update the opening fence from ``` to ```text (or ```plaintext) so the block becomes a text/plaintext fenced code block to satisfy static analysis.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mcv/docs/vllm-binary-cache.md`:
- Around line 14-16: Update the docs paragraph in mcv/docs/vllm-binary-cache.md
that claims "Not Currently Supported" for VLLM_USE_AOT_COMPILE=1 to state that
AOT compile is supported; change the wording to describe the supported
"aot_compile" CacheFormat and mention any caveats (e.g., validation rules
enforced by preflight check). Reference the existing implementation: detection
logic in mcv/pkg/cache/vllm.go (which sets CacheFormat: "aot_compile" and
detects torch_aot_compile/), the validation in mcv/pkg/preflightcheck/vllm.go
that recognizes "aot_compile", and the AOTCompileCacheMetadata type used for AOT
artifacts; update the doc to reflect support and include any limitations or
validation requirements.
In `@mcv/pkg/preflightcheck/triton.go`:
- Around line 66-69: entry.Arch is typed as any but normalizeArchForComparison
requires strings; before calling normalizeArchForComparison on entry.Arch,
perform a type assertion to string (e.g., archStr, ok := entry.Arch.(string))
and if the assertion fails convert it to a string safely (e.g.,
fmt.Sprint(entry.Arch) or set to empty) then pass that string to
normalizeArchForComparison; update the two lines using
normalizeArchForComparison(entry.Backend, entry.Arch) to use the
asserted/converted string variable (keep gpuInfo.Backend/gpuInfo.Arch as-is) so
the call signatures match.
In `@mcv/pkg/preflightcheck/vllm.go`:
- Around line 40-54: entry.TritonCacheEntries contains JSON-unmarshalled
map[string]interface{} values so the type assertion to cache.TritonCacheMetadata
will always fail; update the loop that builds convertedEntries (referencing
entry.TritonCacheEntries and cache.TritonCacheMetadata) to convert each element
by re-marshaling the element to JSON and then unmarshaling into a
cache.TritonCacheMetadata (or perform a manual map->struct conversion), handle
and return any marshal/unmarshal errors, and then call
CompareTritonEntriesToGPU(convertedEntries, devInfo) as before.
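The re-marshal conversion this comment describes is a standard Go idiom for turning a JSON-decoded `map[string]interface{}` back into a typed struct. A sketch under stated assumptions: the `TritonCacheMetadata` fields and JSON tags shown here are placeholders, not the actual type from `mcv/pkg/cache`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TritonCacheMetadata is a hypothetical stand-in for
// cache.TritonCacheMetadata; field names and tags are assumed.
type TritonCacheMetadata struct {
	Hash string `json:"hash"`
	Arch string `json:"arch"`
}

// toTritonMetadata round-trips an untyped JSON value through
// Marshal/Unmarshal to obtain a typed struct, which is what the
// review comment suggests instead of a direct type assertion.
func toTritonMetadata(v interface{}) (TritonCacheMetadata, error) {
	var out TritonCacheMetadata
	raw, err := json.Marshal(v)
	if err != nil {
		return out, err
	}
	err = json.Unmarshal(raw, &out)
	return out, err
}

func main() {
	// A value as it would arrive from generic JSON decoding.
	entry := map[string]interface{}{"hash": "abc123", "arch": "sm_75"}
	meta, err := toTritonMetadata(entry)
	fmt.Println(meta.Hash, meta.Arch, err)
}
```

The round trip is slower than a hand-written map-to-struct copy but keeps field mapping in one place (the struct tags), which is why the comment offers it as the first option.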
---
Nitpick comments:
In `@mcv/docs/vllm-binary-cache.md`:
- Line 747: The fenced code block containing the ASCII diagram (the block
starting with triple backticks and lines like "┌─────────────────────────┐" and
"│ 1. Start vLLM │") is missing a language specifier; update the
opening fence from ``` to ```text (or ```plaintext) so the block becomes a
text/plaintext fenced code block to satisfy static analysis.
In `@mcv/pkg/preflightcheck/vllm.go`:
- Around line 76-80: Replace the fmt.Printf calls in the AOT cache logging loop
with the project's logging package (use logging.Infof or logging.Debugf as
appropriate) so log levels and formatting are consistent; update the loop that
iterates over entries (the for _, entry := range entries block) to call
logging.Debugf("AOT compile cache: hash=%s, rank=%s, size=%d bytes", entry.Hash,
entry.Rank, entry.FileSize) and add the logging import if it's not already
present.
- Around line 119-124: Replace the fmt.Printf call in the vllm preflight loop
(where backendMatches and warpMatches are checked and matched is set) with the
logging framework: call logging.Debugf to emit the same message and values
(backend, expectedWarpSize), and add or ensure the logging package is imported;
update the print site in the code block that currently prints "Binary cache
entry matches GPU: backend=%s, warpSize=%d" so it uses logging.Debugf for
consistency with other debug output.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: d50b4c4e-5004-48ec-b8f6-b315fa99ed93
📒 Files selected for processing (8)
- gkm-codespell.precommit-toml
- mcv/docs/vllm-binary-cache.md
- mcv/pkg/accelerator/devices/nvml.go
- mcv/pkg/cache/types.go
- mcv/pkg/cache/vllm.go
- mcv/pkg/preflightcheck/triton.go
- mcv/pkg/preflightcheck/utils.go
- mcv/pkg/preflightcheck/vllm.go
✅ Files skipped from review due to trivial changes (3)
- gkm-codespell.precommit-toml
- mcv/pkg/accelerator/devices/nvml.go
- mcv/pkg/cache/vllm.go
🚧 Files skipped from review as they are similar to previous changes (1)
- mcv/pkg/cache/types.go
eb76dd9 to eec4df9 — Compare
Actionable comments posted: 3
♻️ Duplicate comments (1)
mcv/pkg/cache/vllm.go (1)
540-542: ⚠️ Potential issue | 🟡 Minor

Fix fallback path to update arch when GPU detection fails.

Lines 552-554 update the `backend` when detection fails, but leave `arch` as the string "UnknownBackend" from failed detection. This causes malformed arch strings like "sm_UnknownBackend" (line 541) when the fallback backend resolves to CUDA.

The arch should either be:

- Reset to a sensible default (e.g., "75" for CUDA), or
- Extracted from cache metadata alongside the backend

Example fix: update arch in fallback path

```diff
 // Handle special cases where no GPU is detected
 if backend == UnknownBackend {
 	logging.Warn("Could not detect GPU on system, using cache metadata as fallback")
 	// Fallback to cache metadata if GPU detection failed
 	backend = binaryCache.TargetDevice
 	if backend == "" {
 		backend = CUDABackend // Default if not specified
 	}
+	// Reset arch to default since detection failed
+	arch = "75" // Safe default; consider extracting from cache if available
 	// Set default warp sizes
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/pkg/cache/vllm.go` around lines 540 - 542, The fallback path that resets backend leaves arch set to the failed-detection sentinel (producing values like "sm_UnknownBackend"); update the fallback logic so that when backend is reassigned to CUDABackend you also update arch to a sensible CUDA default (e.g., "75") or load the arch from the cache metadata at the same time; modify the code handling variables backend and arch (the block that currently checks backend == CUDABackend and formats arch with "sm_%s") so arch is corrected when the fallback backend is applied.
🧹 Nitpick comments (1)
mcv/pkg/cache/vllm.go (1)
524-527: GPU detection called redundantly inside loop.

`detectActualGPUInfo()` is called inside the metadata loop (line 527), meaning it's invoked once per `VLLMCacheMetadata` entry. Since the system's GPU doesn't change during execution, this detection should be performed once before the loop.

♻️ Suggested refactor to detect GPU once

```diff
 func buildBinaryCacheSummary(metadata []VLLMCacheMetadata) (*Summary, error) {
 	targetMap := make(map[string]SummaryTargetInfo)

+	// Detect actual GPU from the system once
+	// NOTE: We detect the actual system GPU rather than trusting VLLM_TARGET_DEVICE
+	// because caches may be copied from other systems
+	detectedBackend, detectedArch, detectedWarpSize, detectedPTX := detectActualGPUInfo()
+
 	for _, meta := range metadata {
 		if meta.CacheFormat != BinaryCacheFormat {
 			continue
 		}

-		// Detect actual GPU from the system once per metadata entry
-		// NOTE: We detect the actual system GPU rather than trusting VLLM_TARGET_DEVICE
-		// because caches may be copied from other systems
-		detectedBackend, detectedArch, detectedWarpSize, detectedPTX := detectActualGPUInfo()
-
 		for i := range meta.BinaryCacheEntries {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/pkg/cache/vllm.go` around lines 524 - 527, detectActualGPUInfo() is being called inside the per-entry metadata loop (where VLLMCacheMetadata entries are processed), causing redundant system GPU detection; move the call out of the loop so detectedBackend, detectedArch, detectedWarpSize, detectedPTX are computed once before iterating over the metadata slice/loop and reuse those values for each VLLMCacheMetadata entry. Update any references inside the loop to use the precomputed variables and remove the in-loop call to detectActualGPUInfo().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mcv/docs/vllm-binary-cache.md`:
- Around line 14-20: The documentation contains contradictory statements about
AOT compile support; update all passages that say `VLLM_USE_AOT_COMPILE=1` is
"not currently supported" to reflect that AOT compile is supported by MCV now
(since the PR adds detectAOTCompileCache and AOT validation in preflight
checks). Search for occurrences mentioning AOT support and consistently state
that MCV supports the AOT workflow, describe the created cache path and that
compatibility is validated using the summary label during preflight, and mention
the new detection/validation functionality by name (`detectAOTCompileCache` and
the AOT validation in preflight checks) so readers know the code path that
enforces support.
- Around line 335-348: Update the documentation example to match the actual code
output: change the example JSON `"arch": "75"` to `"arch": "sm_75"` and add a
short note that CUDA architectures are formatted with an "sm_" prefix (this
formatting comes from the arch = fmt.Sprintf("sm_%s", arch) logic in
mcv/pkg/cache/vllm.go).
In `@mcv/pkg/cache/vllm.go`:
- Around line 548-567: When backend == UnknownBackend after
detectActualGPUInfo(), also populate arch from cache metadata or env fallback
instead of leaving it as "UnknownBackend": read arch from
binaryCache.Environment["VLLM_PAGED_ATTN_ARCH"] (or
os.Getenv("VLLM_PAGED_ATTN_ARCH")) and assign it to arch if non-empty; if that
is empty, derive a sensible default from binaryCache.TargetDevice or backend and
normalize it to match the values used by CompareCacheSummaryLabelToGPU so
architecture comparisons succeed; ensure this logic sits in the same block that
sets backend and warpSize so Summary label uses the corrected arch.
---
Duplicate comments:
In `@mcv/pkg/cache/vllm.go`:
- Around line 540-542: The fallback path that resets backend leaves arch set to
the failed-detection sentinel (producing values like "sm_UnknownBackend");
update the fallback logic so that when backend is reassigned to CUDABackend you
also update arch to a sensible CUDA default (e.g., "75") or load the arch from
the cache metadata at the same time; modify the code handling variables backend
and arch (the block that currently checks backend == CUDABackend and formats
arch with "sm_%s") so arch is corrected when the fallback backend is applied.
---
Nitpick comments:
In `@mcv/pkg/cache/vllm.go`:
- Around line 524-527: detectActualGPUInfo() is being called inside the
per-entry metadata loop (where VLLMCacheMetadata entries are processed), causing
redundant system GPU detection; move the call out of the loop so
detectedBackend, detectedArch, detectedWarpSize, detectedPTX are computed once
before iterating over the metadata slice/loop and reuse those values for each
VLLMCacheMetadata entry. Update any references inside the loop to use the
precomputed variables and remove the in-loop call to detectActualGPUInfo().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: d7159a6d-5287-4a2d-b977-f2c566e1aaff
📒 Files selected for processing (10)
- gkm-codespell.precommit-toml
- mcv/docs/vllm-binary-cache.md
- mcv/pkg/accelerator/devices/amd.go
- mcv/pkg/accelerator/devices/nvml.go
- mcv/pkg/accelerator/devices/rocm.go
- mcv/pkg/cache/types.go
- mcv/pkg/cache/vllm.go
- mcv/pkg/preflightcheck/triton.go
- mcv/pkg/preflightcheck/utils.go
- mcv/pkg/preflightcheck/vllm.go
✅ Files skipped from review due to trivial changes (1)
- gkm-codespell.precommit-toml
🚧 Files skipped from review as they are similar to previous changes (2)
- mcv/pkg/preflightcheck/triton.go
- mcv/pkg/accelerator/devices/nvml.go
> **AOT Compile Support**: MCV **supports** the `VLLM_USE_AOT_COMPILE=1` workflow, which creates a separate cache structure at `torch_compile_cache/torch_aot_compile/{hash}/rank_{rank}_{dp_rank}/model`. AOT compile caches store ahead-of-time compiled models as single binary files rather than multiple compilation artifacts. During preflight checks, AOT cache compatibility is validated primarily via the summary label, as the cache metadata contains limited hardware information.
Documentation contains contradictory statements about AOT compile support.
Lines 14-20 state that MCV "supports the VLLM_USE_AOT_COMPILE=1 workflow", but lines 35-37 state the same workflow is "not currently supported by MCV". Similar contradictions appear at lines 111-112 and 511-513.
Please resolve this inconsistency. Based on the code changes in this PR (which add detectAOTCompileCache and AOT validation in preflight checks), AOT compile appears to be supported. Update all sections to reflect the actual support status consistently.
Also applies to: 35-37
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@mcv/docs/vllm-binary-cache.md` around lines 14 - 20, The documentation
contains contradictory statements about AOT compile support; update all passages
that say `VLLM_USE_AOT_COMPILE=1` is "not currently supported" to reflect that
AOT compile is supported by MCV now (since the PR adds detectAOTCompileCache and
AOT validation in preflight checks). Search for occurrences mentioning AOT
support and consistently state that MCV supports the AOT workflow, describe the
created cache path and that compatibility is validated using the summary label
during preflight, and mention the new detection/validation functionality by name
(`detectAOTCompileCache` and the AOT validation in preflight checks) so readers
know the code path that enforces support.
```go
// Handle special cases where no GPU is detected
if backend == UnknownBackend {
	logging.Warn("Could not detect GPU on system, using cache metadata as fallback")
	// Fallback to cache metadata if GPU detection failed
	backend = binaryCache.TargetDevice
	if backend == "" {
		backend = CUDABackend // Default if not specified
	}
	// Set default warp sizes
	switch backend {
	case ROCmBackend, HIPBackend:
		warpSize = 64
	case CUDABackend:
		warpSize = 32
	case "tpu":
		warpSize = 128
	case "cpu":
		warpSize = 1
	}
}
```
Fallback path still leaves arch potentially invalid.
When GPU detection fails (backend == UnknownBackend), the code falls back to cache metadata for backend and warpSize, but arch remains as the value returned from detectActualGPUInfo() - which is "UnknownBackend" on failure (line 452, 460, 467, 474, 481).
This means the Summary label will contain arch: "UnknownBackend", causing all architecture comparisons in CompareCacheSummaryLabelToGPU to fail. Consider extracting arch from cache environment (e.g., from VLLM_PAGED_ATTN_ARCH or similar env var if available) as a secondary fallback, or document this limitation clearly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@mcv/pkg/cache/vllm.go` around lines 548 - 567, When backend == UnknownBackend
after detectActualGPUInfo(), also populate arch from cache metadata or env
fallback instead of leaving it as "UnknownBackend": read arch from
binaryCache.Environment["VLLM_PAGED_ATTN_ARCH"] (or
os.Getenv("VLLM_PAGED_ATTN_ARCH")) and assign it to arch if non-empty; if that
is empty, derive a sensible default from binaryCache.TargetDevice or backend and
normalize it to match the values used by CompareCacheSummaryLabelToGPU so
architecture comparisons succeed; ensure this logic sits in the same block that
sets backend and warpSize so Summary label uses the corrected arch.
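The suggested secondary fallback can be sketched as a small helper. Everything here is illustrative: the env key `VLLM_PAGED_ATTN_ARCH` is the one proposed in the review, and `fallbackArch` and its parameters are hypothetical names, not the real mcv types.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	cudaBackend    = "cuda"
	unknownBackend = "UnknownBackend"
)

// fallbackArch keeps the probed architecture when valid; otherwise it
// falls back to an environment-derived value (e.g. VLLM_PAGED_ATTN_ARCH)
// and normalizes it to the sm_ form used by the summary label. If no
// fallback exists, it returns "" so the caller can omit arch rather
// than emit "UnknownBackend".
func fallbackArch(probedArch, backend, envArch string) string {
	if probedArch != unknownBackend && probedArch != "" {
		return probedArch
	}
	if envArch == "" {
		return "" // omit arch rather than emit an invalid placeholder
	}
	if backend == cudaBackend && !strings.HasPrefix(envArch, "sm_") {
		return "sm_" + envArch
	}
	return envArch
}

func main() {
	fmt.Println(fallbackArch("UnknownBackend", "cuda", "75")) // sm_75
	fmt.Println(fallbackArch("sm_80", "cuda", "75"))          // sm_80
	fmt.Println(fallbackArch("UnknownBackend", "cuda", ""))   // (empty)
}
```

The key property is that the "UnknownBackend" placeholder never survives into the summary label, so architecture comparisons either run against a real value or skip a missing one.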
This commit fixes a critical bug in vLLM cache image creation where the
GPU compute capability was incorrectly derived from the CUDA toolkit
version instead of detecting the actual GPU hardware.
## Bug Fixed
Previously, mcv would read `VLLM_MAIN_CUDA_VERSION` (e.g., "12.9") from
the cache metadata and use it as the compute capability (e.g., "sm_12.9").
This is incorrect because:
- CUDA version != compute capability
- sm_12.9 doesn't exist (valid examples: sm_75, sm_80, sm_89)
- Preflight checks would always fail due to architecture mismatch
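The distinction can be made concrete with a short sketch: the architecture label must be built from the device's compute capability (major, minor), as reported by NVML or CUDA device properties, never from the toolkit release. `smString` is an illustrative helper, not the actual mcv function.

```go
package main

import "fmt"

// smString builds the architecture label from a GPU's compute
// capability (major, minor) -- the values hardware probing returns --
// rather than from the installed CUDA toolkit version.
func smString(major, minor int) string {
	return fmt.Sprintf("sm_%d%d", major, minor)
}

func main() {
	// A Tesla T4 reports compute capability 7.5, so the correct label
	// is sm_75 regardless of whether the toolkit is 12.9 or anything else.
	fmt.Println(smString(7, 5)) // sm_75
	fmt.Println(smString(8, 9)) // sm_89
}
```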
## Changes
1. **GPU Detection**:
- Added `detectActualGPUInfo()` to detect real GPU hardware on the system
- Detects backend (CUDA/ROCm), architecture, warp size, and PTX version
- Works for both NVIDIA (CUDA) and AMD (ROCm) GPUs
- Handles cases where cache is copied from another system
2. **Summary Metadata**:
- Added `PTXVersion` and `CUDAVersion` fields to `SummaryTargetInfo`
- Captures both actual GPU arch (e.g., sm_75) and toolkit version (e.g., "12.9")
- Enables accurate compatibility checking
3. **Robust Detection**:
- Detects actual GPUs first, doesn't trust `VLLM_TARGET_DEVICE` from copied caches
- Falls back to cache metadata only if GPU detection fails
- Logs detailed information for debugging
## Documentation Updates
- Fixed Hardware Detection section to show correct architecture detection
- Added complete end-to-end workflow example:
* Starting vLLM container
* Waiting for cache generation (happens at startup, not first inference)
* Capturing cache from container
* Building OCI image with mcv
* Pushing to registry
* Extracting with preflight checks
* Verifying GPU compatibility
- Clarified that cache compilation happens during vLLM startup, not on first request
## Example
Before (incorrect):
```json
{"backend": "cuda", "arch": "sm_12.9", "warp_size": 32}
```
After (correct):
```json
{
"backend": "cuda",
"arch": "sm_75",
"warp_size": 32,
"ptx_version": 75,
"cuda_version": "12.9"
}
```
Now preflight checks correctly match Tesla T4 (sm_75) caches with Tesla T4 GPUs.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
This commit includes two related improvements to vLLM cache handling:
1. **NVIDIA Architecture Format Fix**
- Changed Arch field from numeric (e.g., '75') to proper format ('sm_75')
- Matches vLLM/PyTorch conventions for GPU architecture strings
- Ensures correct preflight check validation
2. **AOT Compile Cache Support**
- Added detection for VLLM_USE_AOT_COMPILE=1 cache format
- Implements AOTCompileCacheMetadata type
- Supports torch_aot_compile/{hash}/rank_X_Y/model structure
- Extends VLLMCacheMetadata to handle 'aot_compile' format
- Adds preflight validation for AOT caches
Changes enable proper detection and validation of:
- Binary caches (existing)
- Triton caches (existing)
- AOT compile caches (new)
Tested on Tesla T4 (sm_75) - preflight checks pass correctly.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Fixes 7 linting issues: adds constants for backend types (ROCm, HIP, Unknown), simplifies function signature, converts if-else to switch statement, and marks unused parameter. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Replaces CUDA version-based architecture detection with actual GPU hardware detection via NVML/ROCm libraries. This ensures cache images are built with the correct architecture for the actual GPU present, rather than inferring it from toolkit versions.

Key changes:
- NVML returns numeric arch format (e.g., "75") for Triton compatibility
- vLLM binary cache adds the sm_ prefix for CUDA (e.g., "sm_75") in the summary
- Preflight checks normalize arch strings to handle both "75" and "sm_75"
- Added an arch normalization function for cross-cache-format compatibility

This approach maintains compatibility with:
- Triton CUDA caches expecting "75"
- vLLM binary caches expecting "sm_75"
- AMD/ROCm caches using "gfx1151"

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
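The normalization this commit describes can be sketched as follows. `normalizeArch` and `archEqual` are assumed names for illustration, not necessarily the functions added in the PR; the behavior shown is the one the commit message specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeArch strips the CUDA sm_ prefix so that "75" (Triton CUDA
// caches) and "sm_75" (vLLM binary cache summaries) compare equal.
// ROCm names like "gfx1151" carry no prefix and pass through unchanged.
func normalizeArch(arch string) string {
	a := strings.ToLower(strings.TrimSpace(arch))
	return strings.TrimPrefix(a, "sm_")
}

// archEqual compares two architecture strings after normalization.
func archEqual(a, b string) bool {
	return normalizeArch(a) == normalizeArch(b)
}

func main() {
	fmt.Println(archEqual("sm_75", "75"))        // true
	fmt.Println(archEqual("gfx1151", "gfx1151")) // true
	fmt.Println(archEqual("sm_80", "75"))        // false
}
```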
- Update vllm-binary-cache.md to reflect AOT compile support
- Add text language specifier to ASCII diagram code fence
- Fix TritonCacheEntries type conversion via JSON re-marshalling
- Replace fmt.Printf with logging.Debugf for consistency

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Force-pushed from eec4df9 to 1702619
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
mcv/pkg/cache/vllm.go (1)
106-116: ⚠️ Potential issue | 🟠 Major

Avoid detecting the same torch_aot_compile artifacts as both binary and aot_compile.

`torch_aot_compile/{hash}/rank_X_Y/model` is first appended as `BinaryCacheFormat` via `detectMegaAOTEntries`, then appended again as `"aot_compile"` via `detectAOTCompileCache`. This duplicates manifest entries/counts and can break preflight on ROCm/HIP because the synthetic binary metadata has no `TargetDevice`, so validation defaults it to CUDA. Keep one representation for this layout; prefer the explicit `"aot_compile"` path added in this PR.

🐛 Proposed fix direction

```diff
-	// Mega-AOT layout wraps per-model hash dirs under
-	// torch_aot_compile/. Recurse one level and treat each
-	// child as a hash dir.
 	if entry.Name() == torchAOTCompileDirName {
-		aotMeta, aotCount := detectMegaAOTEntries(
-			filepath.Join(torchCompileCachePath, entry.Name()),
-		)
-		metadata = append(metadata, aotMeta...)
-		count += aotCount
+		// Handled below as explicit "aot_compile" metadata.
 		continue
 	}
```

Also applies to: 175-202
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/pkg/cache/vllm.go` around lines 106 - 116, The current loop treats the torch_aot_compile dir as Mega-AOT by calling detectMegaAOTEntries (leading to BinaryCacheFormat entries) and later also detects it as "aot_compile" via detectAOTCompileCache, causing duplicate manifest entries; change the logic so that when entry.Name() == torchAOTCompileDirName you do NOT call detectMegaAOTEntries and instead skip/continue so only detectAOTCompileCache produces entries. Update the conditional in the loop (the block referencing torchAOTCompileDirName, detectMegaAOTEntries, metadata and count) to avoid appending aot artifacts as BinaryCacheFormat and rely exclusively on detectAOTCompileCache for that layout (keep torchAOTCompileDirName, detectAOTCompileCache, metadata, and count references to locate the code).
♻️ Duplicate comments (3)
mcv/docs/vllm-binary-cache.md (2)
14-20: ⚠️ Potential issue | 🟠 Major

Make the AOT compile support status and cache format consistent.

The doc says MCV supports `VLLM_USE_AOT_COMPILE=1` at lines 14-20, but later says it is unsupported at lines 35-46, 107-112, and 511-513. The format table also maps AOT to `"binary"`, while the implementation now emits/validates `cacheFormat: "aot_compile"`. Please update these sections to consistently describe `VLLM_USE_AOT_COMPILE=1` as supported with the explicit `"aot_compile"` manifest format, and keep "Mega AOT" separate from AOT compile.

📝 Proposed wording direction

```diff
-**Note**: The `VLLM_USE_AOT_COMPILE=1` workflow uses a different structure at
-`torch_compile_cache/torch_aot_compile/{hash}/rank_{rank}_{dp_rank}/model` and is
-not currently supported by MCV.
+**Note**: The `VLLM_USE_AOT_COMPILE=1` workflow uses a different structure at
+`torch_compile_cache/torch_aot_compile/{hash}/rank_{rank}_{dp_rank}/model`.
+MCV supports this layout as `cacheFormat: "aot_compile"`; compatibility is
+validated primarily through the image summary label because the AOT metadata
+contains limited hardware details.

-| AOT | Binary files | `"binary"` | `"binary"` | `"binary"` |
+| AOT compile | Single `model` file | `"aot_compile"` | N/A | `"binary"` |
```

Also applies to: 35-46, 107-112, 477-483, 511-513
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/docs/vllm-binary-cache.md` around lines 14 - 20, Update the documentation so AOT compile (VLLM_USE_AOT_COMPILE=1) is consistently described as supported: change the "AOT Compile Support" paragraph to state it is supported, replace any claims that it is unsupported with support wording, and ensure the manifest/format table and any references use cacheFormat: "aot_compile" instead of "binary"; keep the separate concept/name "Mega AOT" distinct from AOT compile in all sections (including the format mapping and compatibility notes) and ensure the preflight/cache validation text references the summary label and limited hardware metadata for the "aot_compile" format.
332-348: ⚠️ Potential issue | 🟡 Minor

Update CUDA summary examples to use the sm_ architecture format.

The docs show summary labels with `"arch": "75"`, but `buildBinaryCacheSummary` prefixes the CUDA architecture with `sm_`, so the emitted summary is expected to look like `"arch": "sm_75"`. Please align the examples and the explanatory text with the actual label format.

📝 Proposed documentation fix

```diff
 {
   "backend": "cuda",
-  "arch": "75",
+  "arch": "sm_75",
   "warp_size": 32,
   "ptx_version": 590,
   "cuda_version": "12.9"
 }

-**Important**: Notice that the `arch` field shows the **actual GPU compute capability** (e.g., `75` for Tesla T4 which is sm_7.5), not the CUDA toolkit version.
+**Important**: Notice that the `arch` field shows the **actual GPU compute capability** in CUDA `sm_` format (e.g., `sm_75` for Tesla T4 / sm_7.5), not the CUDA toolkit version.
```

Also applies to: 642-651
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/docs/vllm-binary-cache.md` around lines 332 - 348, Update the CUDA architecture examples and explanatory text to use the "sm_" prefix that buildBinaryCacheSummary produces (e.g., "arch": "sm_75" instead of "75"); change the JSON example and all descriptive mentions (including the sentence about compute capability and any other examples around the same section) to the sm_ format, and ensure references to environment variables remain correct (VLLM_TARGET_DEVICE for backend, VLLM_MAIN_CUDA_VERSION for CUDA version) so labels emitted by buildBinaryCacheSummary match the documentation.

mcv/pkg/cache/vllm.go (1)
636-655: ⚠️ Potential issue | 🟠 Major

Populate arch when falling back to cache metadata.

When GPU detection fails, `backend` and `warpSize` are replaced from cache metadata/defaults, but `arch` remains `UnknownBackend`. That produces a summary target like `backend=cuda, arch=UnknownBackend`, which will fail architecture compatibility checks later. Add an environment-derived architecture fallback, or omit the target when the architecture cannot be determined.

🐛 Proposed fix direction

```diff
 	if backend == UnknownBackend {
 		logging.Warn("Could not detect GPU on system, using cache metadata as fallback")
 		// Fallback to cache metadata if GPU detection failed
 		backend = binaryCache.TargetDevice
 		if backend == "" {
 			backend = CUDABackend // Default if not specified
 		}
+		if envArch, ok := binaryCache.Env["VLLM_PAGED_ATTN_ARCH"].(string); ok && envArch != "" {
+			arch = envArch
+		}
+		if backend == CUDABackend && arch != "" && !strings.HasPrefix(arch, "sm_") {
+			arch = fmt.Sprintf("sm_%s", arch)
+		}
 		// Set default warp sizes
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcv/pkg/cache/vllm.go` around lines 636 - 655, When falling back because backend == UnknownBackend, the code updates backend and warpSize but leaves arch as UnknownBackend which breaks later compatibility checks; update the fallback block in vllm.go (the section handling UnknownBackend) to also populate arch from cache or environment (e.g., use binaryCache.TargetArch or an env var) or clear/omit arch when it cannot be determined, and ensure any defaulting logic (mapping backend->arch) is applied so the resulting target does not contain arch==UnknownBackend; change references around backend, arch, warpSize, UnknownBackend, binaryCache.TargetDevice and CUDABackend accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mcv/pkg/accelerator/devices/nvml.go`:
- Around line 69-79: SetTritonInfo currently replaces d.tritonInfo but only
merges entries into the existing d.devices map, leaving stale GPUs visible to
GetGPUInfo/GetAllGPUInfo; fix by reinitializing d.devices to an empty map before
the loop in SetTritonInfo (e.g., set d.devices = make(map[int]GPUDevice)) and
then populate it from the provided info slice so the map exactly reflects the
new tritonInfo entries (retain the existing per-device field assignments such as
TritonInfo and leave Summary to be set by SetSummaries).
---
Outside diff comments:
In `@mcv/pkg/cache/vllm.go`:
- Around line 106-116: The current loop treats the torch_aot_compile dir as
Mega-AOT by calling detectMegaAOTEntries (leading to BinaryCacheFormat entries)
and later also detects it as "aot_compile" via detectAOTCompileCache, causing
duplicate manifest entries; change the logic so that when entry.Name() ==
torchAOTCompileDirName you do NOT call detectMegaAOTEntries and instead
skip/continue so only detectAOTCompileCache produces entries. Update the
conditional in the loop (the block referencing torchAOTCompileDirName,
detectMegaAOTEntries, metadata and count) to avoid appending aot artifacts as
BinaryCacheFormat and rely exclusively on detectAOTCompileCache for that layout
(keep torchAOTCompileDirName, detectAOTCompileCache, metadata, and count
references to locate the code).
---
Duplicate comments:
In `@mcv/docs/vllm-binary-cache.md`:
- Around line 14-20: Update the documentation so AOT compile
(VLLM_USE_AOT_COMPILE=1) is consistently described as supported: change the "AOT
Compile Support" paragraph to state it is supported, replace any claims that it
is unsupported with support wording, and ensure the manifest/format table and
any references use cacheFormat: "aot_compile" instead of "binary"; keep the
separate concept/name "Mega AOT" distinct from AOT compile in all sections
(including the format mapping and compatibility notes) and ensure the
preflight/cache validation text references the summary label and limited
hardware metadata for the "aot_compile" format.
- Around line 332-348: Update the CUDA architecture examples and explanatory
text to use the "sm_" prefix that buildBinaryCacheSummary produces (e.g.,
"arch": "sm_75" instead of "75"); change the JSON example and all descriptive
mentions (including the sentence about compute capability and any other examples
around the same section) to the sm_ format, and ensure references to environment
variables remain correct (VLLM_TARGET_DEVICE for backend, VLLM_MAIN_CUDA_VERSION
for CUDA version) so labels emitted by buildBinaryCacheSummary match the
documentation.
In `@mcv/pkg/cache/vllm.go`:
- Around line 636-655: When falling back because backend == UnknownBackend, the
code updates backend and warpSize but leaves arch as UnknownBackend which breaks
later compatibility checks; update the fallback block in vllm.go (the section
handling UnknownBackend) to also populate arch from cache or environment (e.g.,
use binaryCache.TargetArch or an env var) or clear/omit arch when it cannot be
determined, and ensure any defaulting logic (mapping backend->arch) is applied
so the resulting target does not contain arch==UnknownBackend; change references
around backend, arch, warpSize, UnknownBackend, binaryCache.TargetDevice and
CUDABackend accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 6b53339b-63dd-4b5b-99a8-f180789f5388
📒 Files selected for processing (10)
- gkm-codespell.precommit-toml
- mcv/docs/vllm-binary-cache.md
- mcv/pkg/accelerator/devices/amd.go
- mcv/pkg/accelerator/devices/nvml.go
- mcv/pkg/accelerator/devices/rocm.go
- mcv/pkg/cache/types.go
- mcv/pkg/cache/vllm.go
- mcv/pkg/preflightcheck/triton.go
- mcv/pkg/preflightcheck/utils.go
- mcv/pkg/preflightcheck/vllm.go
✅ Files skipped from review due to trivial changes (1)
- gkm-codespell.precommit-toml
🚧 Files skipped from review as they are similar to previous changes (5)
- mcv/pkg/accelerator/devices/amd.go
- mcv/pkg/accelerator/devices/rocm.go
- mcv/pkg/preflightcheck/triton.go
- mcv/pkg/preflightcheck/utils.go
- mcv/pkg/cache/types.go
```go
// Rebuild devices map from cached triton info
if d.devices == nil {
	d.devices = make(map[int]GPUDevice)
}
for _, tritonInfo := range info {
	d.devices[tritonInfo.ID] = GPUDevice{
		ID:         tritonInfo.ID,
		TritonInfo: tritonInfo,
		// Summary will be set by SetSummaries
	}
}
```
Rebuild devices instead of merging into the existing map.
SetTritonInfo replaces d.tritonInfo, but only merges into d.devices. If this setter runs after a previous probe/cache load, stale GPU entries remain visible through GetGPUInfo/GetAllGPUInfo.
Suggested fix

```diff
 	// Rebuild devices map from cached triton info
-	if d.devices == nil {
-		d.devices = make(map[int]GPUDevice)
-	}
+	d.devices = make(map[int]GPUDevice, len(info))
 	for _, tritonInfo := range info {
 		d.devices[tritonInfo.ID] = GPUDevice{
 			ID:         tritonInfo.ID,
 			TritonInfo: tritonInfo,
 			// Summary will be set by SetSummaries
 		}
 	}
+	if len(d.summaries) > 0 {
+		d.SetSummaries(d.summaries)
+	}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Rebuild devices map from cached triton info
d.devices = make(map[int]GPUDevice, len(info))
for _, tritonInfo := range info {
	d.devices[tritonInfo.ID] = GPUDevice{
		ID:         tritonInfo.ID,
		TritonInfo: tritonInfo,
		// Summary will be set by SetSummaries
	}
}
if len(d.summaries) > 0 {
	d.SetSummaries(d.summaries)
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@mcv/pkg/accelerator/devices/nvml.go` around lines 69 - 79, SetTritonInfo
currently replaces d.tritonInfo but only merges entries into the existing
d.devices map, leaving stale GPUs visible to GetGPUInfo/GetAllGPUInfo; fix by
reinitializing d.devices to an empty map before the loop in SetTritonInfo (e.g.,
set d.devices = make(map[int]GPUDevice)) and then populate it from the provided
info slice so the map exactly reflects the new tritonInfo entries (retain the
existing per-device field assignments such as TritonInfo and leave Summary to be
set by SetSummaries).
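The merge-versus-rebuild distinction the review calls out can be shown with a simplified map, assuming toy types in place of the real `GPUDevice` and `SetTritonInfo`.

```go
package main

import "fmt"

type gpuDevice struct{ id int }

// rebuildDevices replaces the whole map on every call, so stale
// entries from a previous probe or cache load cannot leak through --
// the behavior the review suggests for SetTritonInfo. Merging into an
// existing map would instead keep GPU 1 visible after the second probe.
func rebuildDevices(ids []int) map[int]gpuDevice {
	devices := make(map[int]gpuDevice, len(ids))
	for _, id := range ids {
		devices[id] = gpuDevice{id: id}
	}
	return devices
}

func main() {
	devices := rebuildDevices([]int{0, 1}) // first probe sees GPUs 0 and 1
	devices = rebuildDevices([]int{0})     // second probe sees only GPU 0
	fmt.Println(len(devices))              // 1 -- GPU 1 is gone
}
```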
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores