fix: pin cache store workers to local device #1116
AI Code Review - PR #1116
Status: LGTM
Summary: P0/0 · P1/0 · P2/2 · P3/0
lgtm ready to ci
Non-blocking Suggestions: P2
Checklist Violations (6 fail / 34 total): General Principles Checklist, RTP-LLM Checklist
AI Code Review - PR #1116
Status: BLOCKING
Summary: P0/0 · P1/1 · P2/1 · P3/0
Blocking Issues: P1
Non-blocking Suggestions: P2
Checklist Violations (3 fail / 46 total): General Principles Checklist
AI Code Review - PR #1116
Status: BLOCKING
Summary: P0/0 · P1/1 · P2/1 · P3/0
Blocking Issues: P1
Non-blocking Suggestions: P2
Checklist Violations (6 fail / 38 total): General Principles Checklist, RTP-LLM Checklist
cab283e to 5b08374
AI Code Review - PR #1116
Status: LGTM
Summary: P0/0 · P1/0 · P2/1 · P3/0
lgtm ready to ci
Non-blocking Suggestions: P2
Checklist Violations (3 fail / 82 total): General Principles Checklist, RTP-LLM Checklist
internal source has been updated, please review the changes!
AI Code Review - PR #1116
Status: LGTM
Summary: P0/0 · P1/0 · P2/0 · P3/2
lgtm ready to ci
Non-blocking Suggestions: P3
Checklist ✅ (39 items passed)
AI Code Review - PR #1116
Status: LGTM
Summary: P0/0 · P1/0 · P2/1 · P3/1
lgtm ready to ci
Non-blocking Suggestions: P2, P3
Checklist Violations (1 fail / 55 total): General Principles Checklist
Summary
Fix cache store background workers to run on the correct local GPU device.
Cache store work can run from background thread pools and RPC callbacks. These threads do not always inherit the expected CUDA/HIP current device, which can lead to cache store operations using the wrong device in multi-GPU deployments. This PR passes the local rank device id into cache store components and pins worker threads before they execute device-sensitive work.
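The pinning idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: only the name `pinThreadToDeviceOnce` comes from the description, while its signature, the `sketch` namespace, `setCurrentDevice`, `set_device_calls`, and `runWorker` are hypothetical. `setCurrentDevice` is a stand-in for the runtime call (`cudaSetDevice` on CUDA, `hipSetDevice` on ROCm) so that the example runs without a GPU.

```cpp
#include <atomic>
#include <thread>

namespace sketch {

// Stand-in for the per-thread "current device" kept by the GPU runtime.
// A new thread starts on device 0, which is why an unpinned worker can
// end up touching the wrong GPU.
static thread_local int current_device = 0;

// Counts how many times the (stubbed) runtime call was actually made.
static std::atomic<int> set_device_calls{0};

// Hypothetical stub; real code would call cudaSetDevice / hipSetDevice here.
inline void setCurrentDevice(int device_id) {
    current_device = device_id;
    set_device_calls.fetch_add(1, std::memory_order_relaxed);
}

// Assumed shape of the utility: pin the calling thread to device_id and
// remember that in thread-local state, so repeated calls from the same
// thread with the same id skip the runtime call.
inline void pinThreadToDeviceOnce(int device_id) {
    static thread_local int pinned_device = -1;
    if (device_id < 0 || pinned_device == device_id) {
        return;  // invalid id, or this thread is already pinned
    }
    setCurrentDevice(device_id);
    pinned_device = device_id;
}

// Simulates one background worker that pins itself before every task and
// reports which device it observed while running the last task.
inline int runWorker(int local_rank, int num_tasks) {
    int observed = -1;
    std::thread worker([&observed, local_rank, num_tasks] {
        for (int i = 0; i < num_tasks; ++i) {
            pinThreadToDeviceOnce(local_rank);  // cheap after the first call
            observed = current_device;          // device-sensitive work goes here
        }
    });
    worker.join();
    return observed;
}

}  // namespace sketch
```

Calling the helper at the start of every task, rather than once at thread creation, also covers threads the cache store does not create itself, such as RPC callback threads; the thread-local flag keeps the repeated calls cheap.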
Changes
- Add pinThreadToDeviceOnce() utility for CUDA/ROCm device pinning.
- Pass local_rank into cache store init params and async cache writer.
Testing
Not run locally.