test: Add KV transfer cancellation test on TRT-LLM #4547
Conversation
Force-pushed from db0f1e8 to 6ac1065 (compare)
Signed-off-by: Jacky <[email protected]>
Force-pushed from 6ac1065 to b34f4d9 (compare)
Walkthrough

A new end-to-end test was added to validate request cancellation behavior during the KV transfer phase in the TensorRT-LLM workflow. The test orchestrates prefill and decode workers, triggers a cancellable request, cancels during KV transfer, and verifies proper logging, cleanup messages, and continued worker functionality.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)
443-455: Good defensive verification of worker health.

Verifying that workers remain functional after KV transfer cancellation is a good practice, especially given the complexity of this cancellation scenario. The test appropriately confirms the decode worker can handle subsequent requests.
Optionally, consider also verifying the prefill worker remains functional by checking for its "Prefill Request ID" log:
```python
# Verify prefill worker is also functional
_, prefill_log_offset = poll_for_pattern(
    process=prefill_worker,
    pattern="Prefill Request ID: ",
    log_offset=prefill_log_offset,
    match_type="contains",
)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py:106-147
Timestamp: 2025-09-25T00:49:16.914Z
Learning: In TensorRT-LLM cancellation implementation, there are two distinct request IDs: the external/user-facing request_id from the incoming request, and the internal_request_id from TRT-LLM's generation_result that's needed for executor.abort_request() calls.
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.
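The learning above hinges on `poll_for_pattern`'s timeout semantics: a tight `max_wait_ms` default so that slow cancellation propagation fails the test rather than silently passing. A minimal self-contained sketch of such a helper (hypothetical signature over a log-reading callable, not the actual `tests/fault_tolerance/cancellation/utils.py` implementation, which polls a process) might look like:

```python
import time


def poll_for_pattern(read_log, pattern, log_offset=0, max_wait_ms=500, interval_ms=10):
    """Poll the text returned by read_log() for pattern, starting at log_offset.

    Returns (matched_line, new_offset). Raises TimeoutError if the pattern does
    not appear within max_wait_ms, mirroring the intent described above where
    slow cancellation propagation should fail the test.
    """
    deadline = time.monotonic() + max_wait_ms / 1000.0
    while True:
        text = read_log()[log_offset:]
        idx = text.find(pattern)
        if idx != -1:
            # Expand the match to the enclosing log line.
            line_end = text.find("\n", idx)
            if line_end == -1:
                line_end = len(text)
            line_start = text.rfind("\n", 0, idx) + 1
            return text[line_start:line_end], log_offset + line_end
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pattern {pattern!r} not found within {max_wait_ms}ms")
        time.sleep(interval_ms / 1000.0)
```

Returning the new offset lets successive calls resume scanning where the previous match ended, which is why the test threads `log_offset` through each call.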
📚 Learning: 2025-09-25T00:49:16.914Z
Applied to files:
tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-10-03T01:53:15.023Z
Applied to files:
tests/fault_tolerance/cancellation/test_trtllm.py
🪛 Ruff (0.14.5)
tests/fault_tolerance/cancellation/test_trtllm.py
372-372: Unused function argument: runtime_services
(ARG001)
372-372: Unused function argument: predownload_models
(ARG001)
407-407: Unpacked variable prefill_log_offset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
434-434: Unpacked variable frontend_log_offset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: vllm (amd64)
🔇 Additional comments (2)
tests/fault_tolerance/cancellation/test_trtllm.py (2)
367-379: LGTM! The test signature, markers, and docstring appropriately describe the test's purpose of verifying cancellation during the KV transfer phase.
381-394: LGTM! The test setup follows the established pattern used in other disaggregated cancellation tests.
…ncel window

Signed-off-by: Jacky <[email protected]>
Force-pushed from b34f4d9 to 3ba9cff (compare)
```python
    log_offset=decode_log_offset,
)

# Verify frontend log has kill message
```
is there any log we can find from the prefill worker to indicate the transfer has stopped / broken?
No, I found the transfer always succeeded.

This makes sense because the cancellation signal propagates into the TRT-LLM engine, but we wait until the engine gracefully exits the generate loop before returning from the request, so the engine can choose to finish receiving the KV cache and then exit the request.
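The behavior described here (the in-flight KV transfer completes even though the request is cancelled, because cancellation only takes effect at the generate-loop boundary) can be sketched with a small asyncio model. This is a hypothetical illustration of the control flow, not the actual TRT-LLM engine code:

```python
import asyncio


async def receive_kv_cache():
    # Stand-in for the KV transfer; in the model described above it is
    # allowed to run to completion even after cancellation is requested.
    await asyncio.sleep(0.01)
    return "kv-received"


async def generate(cancel_event):
    transfer = await receive_kv_cache()  # transfer finishes regardless
    outputs = []
    for step in range(100):
        if cancel_event.is_set():  # cancellation honored only here,
            break                  # at the top of the generate loop
        outputs.append(step)
        await asyncio.sleep(0)
    return transfer, outputs


async def main():
    cancel = asyncio.Event()
    cancel.set()  # cancel lands during the KV transfer window
    return await generate(cancel)


transfer, outputs = asyncio.run(main())
```

Under this model the prefill side never observes a broken transfer, which is consistent with the reviewer's finding that no "transfer stopped" log exists to assert on.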
nnshah1
left a comment
LGTM - but I want to understand whether we can confirm something in the prefill worker log.
Overview:
Add KV transfer cancellation test on TRT-LLM
Details:
Expected behavior:
Where should the reviewer start?
N/A
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
resolves #4178
Summary by CodeRabbit