Conversation

@kthui (Contributor) commented Nov 22, 2025

Overview:

Add KV transfer cancellation test on TRT-LLM

Details:

  1. Cancel a request during its KV transfer (the timing lands mid-transfer in roughly 50-80% of runs).
  2. Send a new request to verify the system is still functional.
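The two steps above can be sketched as a toy simulation. All names here (FakeWorker, run_cancellation_scenario) are hypothetical stand-ins; the real test drives actual prefill/decode worker processes via the utilities in tests/fault_tolerance/cancellation:

```python
# Hypothetical sketch of the two-step test flow: cancel a request while its
# KV transfer is in flight, then confirm a follow-up request still works.
import threading
import time


class FakeWorker:
    """Stand-in for a decode worker whose request can be cancelled mid-transfer."""

    def __init__(self):
        self.log = []

    def handle(self, request_id, cancel_event):
        self.log.append(f"Start receiving KV cache for request ID: {request_id}")
        # The transfer is allowed to finish even if cancellation arrives mid-way.
        time.sleep(0.05)
        self.log.append(f"End receiving KV cache for request ID: {request_id}")
        if cancel_event.is_set():
            self.log.append(f"Request {request_id} finished by cancel")
            return None
        return "completed"


def run_cancellation_scenario(worker):
    cancel = threading.Event()
    t = threading.Thread(target=worker.handle, args=(2048, cancel))
    t.start()
    cancel.set()  # step 1: cancel while the KV transfer is in flight
    t.join()
    # step 2: a fresh, uncancelled request must still complete
    return worker.handle(2049, threading.Event())


worker = FakeWorker()
result = run_cancellation_scenario(worker)
```

The log lines mirror the expected-behavior excerpt below: the transfer runs to "End receiving KV cache" before the cancelled request exits.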

Expected behavior:

[PYTHON3] [TensorRT-LLM][DEBUG][0] Start sending KV cache for request ID: 2048.
...
[PYTHON3] [TensorRT-LLM][DEBUG][0] Start receiving KV cache for request ID: 2048, context request ID: 2048.
...
[PYTHON3] 2025-11-20T20:27:07.103415Z DEBUG handler_base._handle_cancellation: Aborted Request ID: 22bddae2-e335-412c-8902-d13c6b7b133c
...
[PYTHON3] [TensorRT-LLM][DEBUG][0] End receiving KV cache for request ID: 2048, context request ID: 2048.
...
[PYTHON3] [TensorRT-LLM][DEBUG] Request 2048 finished by cancel

Where should the reviewer start?

N/A

Related Issues:

Resolves #4178

Summary by CodeRabbit

  • Tests
    • Added comprehensive test coverage for request cancellation during the KV transfer phase in TensorRT-LLM workflows, ensuring proper resource cleanup, worker stability, and successful handling of subsequent requests.


@kthui kthui self-assigned this Nov 22, 2025
@github-actions github-actions bot added the test label Nov 22, 2025
@kthui kthui force-pushed the jacky-ft-cancel-kv-transfer-trtllm branch from db0f1e8 to 6ac1065 Compare November 24, 2025 23:32
@kthui kthui force-pushed the jacky-ft-cancel-kv-transfer-trtllm branch from 6ac1065 to b34f4d9 Compare November 25, 2025 22:07
@kthui kthui marked this pull request as ready for review November 25, 2025 22:57
@kthui kthui requested review from a team as code owners November 25, 2025 22:57
@coderabbitai bot (Contributor) commented Nov 25, 2025

Walkthrough

A new end-to-end test was added to validate request cancellation behavior during the KV transfer phase in the TensorRT-LLM workflow. The test orchestrates prefill and decode workers, triggers a cancellable request, cancels during KV transfer, and verifies proper logging, cleanup messages, and continued worker functionality.

Changes

Cohort / File(s): TensorRT-LLM Cancellation Tests (tests/fault_tolerance/cancellation/test_trtllm.py)
Summary: Added test_request_cancellation_trtllm_kv_transfer_cancel() to test cancellation during the KV transfer phase between prefill and decode. Validates abort logging, kill message emission, and worker continuation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Test follows established patterns from existing cancellation test scenarios
  • Single file addition with focused scope
  • Logic centers on orchestrating worker processes and validating cancellation side effects
  • Primary review concern: verify cancellation mechanics during KV transfer phase are correctly exercised and that assertions properly validate expected outcomes

Poem

🐰✨ A curious path through transfer's dance,
Where keys and values prance and advance!
Cancel mid-stride, watch workers survive—
Our test hops through chaos and keeps systems alive!
thump-thump 🎉

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: adding a KV transfer cancellation test for TRT-LLM.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description follows the required template structure with Overview, Details, Where to start, and Related Issues sections, though some sections are minimal.


@coderabbitai bot left a comment
Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)

443-455: Good defensive verification of worker health.

Verifying that workers remain functional after KV transfer cancellation is a good practice, especially given the complexity of this cancellation scenario. The test appropriately confirms the decode worker can handle subsequent requests.

Optionally, consider also verifying the prefill worker remains functional by checking for its "Prefill Request ID" log:

# Verify prefill worker is also functional
_, prefill_log_offset = poll_for_pattern(
    process=prefill_worker,
    pattern="Prefill Request ID: ",
    log_offset=prefill_log_offset,
    match_type="contains",
)
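The snippet above relies on the `poll_for_pattern` helper from tests/fault_tolerance/cancellation/utils.py. A minimal sketch of what such a helper might look like, assuming it scans log text from an offset (the real implementation reads from a live worker process, so the `log_text_fn` callable here is an assumption for self-containment):

```python
import time


def poll_for_pattern(log_text_fn, pattern, log_offset=0, max_wait_ms=500,
                     match_type="contains"):
    """Poll a log source until `pattern` appears past `log_offset`.

    Sketch only. The tight 500 ms default mirrors the intent noted in the
    review learnings: slow cancellation propagation should fail the test
    rather than silently pass. Returns the pattern and the offset just past
    the match, so subsequent polls can resume from there.
    """
    deadline = time.monotonic() + max_wait_ms / 1000.0
    while True:
        text = log_text_fn()[log_offset:]
        if match_type == "contains":
            idx = text.find(pattern)
        else:  # "startswith"
            idx = 0 if text.startswith(pattern) else -1
        if idx != -1:
            return pattern, log_offset + idx + len(pattern)
        if time.monotonic() > deadline:
            raise TimeoutError(f"pattern not found: {pattern!r}")
        time.sleep(0.01)
```

Carrying the returned offset forward is what lets the tests assert log lines appear in order rather than merely somewhere in the log.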
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a38479 and b34f4d9.

📒 Files selected for processing (1)
  • tests/fault_tolerance/cancellation/test_trtllm.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py:106-147
Timestamp: 2025-09-25T00:49:16.914Z
Learning: In TensorRT-LLM cancellation implementation, there are two distinct request IDs: the external/user-facing request_id from the incoming request, and the internal_request_id from TRT-LLM's generation_result that's needed for executor.abort_request() calls.
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.
📚 Learning: 2025-09-25T00:49:16.914Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py:106-147
Timestamp: 2025-09-25T00:49:16.914Z
Learning: In TensorRT-LLM cancellation implementation, there are two distinct request IDs: the external/user-facing request_id from the incoming request, and the internal_request_id from TRT-LLM's generation_result that's needed for executor.abort_request() calls.

Applied to files:

  • tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-10-03T01:53:15.023Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.

Applied to files:

  • tests/fault_tolerance/cancellation/test_trtllm.py
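The first learning above (two distinct request IDs) can be illustrated with a small sketch. Only the idea that `executor.abort_request()` needs TRT-LLM's internal id rather than the external request_id comes from the learning; the handler shape, the mapping dict, and FakeExecutor are assumptions:

```python
class FakeExecutor:
    """Stand-in for the TRT-LLM executor; records which internal ids were aborted."""

    def __init__(self):
        self.aborted = []

    def abort_request(self, internal_id):
        self.aborted.append(internal_id)


class CancellationHandler:
    """Maps external request ids to the internal ids the executor understands."""

    def __init__(self, executor):
        self.executor = executor
        self._internal_ids = {}  # external request_id -> internal request id

    def register(self, request_id, internal_request_id):
        # The internal id comes from TRT-LLM's generation_result; the
        # external/user-facing id alone cannot be passed to abort_request().
        self._internal_ids[request_id] = internal_request_id

    def cancel(self, request_id):
        internal_id = self._internal_ids.pop(request_id, None)
        if internal_id is not None:
            self.executor.abort_request(internal_id)
        return internal_id


executor = FakeExecutor()
handler = CancellationHandler(executor)
handler.register("22bddae2-e335-412c-8902-d13c6b7b133c", 2048)
cancelled_id = handler.cancel("22bddae2-e335-412c-8902-d13c6b7b133c")
```

This matches the expected-behavior logs in the PR description, where the abort is reported against a UUID-style external id while the engine logs refer to request ID 2048.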
🪛 Ruff (0.14.5)
tests/fault_tolerance/cancellation/test_trtllm.py

372-372: Unused function argument: runtime_services

(ARG001)


372-372: Unused function argument: predownload_models

(ARG001)


407-407: Unpacked variable prefill_log_offset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


434-434: Unpacked variable frontend_log_offset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: vllm (amd64)
🔇 Additional comments (2)
tests/fault_tolerance/cancellation/test_trtllm.py (2)

367-379: LGTM!

The test signature, markers, and docstring appropriately describe the test's purpose of verifying cancellation during the KV transfer phase.


381-394: LGTM!

The test setup follows the established pattern used in other disaggregated cancellation tests.

Review comment (Contributor) on tests/fault_tolerance/cancellation/test_trtllm.py, at the "# Verify frontend log has kill message" check:
is there any log we can find from the prefill worker to indicate the transfer has stopped / broken?

@kthui (Author) replied:
No, I found the transfer always succeeds.

This makes sense because the cancellation signal propagates into the TRT-LLM engine, but we wait until the engine gracefully exits the generate loop before returning from the request, so the engine can choose to finish receiving the KV cache and then exit the request.
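The drain-before-return behavior described here can be sketched with a toy async generate loop. The loop shape and event names are illustrative, not the actual TRT-LLM engine API:

```python
# Sketch: on cancel we flag the engine but keep draining its generate loop,
# so an in-flight KV transfer may still complete before the request exits.
import asyncio


async def generate(cancelled):
    # The engine may finish receiving KV cache even after cancellation.
    yield "Start receiving KV cache"
    await asyncio.sleep(0)
    yield "End receiving KV cache"
    if cancelled.is_set():
        yield "finished by cancel"
        return
    yield "token"


async def handle_request():
    cancelled = asyncio.Event()
    events = []
    async for step in generate(cancelled):
        events.append(step)
        if step == "Start receiving KV cache":
            cancelled.set()  # cancel mid-transfer...
        # ...but keep iterating until the engine exits the loop gracefully
    return events


events = asyncio.run(handle_request())
```

Under this model there is no "transfer broken" log to look for on the prefill side, which is consistent with the observation above that the transfer always succeeds.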

@nnshah1 left a comment

LGTM, but I want to understand whether we confirm something in the prefill worker log.

@kthui kthui enabled auto-merge (squash) November 26, 2025 01:52
@kthui kthui merged commit bbaab9f into main Nov 26, 2025
33 of 34 checks passed
@kthui kthui deleted the jacky-ft-cancel-kv-transfer-trtllm branch November 26, 2025 18:59

Development

Successfully merging this pull request may close these issues.

[BUG]: Unable to continue request processing after cancellation when using dynamo.trtllm with decode_first

3 participants