-
Couldn't load subscription status.
- Fork 663
fix: Sglang Cancellation Aggregated Test #3896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
👋 Hi vyalamar! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughThe changes implement a two-phase cancellation mechanism with a configurable grace period for SGLang backend requests. Updates include adding sleep delays for cleanup in handlers, distinguishing backend types in disconnect logic, introducing per-request cancellation state tracking, and updating related tests to accommodate the grace period timing. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Areas requiring extra attention:
Poem
Pre-merge checks✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/src/dynamo/sglang/request_handlers/handler_base.py (2)
151-157: Wrap abort_request call to avoid propagating exceptionsAbort can fail (e.g., request already finished). Catch and log to prevent noisy tracebacks during cancellation.
Apply:
- self.engine.tokenizer_manager.abort_request( - rid=sglang_request_id, abort_all=False - ) - logging.info(f"Aborted Request ID: {context.id()}") + try: + self.engine.tokenizer_manager.abort_request( + rid=sglang_request_id, abort_all=False + ) + logging.info(f"Aborted Request ID: {sglang_request_id} (Context: {context.id()})") + except Exception as e: + logging.warning( + f"abort_request failed for SGLang Request ID {sglang_request_id} (Context: {context.id()}): {e}" + )
135-139: Incorrect log message: future wasn’t cancelledThe future just resolved; message should say “resolved/ready,” not “cancelled.”
Apply:
- logging.debug(f"Request ID future cancelled for Context: {context.id()}") + logging.debug(f"Request ID future resolved for Context: {context.id()}")
🧹 Nitpick comments (5)
tests/fault_tolerance/cancellation/test_sglang.py (2)
194-197: Long prompt toggle is appropriate to surface races; optionally gate sizeConsider parametrizing prompt length or gating via env (e.g., CANCELLATION_STRESS=1) to keep CI time predictable on constrained runners.
223-229: Derive polling timeout from CANCEL_GRACE_MS instead of hardcoding 5000msKeeps tests aligned with configured grace period while leaving headroom.
Apply:
- _, worker_log_offset = poll_for_pattern( + grace_ms = int(os.getenv("CANCEL_GRACE_MS", "300")) + # 10x grace + 1s headroom + _, worker_log_offset = poll_for_pattern( process=worker, pattern=f"Aborted Request ID: {request_id}", log_offset=worker_log_offset, - max_wait_ms=5000, # Increased from 2000ms to 5000ms to account for grace period + max_wait_ms=grace_ms * 10 + 1000, )lib/llm/src/http/service/disconnect.rs (1)
269-287: Clamp and log configured grace to avoid pathological valuesProtect against extreme env values and aid debugging.
Apply:
-fn cancel_grace_ms() -> u64 { - std::env::var("CANCEL_GRACE_MS") - .ok() - .and_then(|v| v.parse().ok()) - .unwrap_or(300) -} +fn cancel_grace_ms() -> u64 { + let v = std::env::var("CANCEL_GRACE_MS") + .ok() + .and_then(|v| v.parse::<u64>().ok()) + .unwrap_or(300); + v.clamp(0, 10_000) +}Optionally emit a debug once at startup with the chosen value.
test_cancellation_isolated.py (2)
19-30: Reduce timing flakiness; use perf_counter and drop tight upper boundCI jitter can exceed 400ms. Base on env grace and check only lower bound.
Apply:
- start_time = time.time() - await asyncio.sleep(grace_period_ms / 1000.0) # Our implementation - end_time = time.time() + # Prefer monotonic clock for short intervals + start_time = time.perf_counter() + await asyncio.sleep(grace_period_ms / 1000.0) # Our implementation + end_time = time.perf_counter() @@ - assert elapsed_ms >= 300, f"Grace period too short: {elapsed_ms}ms" - assert elapsed_ms <= 400, f"Grace period too long: {elapsed_ms}ms" + assert elapsed_ms >= grace_period_ms, f"Grace period too short: {elapsed_ms:.1f}ms"Optionally read from CANCEL_GRACE_MS and assert against that value.
128-131: Avoid blind Exception catch and align with ruff TRY300Either let exceptions bubble (preferred for tests) or use try/except…else for clarity.
Apply:
- except Exception as e: - print(f"❌ Test failed: {e}") - return False + except Exception as e: + print(f"❌ Test failed: {e}") + return False + else: + return TrueOr remove the try/except entirely and let failures surface to CI.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
components/src/dynamo/sglang/request_handlers/handler_base.py(1 hunks)lib/llm/src/http/service/disconnect.rs(4 hunks)test_cancellation_isolated.py(1 hunks)tests/fault_tolerance/cancellation/test_sglang.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/fault_tolerance/cancellation/test_sglang.py (1)
tests/fault_tolerance/cancellation/utils.py (2)
send_cancellable_request(239-263)poll_for_pattern(319-391)
lib/llm/src/http/service/disconnect.rs (1)
lib/runtime/src/pipeline/context.rs (8)
id(59-61)id(252-254)id(332-334)id(347-349)stopped(276-278)stopped(359-366)kill(260-262)kill(409-423)
🪛 GitHub Actions: Copyright Checks
test_cancellation_isolated.py
[error] 1-1: Copyright check failed. Missing/invalid header detected in test_cancellation_isolated.py.
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3896/merge) by vyalamar.
lib/llm/src/http/service/disconnect.rs
[warning] 1-1: Trailing whitespace fixed by pre-commit (ruff detected earlier; changes committed).
components/src/dynamo/sglang/request_handlers/handler_base.py
[warning] 1-1: Black formatting applied by pre-commit (reformatted file).
test_cancellation_isolated.py
[error] 1-1: Pre-commit check-shebang-scripts-are-executable failed: test_cancellation_isolated.py has a shebang but is not marked executable. Run 'chmod +x test_cancellation_isolated.py'.
[warning] 1-1: Trailing whitespace fixed by pre-commit (from test file as part of changes).
🪛 Ruff (0.14.1)
test_cancellation_isolated.py
1-1: Shebang is present but file is not executable
(EXE001)
126-126: Consider moving this statement to an else block
(TRY300)
128-128: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build and Test - dynamo
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: clippy (.)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: clippy (lib/runtime/examples)
🔇 Additional comments (4)
components/src/dynamo/sglang/request_handlers/handler_base.py (1)
133-141: Early-cancel corner case checkIf cancellation arrives before SGLang request ID is set, this task waits until exit. Confirm Rust two‑phase cancel alone fully cleans up in that path to avoid dangling work/allocations. If not, consider racing on request_id_future vs. cancel signal and skipping abort when ID missing.
tests/fault_tolerance/cancellation/test_sglang.py (1)
164-166: Docstring update aligns with two‑phase cancellationLooks good.
lib/llm/src/http/service/disconnect.rs (2)
203-216: Single cancel path reused for connection and streamGood reuse; avoids double‑cancelling.
134-201: Trait API verified; no issues found.The
AsyncEngineContexttrait properly exposes bothstop_generating()(sync) andstopped()(async) methods. All implementations—StreamContext, Controller, HttpRequestContext, and test mocks—provide working versions. The code in disconnect.rs correctly invokes both methods:stop_generating()in Phase 1 andstopped()in the tokio::select! block for Phase 2. The idempotency latch prevents duplicate cancellations, and the two-phase flow is sound.
|
|
||
| # Add grace period to allow SGLang to process the cancellation gracefully | ||
| # This prevents the race condition where Rust runtime drops the stream | ||
| # before SGLang can properly clean up the request | ||
| grace_period_ms = 300 # 300ms recommended by project leaders for reliable cancellation | ||
| logging.debug(f"Waiting {grace_period_ms}ms for SGLang graceful cleanup...") | ||
| await asyncio.sleep(grace_period_ms / 1000.0) | ||
| logging.debug(f"Grace period completed for Request ID: {context.id()}") | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Use the same env‑configurable grace as Rust (CANCEL_GRACE_MS) instead of hardcoding 300ms
Keeps behavior consistent and tunable across components.
Apply:
+import os
@@
- grace_period_ms = 300 # 300ms recommended by project leaders for reliable cancellation
- logging.debug(f"Waiting {grace_period_ms}ms for SGLang graceful cleanup...")
- await asyncio.sleep(grace_period_ms / 1000.0)
- logging.debug(f"Grace period completed for Request ID: {context.id()}")
+ grace_period_ms_str = os.getenv("CANCEL_GRACE_MS", "300")
+ try:
+ grace_period_ms = max(0, min(int(grace_period_ms_str), 10000))
+ except ValueError:
+ grace_period_ms = 300
+ logging.debug(f"Waiting {grace_period_ms}ms for SGLang graceful cleanup...")
+ await asyncio.sleep(grace_period_ms / 1000.0)
+ logging.debug(
+ f"Grace period completed for SGLang Request ID {sglang_request_id}, Context: {context.id()}"
+ )Committable suggestion skipped: line range outside the PR's diff.
| #!/usr/bin/env python3 | ||
| """ | ||
| Isolated test for cancellation grace period fix. | ||
| This test doesn't import SGLang dependencies to avoid platform compatibility issues. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pipeline blocker: add SPDX header and remove shebang or make file executable
Pre‑commit failed due to missing/invalid header and non‑executable shebang. Easiest: add header and remove shebang.
Apply:
-#!/usr/bin/env python3
-"""
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+"""
Isolated test for cancellation grace period fix.
This test doesn't import SGLang dependencies to avoid platform compatibility issues.
"""Alternatively, keep shebang and set executable bit in git; but header is still required.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #!/usr/bin/env python3 | |
| """ | |
| Isolated test for cancellation grace period fix. | |
| This test doesn't import SGLang dependencies to avoid platform compatibility issues. | |
| """ | |
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |
| # SPDX-License-Identifier: Apache-2.0 | |
| """ | |
| Isolated test for cancellation grace period fix. | |
| This test doesn't import SGLang dependencies to avoid platform compatibility issues. | |
| """ |
🧰 Tools
🪛 GitHub Actions: Copyright Checks
[error] 1-1: Copyright check failed. Missing/invalid header detected in test_cancellation_isolated.py.
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3896/merge) by vyalamar.
[error] 1-1: Pre-commit check-shebang-scripts-are-executable failed: test_cancellation_isolated.py has a shebang but is not marked executable. Run 'chmod +x test_cancellation_isolated.py'.
[warning] 1-1: Trailing whitespace fixed by pre-commit (from test file as part of changes).
🪛 Ruff (0.14.1)
1-1: Shebang is present but file is not executable
(EXE001)
🤖 Prompt for AI Agents
In test_cancellation_isolated.py around lines 1-5, the file is missing the
required SPDX license header and contains a shebang which fails
non-executable/pre-commit checks; add the appropriate SPDX header as the first
non-blank line (e.g. SPDX-License-Identifier: <LICENSE>) and remove the shebang
line, or if you must keep the shebang make the file executable in git (git
update-index --chmod=+x test_cancellation_isolated.py) while still adding the
SPDX header.
Overview:
This PR fixes the flaky
test_request_cancellation_sglang_aggregatedtest by implementing a two-phase cancellation mechanism specifically for SGLang backends. The issue was a race condition where the Rust runtime was aggressively dropping SGLang stream handlers before SGLang could gracefully clean up its resources, leading to intermittent test failures.The fix implements a configurable 300ms grace period for SGLang backends only, allowing SGLang sufficient time to process cancellation signals and clean up resources gracefully, while maintaining immediate cancellation for other backends (vLLM, TensorRT-LLM).
Details:
Key Changes:
Two-Phase Cancellation for SGLang: Uses
tokio::select!to wait for either graceful termination or timeoutengine_context.stop_generating()engine_context.stopped()or force kill on timeoutPer-Request Idempotency
Clean Backend Detection: Uses
engine_context.id().starts_with("sglang:")pattern following existing engine type system conventionsConfigurable Grace Period: Environment variable
CANCEL_GRACE_MS(default: 300ms) allows tuning without code changesWhere should the reviewer start?
Primary Focus:
lib/llm/src/http/service/disconnect.rs- Core two-phase cancellation implementationRelated Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Bug Fixes
Tests