@vyalamar vyalamar commented Oct 26, 2025

Overview:

This PR fixes the flaky test_request_cancellation_sglang_aggregated test by implementing a two-phase cancellation mechanism specifically for SGLang backends. The issue was a race condition where the Rust runtime was aggressively dropping SGLang stream handlers before SGLang could gracefully clean up its resources, leading to intermittent test failures.

The fix adds a grace period for SGLang backends only (300ms by default, configurable). This gives SGLang time to process the cancellation signal and clean up its resources, while other backends (vLLM, TensorRT-LLM) are still cancelled immediately.

Details:

Key Changes:

  1. Two-Phase Cancellation for SGLang: Uses tokio::select! to wait for either graceful termination or timeout

    • Phase 1: Send cancel signal via engine_context.stop_generating()
    • Phase 2: Wait up to 300ms for engine_context.stopped() or force kill on timeout
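  2. Per-Request Idempotency: A per-request latch ensures cancellation runs at most once, so the connection and stream disconnect paths cannot cancel the same request twice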

  3. Clean Backend Detection: Uses engine_context.id().starts_with("sglang:") pattern following existing engine type system conventions

  4. Configurable Grace Period: Environment variable CANCEL_GRACE_MS (default: 300ms) allows tuning without code changes
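The two-phase flow above can be sketched in Python with asyncio. Here `asyncio.wait_for` plays the role of the `tokio::select!` race between `stopped()` and a timer. This sketch rests on stated assumptions and does not reproduce the Rust code in disconnect.rs. `FakeEngineContext` is a hypothetical stand-in that only mimics the `id()`, `stop_generating()`, `stopped()` and `kill()` methods mentioned in this thread, and the backend's cleanup time is simulated with a timer.

```python
import asyncio
import os


class FakeEngineContext:
    """Hypothetical stand-in for the engine context; not the real Rust type."""

    def __init__(self, request_id: str, cleanup_ms: int) -> None:
        self._id = request_id
        self._cleanup_ms = cleanup_ms  # simulated time the backend needs to clean up
        self._stopped = asyncio.Event()
        self.killed = False

    def id(self) -> str:
        return self._id

    def stop_generating(self) -> None:
        # Phase 1 signal: the simulated backend reports "stopped" after cleanup_ms.
        loop = asyncio.get_running_loop()
        loop.call_later(self._cleanup_ms / 1000.0, self._stopped.set)

    async def stopped(self) -> None:
        await self._stopped.wait()

    def kill(self) -> None:
        self.killed = True
        self._stopped.set()


def cancel_grace_ms() -> int:
    # Assumed default of 300 ms, overridable via CANCEL_GRACE_MS (simplified parsing).
    try:
        return int(os.environ.get("CANCEL_GRACE_MS", "300"))
    except ValueError:
        return 300


async def cancel(ctx: FakeEngineContext) -> str:
    """Return "graceful" if the backend stopped within the grace period, else "killed"."""
    if not ctx.id().startswith("sglang:"):
        ctx.kill()  # other backends: immediate kill, no grace period
        return "killed"
    ctx.stop_generating()  # Phase 1: ask SGLang to stop
    try:
        # Phase 2: wait for stopped(), bounded by the grace period
        await asyncio.wait_for(ctx.stopped(), timeout=cancel_grace_ms() / 1000.0)
        return "graceful"
    except asyncio.TimeoutError:
        ctx.kill()  # grace period expired: force kill
        return "killed"


async def run_demo(request_id: str, cleanup_ms: int) -> tuple:
    # Build the context inside the running loop so its Event binds to that loop.
    ctx = FakeEngineContext(request_id, cleanup_ms)
    outcome = await cancel(ctx)
    return outcome, ctx.killed


if __name__ == "__main__":
    # Expected values assume CANCEL_GRACE_MS is unset (300 ms grace).
    print(asyncio.run(run_demo("sglang:req-1", cleanup_ms=50)))    # ('graceful', False)
    print(asyncio.run(run_demo("sglang:req-2", cleanup_ms=1000)))  # ('killed', True)
    print(asyncio.run(run_demo("vllm:req-3", cleanup_ms=50)))      # ('killed', True)
```

The key property is that a backend which stops quickly is never force-killed, while a slow backend is killed as soon as the grace period expires.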

Where should the reviewer start?

Primary Focus:

  • lib/llm/src/http/service/disconnect.rs - Core two-phase cancellation implementation

Related Issues: Closes #3580

Summary by CodeRabbit

  • Bug Fixes

    • Improved request cancellation with graceful cleanup phase to ensure proper resource management before termination
    • Enhanced handling of unexpected client disconnections with optimized cancellation strategies
  • Tests

    • Added comprehensive test coverage for request cancellation grace period functionality

@vyalamar vyalamar requested review from a team as code owners October 26, 2025 10:09
copy-pr-bot bot commented Oct 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

@github-actions

👋 Hi vyalamar! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added external-contribution Pull request is from an external contributor fix labels Oct 26, 2025
coderabbitai bot commented Oct 26, 2025

Walkthrough

The changes implement a two-phase cancellation mechanism with a configurable grace period for SGLang backend requests. Updates include adding sleep delays for cleanup in handlers, distinguishing backend types in disconnect logic, introducing per-request cancellation state tracking, and updating related tests to accommodate the grace period timing.

Changes

Cohort / File(s) Summary
SGLang Handler Grace Period
components/src/dynamo/sglang/request_handlers/handler_base.py
Adds a 300ms asyncio.sleep with debug logging after issuing abort_request to enable graceful cleanup during cancellation.
Rust Disconnect Service Refactor
lib/llm/src/http/service/disconnect.rs
Introduces per-request cancellation state and per-backend logic: two-phase cancellation for SGLang (stop_generating + wait for stopped within grace period), immediate kill for others. Adds metrics logging, environment-configurable grace period helpers, and backend detection by ID prefix.
New Isolated Cancellation Tests
test_cancellation_isolated.py
Adds three async test functions validating grace period timing (300–400ms), cancellation flow with mocked components, and request ID future patterns, plus a main orchestrator. Uses standard library mocks and AsyncMock.
Existing Test Updates
tests/fault_tolerance/cancellation/test_sglang.py
Removes xfail marker from test_request_cancellation_sglang_aggregated, updates docstring to reference grace period, changes request dispatch to use_long_prompt=True, increases timeout from 2000ms to 5000ms, adds explanatory comments.
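The per-request cancellation state can be pictured as a latch that only the first disconnect signal can trip. This illustration uses hypothetical names, `CancelLatch` and `RequestCancelState`. It uses a lock-protected flag where Rust code might instead use an atomic such as `AtomicBool::swap`. The actual state type in disconnect.rs is not shown in this thread.

```python
import threading


class CancelLatch:
    """Hypothetical one-shot latch: only the first try_acquire() returns True."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._fired = False

    def try_acquire(self) -> bool:
        with self._lock:
            if self._fired:
                return False
            self._fired = True
            return True


class RequestCancelState:
    """Hypothetical per-request state shared by the connection and stream monitors."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id
        self.latch = CancelLatch()
        self.cancel_calls = 0
        self.cancelled_by = None

    def on_disconnect(self, source: str) -> bool:
        # Both disconnect paths call this; only the first one runs the cancel logic.
        if not self.latch.try_acquire():
            return False
        self.cancelled_by = source
        self.cancel_calls += 1  # stand-in for the real two-phase cancellation
        return True


if __name__ == "__main__":
    state = RequestCancelState("sglang:req-1")
    print(state.on_disconnect("connection"))       # True
    print(state.on_disconnect("stream"))           # False
    print(state.cancel_calls, state.cancelled_by)  # 1 connection
```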

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • lib/llm/src/http/service/disconnect.rs — New per-request cancellation state management, two-phase SGLang-specific logic, grace period timing semantics, and backend detection heuristics need careful validation for race conditions and edge cases.
  • Grace period consistency — Verify that the 300ms delay in handler_base.py aligns with the configurable grace period logic in disconnect.rs; confirm environment variable handling and defaults.
  • Metrics and logging — Ensure new disconnect metrics and debug logs are properly integrated and don't introduce performance overhead.
  • Test coverage — Confirm isolated tests adequately mock SGLang behavior and that the extended timeout in test_sglang.py appropriately reflects real-world timing.

Poem

🐰 A gentle grace descends with care,
No more harsh exits in the air!
Two phases dance, then rest complete,
Sweet 300ms—cancellation's beat.

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title "fix: Sglang Cancellation Aggregated Test" is partially related to the main changeset. It correctly identifies the SGLang cancellation domain and references the test being fixed, both of which are real aspects of the pull request. However, the title emphasizes the test fix outcome rather than the primary technical implementation—the two-phase cancellation mechanism with a configurable grace period that is the core change across multiple files. A colleague reviewing history would understand this addresses SGLang cancellation concerns, but the title does not convey the underlying architectural change or mechanism being introduced.
Description Check ✅ Passed The pull request description comprehensively follows the required template structure with all major sections completed. The Overview section clearly explains the flaky test issue and the two-phase cancellation solution. The Details section breaks down key changes including the two-phase mechanism, per-request idempotency, backend detection pattern, and configurable grace period. The "Where should the reviewer start?" section correctly points to the primary file of focus (lib/llm/src/http/service/disconnect.rs). The Related Issues section properly uses the "closes" action keyword with the GitHub issue reference (#3580). All required information is present and substantive.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/src/dynamo/sglang/request_handlers/handler_base.py (2)

151-157: Wrap abort_request call to avoid propagating exceptions

Abort can fail (e.g., request already finished). Catch and log to prevent noisy tracebacks during cancellation.

Apply:

-                self.engine.tokenizer_manager.abort_request(
-                    rid=sglang_request_id, abort_all=False
-                )
-                logging.info(f"Aborted Request ID: {context.id()}")
+                try:
+                    self.engine.tokenizer_manager.abort_request(
+                        rid=sglang_request_id, abort_all=False
+                    )
+                    logging.info(f"Aborted Request ID: {sglang_request_id} (Context: {context.id()})")
+                except Exception as e:
+                    logging.warning(
+                        f"abort_request failed for SGLang Request ID {sglang_request_id} (Context: {context.id()}): {e}"
+                    )

135-139: Incorrect log message: future wasn’t cancelled

The future just resolved; message should say “resolved/ready,” not “cancelled.”

Apply:

-            logging.debug(f"Request ID future cancelled for Context: {context.id()}")
+            logging.debug(f"Request ID future resolved for Context: {context.id()}")
🧹 Nitpick comments (5)
tests/fault_tolerance/cancellation/test_sglang.py (2)

194-197: Long prompt toggle is appropriate to surface races; optionally gate size

Consider parametrizing prompt length or gating via env (e.g., CANCELLATION_STRESS=1) to keep CI time predictable on constrained runners.


223-229: Derive polling timeout from CANCEL_GRACE_MS instead of hardcoding 5000ms

Keeps tests aligned with configured grace period while leaving headroom.

Apply:

-                _, worker_log_offset = poll_for_pattern(
+                grace_ms = int(os.getenv("CANCEL_GRACE_MS", "300"))
+                # 10x grace + 1s headroom
+                _, worker_log_offset = poll_for_pattern(
                     process=worker,
                     pattern=f"Aborted Request ID: {request_id}",
                     log_offset=worker_log_offset,
-                    max_wait_ms=5000,  # Increased from 2000ms to 5000ms to account for grace period
+                    max_wait_ms=grace_ms * 10 + 1000,
                 )
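With the default grace period, the suggested formula gives 300 * 10 + 1000 = 4000 ms. That is shorter than the 5000 ms the test currently uses. If that tightening is not intended, a floor keeps the existing window. `poll_timeout_ms` and `floor_ms` are hypothetical names used only for illustration.

```python
def poll_timeout_ms(grace_ms: int, floor_ms: int = 0) -> int:
    """10x the grace period plus 1 s of headroom, never below floor_ms (hypothetical helper)."""
    return max(floor_ms, grace_ms * 10 + 1000)


if __name__ == "__main__":
    print(poll_timeout_ms(300))                  # 4000: tighter than the current 5000
    print(poll_timeout_ms(300, floor_ms=5000))   # 5000: keeps the existing window
    print(poll_timeout_ms(1000, floor_ms=5000))  # 11000: a larger grace still scales up
```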
lib/llm/src/http/service/disconnect.rs (1)

269-287: Clamp and log configured grace to avoid pathological values

Protect against extreme env values and aid debugging.

Apply:

-fn cancel_grace_ms() -> u64 {
-    std::env::var("CANCEL_GRACE_MS")
-        .ok()
-        .and_then(|v| v.parse().ok())
-        .unwrap_or(300)
-}
+fn cancel_grace_ms() -> u64 {
+    let v = std::env::var("CANCEL_GRACE_MS")
+        .ok()
+        .and_then(|v| v.parse::<u64>().ok())
+        .unwrap_or(300);
+    v.clamp(0, 10_000)
+}

Optionally emit a debug once at startup with the chosen value.
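The clamped Rust helper above and the Python grace-period suggestion for handler_base.py later in this thread do not parse CANCEL_GRACE_MS identically. Rust's `parse::<u64>()` rejects four kinds of input: negative numbers, surrounding whitespace, `_` digit separators and values above `u64::MAX`. All four fall back to 300. Python's `int()` accepts whitespace, underscores and negatives, so `max(0, min(int(s), 10000))` maps `"-5"` to 0 instead. Here is a minimal Python sketch that tries to mirror the Rust semantics. The function name and constants are assumptions.

```python
import os
import re
from typing import Optional

DEFAULT_GRACE_MS = 300
MAX_GRACE_MS = 10_000
U64_MAX = 2**64 - 1


def cancel_grace_ms(raw: Optional[str] = None) -> int:
    """Parse CANCEL_GRACE_MS the way the clamped Rust helper is assumed to."""
    if raw is None:
        raw = os.environ.get("CANCEL_GRACE_MS")
    # u64 parsing accepts an optional '+' followed by ASCII digits, nothing else.
    if raw is None or re.fullmatch(r"\+?[0-9]+", raw) is None:
        return DEFAULT_GRACE_MS
    value = int(raw)
    if value > U64_MAX:  # Rust parse fails on overflow, so .unwrap_or(300) applies
        return DEFAULT_GRACE_MS
    return min(value, MAX_GRACE_MS)  # same as clamp(0, 10_000) for unsigned values


if __name__ == "__main__":
    for s in ["300", "+250", "-5", " 300", "1_000", "50000", ""]:
        print(repr(s), cancel_grace_ms(s))
```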

test_cancellation_isolated.py (2)

19-30: Reduce timing flakiness; use perf_counter and drop tight upper bound

CI jitter can push the measured time past 400ms. Derive the expected value from the configured grace period and assert only the lower bound.

Apply:

-    start_time = time.time()
-    await asyncio.sleep(grace_period_ms / 1000.0)  # Our implementation
-    end_time = time.time()
+    # Prefer monotonic clock for short intervals
+    start_time = time.perf_counter()
+    await asyncio.sleep(grace_period_ms / 1000.0)  # Our implementation
+    end_time = time.perf_counter()
@@
-    assert elapsed_ms >= 300, f"Grace period too short: {elapsed_ms}ms"
-    assert elapsed_ms <= 400, f"Grace period too long: {elapsed_ms}ms"
+    assert elapsed_ms >= grace_period_ms, f"Grace period too short: {elapsed_ms:.1f}ms"

Optionally read from CANCEL_GRACE_MS and assert against that value.
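A self-contained version of the optional env-driven check might look like the following. It is a sketch, not the file's actual test. The helpers `configured_grace_ms` and `measure_grace_period_ms` are assumptions, and so is the 20 ms tolerance. The tolerance is there because asyncio may run a timer up to one monotonic-clock tick early. On some platforms that tick can be on the order of 15 ms, so a strict `>= grace_ms` bound could itself flake.

```python
import asyncio
import os
import time

TOLERANCE_MS = 20.0  # assumed slack for coarse monotonic clocks


def configured_grace_ms() -> int:
    # Simplified: assumes CANCEL_GRACE_MS is unset or a plain integer string.
    return int(os.environ.get("CANCEL_GRACE_MS", "300"))


async def measure_grace_period_ms(grace_ms: int) -> float:
    start = time.perf_counter()  # monotonic, high-resolution clock
    await asyncio.sleep(grace_ms / 1000.0)
    return (time.perf_counter() - start) * 1000.0


def test_grace_period_lower_bound() -> None:
    grace_ms = configured_grace_ms()
    elapsed_ms = asyncio.run(measure_grace_period_ms(grace_ms))
    # Lower bound only: CI jitter can make the sleep overshoot by an arbitrary amount.
    assert elapsed_ms >= grace_ms - TOLERANCE_MS, (
        f"Grace period too short: {elapsed_ms:.1f}ms (expected >= {grace_ms}ms)"
    )
```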


128-131: Avoid blind Exception catch and align with ruff TRY300

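Either let exceptions bubble (preferred for tests) or use try/except…else for clarity. The diff below only adds the else branch. The `return True` currently at line 126 inside the try body must also be removed to satisfy TRY300.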

Apply:

-    except Exception as e:
-        print(f"❌ Test failed: {e}")
-        return False
+    except Exception as e:
+        print(f"❌ Test failed: {e}")
+        return False
+    else:
+        return True

Or remove the try/except entirely and let failures surface to CI.
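For reference, here is the try/except/else shape that TRY300 and BLE001 point toward, written for an async orchestrator like the one in the isolated test file. `run_checks` is a hypothetical name. Catching `AssertionError` rather than `Exception` is one way to avoid the blind catch while still reporting test failures.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def run_checks(checks: Iterable[Callable[[], Awaitable[None]]]) -> bool:
    """Await each async check in order; return False on the first assertion failure."""
    try:
        for check in checks:
            await check()
    except AssertionError as e:  # narrow catch instead of a blind `except Exception`
        print(f"Test failed: {e}")
        return False
    else:  # success path moved out of the try body, as TRY300 suggests
        return True


async def passing_check() -> None:
    await asyncio.sleep(0)


async def failing_check() -> None:
    raise AssertionError("grace period too short")


if __name__ == "__main__":
    print(asyncio.run(run_checks([passing_check])))                 # True
    print(asyncio.run(run_checks([passing_check, failing_check])))  # prints the failure, then False
```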

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6deeecb and bc870af.

📒 Files selected for processing (4)
  • components/src/dynamo/sglang/request_handlers/handler_base.py (1 hunks)
  • lib/llm/src/http/service/disconnect.rs (4 hunks)
  • test_cancellation_isolated.py (1 hunks)
  • tests/fault_tolerance/cancellation/test_sglang.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/fault_tolerance/cancellation/test_sglang.py (1)
tests/fault_tolerance/cancellation/utils.py (2)
  • send_cancellable_request (239-263)
  • poll_for_pattern (319-391)
lib/llm/src/http/service/disconnect.rs (1)
lib/runtime/src/pipeline/context.rs (8)
  • id (59-61)
  • id (252-254)
  • id (332-334)
  • id (347-349)
  • stopped (276-278)
  • stopped (359-366)
  • kill (260-262)
  • kill (409-423)
🪛 GitHub Actions: Copyright Checks
test_cancellation_isolated.py

[error] 1-1: Copyright check failed. Missing/invalid header detected in test_cancellation_isolated.py.

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3896/merge) by vyalamar.
lib/llm/src/http/service/disconnect.rs

[warning] 1-1: Trailing whitespace fixed by pre-commit (ruff detected earlier; changes committed).

components/src/dynamo/sglang/request_handlers/handler_base.py

[warning] 1-1: Black formatting applied by pre-commit (reformatted file).

test_cancellation_isolated.py

[error] 1-1: Pre-commit check-shebang-scripts-are-executable failed: test_cancellation_isolated.py has a shebang but is not marked executable. Run 'chmod +x test_cancellation_isolated.py'.


[warning] 1-1: Trailing whitespace fixed by pre-commit (from test file as part of changes).

🪛 Ruff (0.14.1)
test_cancellation_isolated.py

1-1: Shebang is present but file is not executable

(EXE001)


126-126: Consider moving this statement to an else block

(TRY300)


128-128: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: clippy (.)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: clippy (lib/runtime/examples)
🔇 Additional comments (4)
components/src/dynamo/sglang/request_handlers/handler_base.py (1)

133-141: Early-cancel corner case check

If cancellation arrives before SGLang request ID is set, this task waits until exit. Confirm Rust two‑phase cancel alone fully cleans up in that path to avoid dangling work/allocations. If not, consider racing on request_id_future vs. cancel signal and skipping abort when ID missing.
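One way to handle this corner case is to race the request ID future against the cancellation signal and skip `abort_request` when no ID was assigned. In this sketch, `wait_for_rid_or_cancel` is hypothetical, and `cancelled` is an `asyncio.Event` standing in for whatever cancellation signal the real handler observes.

```python
import asyncio
from typing import Optional


async def wait_for_rid_or_cancel(
    request_id_future: "asyncio.Future[str]", cancelled: asyncio.Event
) -> Optional[str]:
    """Return the SGLang request ID if it arrives first, or None if cancellation wins."""
    cancel_task = asyncio.ensure_future(cancelled.wait())
    try:
        done, _ = await asyncio.wait(
            {request_id_future, cancel_task}, return_when=asyncio.FIRST_COMPLETED
        )
    finally:
        cancel_task.cancel()  # stop waiting on the event; the ID future is left untouched
    if request_id_future in done:
        return request_id_future.result()
    return None  # cancelled before an ID was assigned: nothing to abort


async def demo(id_first: bool) -> Optional[str]:
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    cancelled = asyncio.Event()
    if id_first:
        loop.call_later(0.01, fut.set_result, "rid-1")
    else:
        loop.call_later(0.01, cancelled.set)
    return await wait_for_rid_or_cancel(fut, cancelled)


if __name__ == "__main__":
    print(asyncio.run(demo(id_first=True)))   # rid-1
    print(asyncio.run(demo(id_first=False)))  # None
```

A caller would then invoke `abort_request(rid=...)` only when the returned ID is not `None`.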

tests/fault_tolerance/cancellation/test_sglang.py (1)

164-166: Docstring update aligns with two‑phase cancellation

Looks good.

lib/llm/src/http/service/disconnect.rs (2)

203-216: Single cancel path reused for connection and stream

Good reuse; avoids double‑cancelling.


134-201: Trait API verified; no issues found.

The AsyncEngineContext trait properly exposes both stop_generating() (sync) and stopped() (async) methods. All implementations—StreamContext, Controller, HttpRequestContext, and test mocks—provide working versions. The code in disconnect.rs correctly invokes both methods: stop_generating() in Phase 1 and stopped() in the tokio::select! block for Phase 2. The idempotency latch prevents duplicate cancellations, and the two-phase flow is sound.

Comment on lines +158 to 166

# Add grace period to allow SGLang to process the cancellation gracefully
# This prevents the race condition where Rust runtime drops the stream
# before SGLang can properly clean up the request
grace_period_ms = 300 # 300ms recommended by project leaders for reliable cancellation
logging.debug(f"Waiting {grace_period_ms}ms for SGLang graceful cleanup...")
await asyncio.sleep(grace_period_ms / 1000.0)
logging.debug(f"Grace period completed for Request ID: {context.id()}")
else:
🛠️ Refactor suggestion | 🟠 Major

Use the same env‑configurable grace as Rust (CANCEL_GRACE_MS) instead of hardcoding 300ms

Keeps behavior consistent and tunable across components.

Apply:

+import os
@@
-                grace_period_ms = 300  # 300ms recommended by project leaders for reliable cancellation
-                logging.debug(f"Waiting {grace_period_ms}ms for SGLang graceful cleanup...")
-                await asyncio.sleep(grace_period_ms / 1000.0)
-                logging.debug(f"Grace period completed for Request ID: {context.id()}")
+                grace_period_ms_str = os.getenv("CANCEL_GRACE_MS", "300")
+                try:
+                    grace_period_ms = max(0, min(int(grace_period_ms_str), 10000))
+                except ValueError:
+                    grace_period_ms = 300
+                logging.debug(f"Waiting {grace_period_ms}ms for SGLang graceful cleanup...")
+                await asyncio.sleep(grace_period_ms / 1000.0)
+                logging.debug(
+                    f"Grace period completed for SGLang Request ID {sglang_request_id}, Context: {context.id()}"
+                )

Comment on lines +1 to +5
#!/usr/bin/env python3
"""
Isolated test for cancellation grace period fix.
This test doesn't import SGLang dependencies to avoid platform compatibility issues.
"""
⚠️ Potential issue | 🔴 Critical

Pipeline blocker: add SPDX header and remove shebang or make file executable

Pre‑commit failed due to missing/invalid header and non‑executable shebang. Easiest: add header and remove shebang.

Apply:

-#!/usr/bin/env python3
-"""
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+"""
 Isolated test for cancellation grace period fix.
 This test doesn't import SGLang dependencies to avoid platform compatibility issues.
 """

Alternatively, keep the shebang and set the executable bit in git; the header is still required either way.

🤖 Prompt for AI Agents
In test_cancellation_isolated.py around lines 1-5, the file is missing the
required SPDX license header and contains a shebang which fails
non-executable/pre-commit checks; add the appropriate SPDX header as the first
non-blank line (e.g. SPDX-License-Identifier: <LICENSE>) and remove the shebang
line, or if you must keep the shebang make the file executable in git (git
update-index --chmod=+x test_cancellation_isolated.py) while still adding the
SPDX header.

Labels

external-contribution Pull request is from an external contributor fix size/L
