
Conversation

@tzulingk
Contributor

@tzulingk tzulingk commented Oct 28, 2025

Overview:

Adds fault tolerance tests for token overflow scenarios where client requests exceed max_seq_len. The tests verify that the system properly rejects oversized requests and then recovers to serve normal requests.

Details:

  • New fault injection type: TokenOverflowFailure for testing prompt length > max_seq_len scenarios
  • Two-phase testing: Sends 15 oversized requests (2x max_seq_len) followed by 15 normal requests to test rejection and recovery
  • Dynamic configuration: DeploymentSpec.add_arg_to_service() method to set max sequence length at runtime for vLLM (--max-model-len), TRT-LLM (--max-seq-len), and SGLang (--context-length)
  • Enhanced parsing: Detects mixed token tests and calculates recovery time between overflow/recovery phases using worker logs
  • Test coverage: 6 scenarios covering vLLM, TRT-LLM, and SGLang in both aggregated and disaggregated deployments
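The dynamic-configuration step above can be sketched roughly as follows. The `DeploymentSpec` shown here is a simplified stand-in for the real class in tests/utils/managed_deployment.py; only the per-backend flag names are taken from the PR description.

```python
# Hypothetical sketch of applying the per-backend max-sequence-length flag
# via DeploymentSpec.add_arg_to_service(); internals are simplified stand-ins.
import shlex

# Flag names per backend, as listed in the PR description.
MAX_LEN_FLAG = {
    "vllm": "--max-model-len",
    "trtllm": "--max-seq-len",
    "sglang": "--context-length",
}

class DeploymentSpec:
    def __init__(self, services):
        # services maps service name -> list (or string) of container args
        self.services = services

    def add_arg_to_service(self, service_name, arg_name, arg_value):
        args = self.services[service_name]
        if isinstance(args, str):
            # Normalize a single command string into a token list
            args = shlex.split(args)
            self.services[service_name] = args
        if arg_name in args:
            # Update the existing flag's value in place
            args[args.index(arg_name) + 1] = arg_value
        else:
            # Append as a new flag/value pair
            args.extend([arg_name, arg_value])

spec = DeploymentSpec({"VllmWorker": ["--model", "llama"]})
spec.add_arg_to_service("VllmWorker", MAX_LEN_FLAG["vllm"], "2048")
print(spec.services["VllmWorker"])
```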

Where should the reviewer start?

  • tests/fault_tolerance/deploy/scenarios.py - Lines 460-550: Core overflow scenario creation logic
  • tests/utils/managed_deployment.py - Lines 173-213: add_arg_to_service() implementation
  • tests/fault_tolerance/deploy/test_deployment.py - Lines 113-160: Two-phase client execution logic
  • tests/fault_tolerance/deploy/parse_results.py - Lines 174-231: Recovery time calculation for mixed tests

Related Issues:

DIS-872

Summary by CodeRabbit

  • New Features

    • Added token overflow testing support with configurable scenarios across multiple backends.
    • Introduced mixed-token test workflows enabling sequential overflow and recovery phase execution.
    • New paired result analysis to compare and summarize overflow/recovery test outcomes.
  • Improvements

    • Enhanced recovery time calculation for diverse test layouts and configurations.
    • Improved robustness of test result parsing and flexible output control options.

@tzulingk tzulingk requested review from a team as code owners October 28, 2025 02:18
@coderabbitai
Contributor

coderabbitai bot commented Oct 28, 2025

Walkthrough

This pull request introduces token overflow testing support by adding configuration fields for overflow/recovery phases, a new TokenOverflowFailure class for injection, helper functions for parsing mixed-token test directories, output control parameters, and phase-aware test execution logic that manages separate client processes for overflow and recovery phases.
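The phase-aware execution described above can be summarized in a rough sketch. The `Load` fields match those listed in the PR description; the per-phase client handling is a hypothetical stand-in for the real harness, which spawns separate client processes with distinct node suffixes.

```python
# Minimal sketch of the two-phase (overflow then recovery) client flow.
from dataclasses import dataclass

@dataclass
class Load:
    mixed_token_test: bool = False
    overflow_token_length: int = 0
    overflow_request_count: int = 0
    normal_request_count: int = 0

def run_phase(name, request_count, token_length):
    # In the real test this spawns a client process with a distinct
    # node suffix; here we just record what would run.
    return {"phase": name, "requests": request_count, "tokens": token_length}

def run_mixed_token_test(load: Load, max_seq_len: int):
    phases = []
    # Phase 1: oversized requests that should be rejected
    phases.append(run_phase("overflow", load.overflow_request_count,
                            load.overflow_token_length))
    # Phase 2: normal requests that should succeed after recovery
    phases.append(run_phase("recovery", load.normal_request_count,
                            max_seq_len // 2))
    return phases

load = Load(mixed_token_test=True, overflow_token_length=4096,
            overflow_request_count=15, normal_request_count=15)
for p in run_mixed_token_test(load, max_seq_len=2048):
    print(p["phase"], p["requests"])
```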

Changes

  • Parse infrastructure (tests/fault_tolerance/deploy/parse_factory.py, tests/fault_tolerance/deploy/parse_results.py): Added a print_output: bool = True parameter to parse_test_results() and propagated it through parser calls. Extended process_single_test() and main() with print_output for controlled output. Added extract_test_info_from_dir() and get_decode_worker_dir() helpers to parse test-directory configuration for non-standard layouts. Introduced process_overflow_recovery_test() to handle paired overflow/recovery result summaries. Enhanced AI-Perf parsing robustness for missing nested dictionaries.
  • Test scenario configuration (tests/fault_tolerance/deploy/scenarios.py): Added mixed-token test configuration fields to the Load class: mixed_token_test, overflow_token_length, overflow_request_count, normal_request_count. Introduced the TokenOverflowFailure class with overflow multiplier and token-count computation. Added add_token_overflow_scenarios() to generate and register token overflow test scenarios across backends (vllm, trtllm, sglang).
  • Test execution and deployment (tests/fault_tolerance/deploy/test_deployment.py, tests/utils/managed_deployment.py): Implemented the mixed-token test flow with overflow and recovery phases, each spawning separate client processes with distinct node suffixes. Added TokenOverflowFailure handling in failure injection to skip standard pod/process injection. Expanded results processing to compute paired log paths and invoke dedicated overflow/recovery result parsing. Added DeploymentSpec.add_arg_to_service() to configure service arguments dynamically.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • test_deployment.py: Review the mixed-token test flow logic, phase detection, and log path coordination for overflow/recovery cycles; verify proper cleanup and client process management across phases.
  • parse_results.py: Validate new helper functions for directory parsing and the extended recovery time calculation logic; check robustness of AI-Perf parsing fallbacks for missing nested dictionaries.
  • scenarios.py: Ensure token overflow scenario registration and TokenOverflowFailure initialization are correct across all backends.

Poem

🐰 Token overflow tests now flow,
With phases split—high and low.
Recovery paths parsed with care,
Output controlled with flair!
The rabbit hops through logic clear.

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: The pull request description follows the required template, with all four sections present and well populated. The Overview clearly states the purpose of adding token overflow tests; the Details section lists the changes, including the new TokenOverflowFailure type, the two-phase testing approach, the dynamic configuration method, enhanced parsing, and test coverage. The "Where should the reviewer start?" section gives specific file paths and line-number ranges, and the Related Issues section references the issue number. The description lets reviewers understand both the what and the why of the changes.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 85.00%, which meets the required threshold of 80.00%.
  • Title Check — ✅ Passed: The title "feat: Add prompt > seq_len k8 tests" corresponds to the main objective of the changeset: fault-tolerance tests for requests exceeding max_seq_len, the new TokenOverflowFailure injection type, two-phase execution (overflow then recovery), and parsing for mixed-token tests. "k8" refers to the Kubernetes deployment environment. The title is specific and meaningful rather than vague or generic.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/fault_tolerance/deploy/test_deployment.py (1)

336-349: Robust TRT‑LLM model detection without relying on deployment name

scenario.deployment.name is overwritten to "fault-tolerance-test", so agg/disagg inference by name fails. Probe services instead:

-            elif scenario.backend == "trtllm":
-                # Determine deployment type from scenario deployment name
-                if (
-                    "agg" in scenario.deployment.name
-                    and "disagg" not in scenario.deployment.name
-                ):
-                    model = scenario.deployment["TRTLLMWorker"].model
-                else:
-                    model = scenario.deployment["TRTLLMDecodeWorker"].model
+            elif scenario.backend == "trtllm":
+                try:
+                    model = scenario.deployment["TRTLLMWorker"].model  # agg
+                except KeyError:
+                    model = scenario.deployment["TRTLLMDecodeWorker"].model  # disagg

This prevents falling back to the default model unnecessarily.

🧹 Nitpick comments (2)
tests/utils/managed_deployment.py (1)

319-371: Harden arg editing: support --arg=value, drop inner import, minor validation

  • Existing logic misses equals-form tokens (e.g., --max-seq-len=2048) and may duplicate args.
  • Redundant inner import shlex (already imported at file top).
  • Optional: normalize/validate arg_name shape.

Apply:

 def add_arg_to_service(self, service_name: str, arg_name: str, arg_value: str):
@@
-        if isinstance(args_list, str):
-            import shlex
-
-            args_list = shlex.split(args_list)
-            service["extraPodSpec"]["mainContainer"]["args"] = args_list
+        if isinstance(args_list, str):
+            # Normalize single string to list of tokens
+            args_list = shlex.split(args_list)
+            service["extraPodSpec"]["mainContainer"]["args"] = args_list
+
+        # Normalize existing equals-form tokens to [arg, value]
+        normalized: list[str] = []
+        eq_prefix = f"{arg_name}="
+        for tok in args_list:
+            if tok.startswith(eq_prefix):
+                normalized.extend([arg_name, tok[len(eq_prefix):]])
+            else:
+                normalized.append(tok)
+        args_list[:] = normalized
@@
-        # Find existing argument
+        # Find existing argument
         arg_index = None
         for i, arg in enumerate(args_list):
             if arg == arg_name:
                 arg_index = i
                 break
@@
-        else:
-            # Add new argument
-            args_list.extend([arg_name, arg_value])
+        else:
+            # Add new argument
+            args_list.extend([arg_name, arg_value])

Optional: guard unusual names

+        if not arg_name.startswith("--"):
+            logging.warning("add_arg_to_service: unexpected arg_name '%s'", arg_name)

Please confirm if any of your YAMLs use --arg=value style so we can add a targeted unit test for this path.

tests/fault_tolerance/deploy/test_deployment.py (1)

176-186: Log token overflow “injection” for traceability

Currently TokenOverflowFailure path silently continues, so test.log.txt lacks an injection line.

     if isinstance(failure, TokenOverflowFailure):
-        # The actual overflow is handled by the client configuration
-        # which uses the input_token_length from the Load config
-        # This is just logging for visibility
-        continue
+        logger.info(
+            "TokenOverflowFailure active: max_seq_len=%s, overflow_multiplier=%s, tokens=%s",
+            getattr(failure, "max_seq_len", "unknown"),
+            getattr(failure, "overflow_multiplier", "unknown"),
+            getattr(failure, "overflow_token_count", "unknown"),
+        )
+        continue
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a79122c and 840ca00.

📒 Files selected for processing (5)
  • tests/fault_tolerance/deploy/parse_factory.py (4 hunks)
  • tests/fault_tolerance/deploy/parse_results.py (12 hunks)
  • tests/fault_tolerance/deploy/scenarios.py (3 hunks)
  • tests/fault_tolerance/deploy/test_deployment.py (4 hunks)
  • tests/utils/managed_deployment.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/fault_tolerance/deploy/test_deployment.py (3)
tests/fault_tolerance/deploy/parse_results.py (1)
  • process_overflow_recovery_test (675-767)
tests/fault_tolerance/deploy/scenarios.py (2)
  • Load (96-110)
  • TokenOverflowFailure (123-145)
tests/fault_tolerance/deploy/parse_factory.py (1)
  • parse_test_results (101-228)
tests/fault_tolerance/deploy/scenarios.py (1)
tests/utils/managed_deployment.py (3)
  • model (63-74)
  • model (77-105)
  • add_arg_to_service (319-370)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3930/merge) by tzulingk.
tests/fault_tolerance/deploy/test_deployment.py

[error] 290-290: Ruff: Local variable 'all_results' is assigned to but never used. (F841)

🪛 Ruff (0.14.1)
tests/utils/managed_deployment.py

330-330: Avoid specifying long messages outside the exception class

(TRY003)

tests/fault_tolerance/deploy/test_deployment.py

255-255: Do not catch blind exception: Exception

(BLE001)


256-256: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


290-290: Local variable all_results is assigned to but never used

Remove assignment to unused variable all_results

(F841)


297-297: Loop control variable base_name not used within loop body

(B007)

tests/fault_tolerance/deploy/scenarios.py

136-136: Unused method argument: duration

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: operator (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (7)
tests/fault_tolerance/deploy/scenarios.py (2)

106-111: Load extensions look good

Fields for mixed overflow/recovery are clear and self-contained.


518-653: CLI argument flags are correct; no changes required.

All three backend CLI flags have been verified:

  • vLLM uses --max-model-len
  • TensorRT-LLM uses --max_seq_len; the code specifies --max-seq-len (hyphenated), which is acceptable because argparse automatically converts dashes to underscores in optional arguments ✓
  • SGLang uses --context-length

The is_agg detection via substring matching is appropriate and requires no change.

tests/fault_tolerance/deploy/parse_results.py (4)

175-211: Helper to extract backend/deploy_type from dir name — solid

Pattern-based extraction for mixed-token tests is reasonable and isolated.


252-277: Fallback to decode worker for mixed tests — LGTM

Gracefully handles absence of failure_info by deriving component path from test dir.


406-457: AI‑Perf parsing improvements — LGTM

Good fallbacks for zero records and ms→s conversions; preserves robustness on partial data.


577-665: Phase-aware processing: concise and clear

Nice separation of overflow vs recovery behavior with optional printing; no action needed.

tests/fault_tolerance/deploy/parse_factory.py (1)

101-108: print_output propagation — LGTM

New parameter is consistently threaded through aiperf/legacy paths while preserving defaults.

Also applies to: 185-206

@tzulingk tzulingk changed the title Add prompt > seq_len k8 tests. feat: Add prompt > seq_len k8 tests. Oct 28, 2025
@github-actions github-actions bot added the feat label Oct 28, 2025
@tzulingk tzulingk enabled auto-merge (squash) October 28, 2025 04:34
@rmccorm4
Contributor

Please fix the test failures:

tests/fault_tolerance/deploy/scenarios.py:615: in add_token_overflow_scenarios
    overflow_failure = TokenOverflowFailure(
E   TypeError: TokenOverflowFailure.__init__() got an unexpected keyword argument 'duration'


@indrajit96 indrajit96 left a comment


Thanks a lot for this extensive test!
Can we also run some normal tests to make sure we have no regression?
Mostly minor comments with code restructuring and config-reading concerns

@indrajit96
Contributor

Can we also update FT docs with the new test?

@tzulingk
Contributor Author

Please fix the test failures:

tests/fault_tolerance/deploy/scenarios.py:615: in add_token_overflow_scenarios
    overflow_failure = TokenOverflowFailure(
E   TypeError: TokenOverflowFailure.__init__() got an unexpected keyword argument 'duration'

Done in commit 03f7e64

@tzulingk
Contributor Author

Thanks a lot for this extensive test!
Can we also run some normal tests to make sure we have no regression?
Mostly minor comments with code restructuring and config-reading concerns

tested on
test_fault_scenario[sglang-agg-tp-1-dp-1-frontend]
test_fault_scenario[trtllm-agg-tp-2-dp-1-decode_worker_pod]

@tzulingk tzulingk requested a review from indrajit96 October 29, 2025 05:53
@tzulingk
Contributor Author

Can we also update FT docs with the new test?

done in commit de91ba7


@indrajit96 indrajit96 left a comment


Nice work with the test and assertions!
LGTM!


@keivenchang keivenchang left a comment


Nice. Checking both that bad requests get rejected and that the system actually bounces back. Nice work reusing WORKER_MAP and keeping it consistent across all three backends.

Some general coding comments:

  • You've got all_metrics with 10+ fields and deployment_info getting passed around everywhere as plain dicts. These should be dataclasses - then you'll get autocomplete, type checking, and catch bugs before runtime instead of getting KeyErrors in production. Dict[str, Any] loses all the benefits of Python's type system.
  • The print() and logging mix is problematic, you should pick one. Test frameworks should use all logging (like info,debug,warning) so users can control verbosity. I see print(f"\n{'='*60}") in some places and logging.warning() in others - it makes output reading/parsing harder, on the humans and scripts that may read it.

@tzulingk
Contributor Author

Nice. Checking both that bad requests get rejected and that the system actually bounces back. Nice work reusing WORKER_MAP and keeping it consistent across all three backends.

Some general coding comments:

  • You've got all_metrics with 10+ fields and deployment_info getting passed around everywhere as plain dicts. These should be dataclasses - then you'll get autocomplete, type checking, and catch bugs before runtime instead of getting KeyErrors in production. Dict[str, Any] loses all the benefits of Python's type system.
  • The print() and logging mix is problematic, you should pick one. Test frameworks should use all logging (like info,debug,warning) so users can control verbosity. I see print(f"\n{'='*60}") in some places and logging.warning() in others - it makes output reading/parsing harder, on the humans and scripts that may read it.

Created https://linear.app/nvidia/issue/DIS-947/refctor-use-dataclass-for-passing-arguments to track this.
Replaced print() with logging.
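The print()-to-logging switch described here might look like this minimal sketch; the logger name and format string are assumptions, not the repository's actual configuration.

```python
# Small sketch: route summary output through a single logger with
# configurable verbosity instead of bare print() calls.
import logging

logging.basicConfig(
    level=logging.INFO,
    format="[TEST] %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("fault_tolerance")

def log_summary(rejected: int, total: int):
    # Previously: print(f"\n{'='*60}") and friends; with logging,
    # callers can raise the level to silence or filter this output.
    logger.info("=" * 60)
    logger.info("SESSION SUMMARY - COMBINED OVERFLOW/RECOVERY TEST")
    logger.info("Overflow: %d/%d rejected (%.1f%%)",
                rejected, total, 100 * rejected / total)

log_summary(43, 45)
```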

@tzulingk
Contributor Author

@keivenchang Note that although the logs look a bit less clean after replacing print() with logging, I prefer to use logging.info() instead of print(). Using logging is a more standardized approach for message output, and it also avoids the buffering issues between print and logging that can cause mixed or out-of-order logs.

[TEST] 2025-10-30T19:52:54 INFO root: 
============================================================
SESSION SUMMARY - COMBINED OVERFLOW/RECOVERY TEST
============================================================

Phase Breakdown:
  Overflow: 43/45 rejected (95.6%)
  Recovery: 45/45 succeeded (100.0%)
[TEST] 2025-10-30T19:52:54 INFO root: 
============================================================
FAULT TOLERANCE TEST SUMMARY - AI-PERF
============================================================

@tzulingk tzulingk merged commit c4abe9b into main Oct 31, 2025
22 of 23 checks passed
@tzulingk tzulingk deleted the tzulingk/overflow_k8_test branch October 31, 2025 04:20