feat: Add prompt > seq_len k8 tests. #3930
Conversation
Walkthrough

This pull request introduces token overflow testing support by adding configuration fields for overflow/recovery phases, a new TokenOverflowFailure class for injection, helper functions for parsing mixed-token test directories, output control parameters, and phase-aware test execution logic that manages separate client processes for the overflow and recovery phases.
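To make the overflow/recovery flow concrete, here is a minimal sketch of how a phase-aware test driver might look. Everything in it (PhaseConfig, run_phase, the client.py entry point and its flags) is illustrative, not the PR's actual implementation.

```python
import logging
import subprocess
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class PhaseConfig:
    """Illustrative per-phase settings: token count and request volume."""
    name: str
    input_token_length: int
    num_requests: int


def run_phase(phase: PhaseConfig, endpoint: str) -> int:
    """Launch a separate client process for one phase and return its exit code."""
    cmd = [
        "python",
        "client.py",  # hypothetical load-generator entry point
        "--endpoint", endpoint,
        "--input-token-length", str(phase.input_token_length),
        "--num-requests", str(phase.num_requests),
    ]
    logger.info("Starting %s phase: %s", phase.name, " ".join(cmd))
    return subprocess.run(cmd, check=False).returncode


def run_overflow_then_recovery(endpoint: str, max_seq_len: int) -> None:
    """Overflow phase sends prompts longer than max_seq_len; recovery phase
    verifies the deployment still serves normal-sized requests afterwards."""
    overflow = PhaseConfig("overflow", input_token_length=max_seq_len * 2, num_requests=10)
    recovery = PhaseConfig("recovery", input_token_length=max_seq_len // 4, num_requests=10)

    run_phase(overflow, endpoint)   # these requests are expected to be rejected
    rc = run_phase(recovery, endpoint)
    assert rc == 0, "recovery phase should complete successfully"
```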
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
    
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️  Outside diff range comments (1)
tests/fault_tolerance/deploy/test_deployment.py (1)
336-349: Robust TRT‑LLM model detection without relying on deployment name
scenario.deployment.name is overwritten to "fault-tolerance-test", so agg/disagg inference by name fails. Probe services instead:

    -        elif scenario.backend == "trtllm":
    -            # Determine deployment type from scenario deployment name
    -            if (
    -                "agg" in scenario.deployment.name
    -                and "disagg" not in scenario.deployment.name
    -            ):
    -                model = scenario.deployment["TRTLLMWorker"].model
    -            else:
    -                model = scenario.deployment["TRTLLMDecodeWorker"].model
    +        elif scenario.backend == "trtllm":
    +            try:
    +                model = scenario.deployment["TRTLLMWorker"].model  # agg
    +            except KeyError:
    +                model = scenario.deployment["TRTLLMDecodeWorker"].model  # disagg

This prevents falling back to the default model unnecessarily.
🧹 Nitpick comments (2)
tests/utils/managed_deployment.py (1)
319-371: Harden arg editing: support --arg=value, drop inner import, minor validation

- Existing logic misses equals-form tokens (e.g., --max-seq-len=2048) and may duplicate args.
- Redundant inner import shlex (already imported at file top).
- Optional: normalize/validate arg_name shape.

Apply:

     def add_arg_to_service(self, service_name: str, arg_name: str, arg_value: str):
    @@
    -        if isinstance(args_list, str):
    -            import shlex
    -
    -            args_list = shlex.split(args_list)
    -        service["extraPodSpec"]["mainContainer"]["args"] = args_list
    +        if isinstance(args_list, str):
    +            # Normalize single string to list of tokens
    +            args_list = shlex.split(args_list)
    +        service["extraPodSpec"]["mainContainer"]["args"] = args_list
    +
    +        # Normalize existing equals-form tokens to [arg, value]
    +        normalized: list[str] = []
    +        eq_prefix = f"{arg_name}="
    +        for tok in args_list:
    +            if tok.startswith(eq_prefix):
    +                normalized.extend([arg_name, tok[len(eq_prefix):]])
    +            else:
    +                normalized.append(tok)
    +        args_list[:] = normalized
    @@
    -        # Find existing argument
    +        # Find existing argument
             arg_index = None
             for i, arg in enumerate(args_list):
                 if arg == arg_name:
                     arg_index = i
                     break
    @@
    -        else:
    -            # Add new argument
    -            args_list.extend([arg_name, arg_value])
    +        else:
    +            # Add new argument
    +            args_list.extend([arg_name, arg_value])

Optional: guard unusual names

    +        if not arg_name.startswith("--"):
    +            logging.warning("add_arg_to_service: unexpected arg_name '%s'", arg_name)

Please confirm if any of your YAMLs use --arg=value style so we can add a targeted unit test for this path.
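To go with the question above, a hedged sketch of such a unit test. The normalize_equals_form helper is a hypothetical standalone version of the normalization suggested in the diff; the real logic would live inside ManagedDeployment.add_arg_to_service and mutate the service spec in place.

```python
# Hypothetical standalone equivalent of the equals-form normalization above,
# plus a pytest-style check. Names are illustrative only.
def normalize_equals_form(args_list: list[str], arg_name: str) -> list[str]:
    """Split '--flag=value' tokens matching arg_name into ['--flag', 'value']."""
    eq_prefix = f"{arg_name}="
    normalized: list[str] = []
    for tok in args_list:
        if tok.startswith(eq_prefix):
            normalized.extend([arg_name, tok[len(eq_prefix):]])
        else:
            normalized.append(tok)
    return normalized


def test_normalize_equals_form():
    args = ["--model", "llama", "--max-seq-len=2048"]
    assert normalize_equals_form(args, "--max-seq-len") == [
        "--model",
        "llama",
        "--max-seq-len",
        "2048",
    ]
```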
tests/fault_tolerance/deploy/test_deployment.py (1)

176-186: Log token overflow “injection” for traceability

Currently the TokenOverflowFailure path silently continues, so test.log.txt lacks an injection line.

         if isinstance(failure, TokenOverflowFailure):
    -        # The actual overflow is handled by the client configuration
    -        # which uses the input_token_length from the Load config
    -        # This is just logging for visibility
    -        continue
    +        logger.info(
    +            "TokenOverflowFailure active: max_seq_len=%s, overflow_multiplier=%s, tokens=%s",
    +            getattr(failure, "max_seq_len", "unknown"),
    +            getattr(failure, "overflow_multiplier", "unknown"),
    +            getattr(failure, "overflow_token_count", "unknown"),
    +        )
    +        continue
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- tests/fault_tolerance/deploy/parse_factory.py (4 hunks)
- tests/fault_tolerance/deploy/parse_results.py (12 hunks)
- tests/fault_tolerance/deploy/scenarios.py (3 hunks)
- tests/fault_tolerance/deploy/test_deployment.py (4 hunks)
- tests/utils/managed_deployment.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/fault_tolerance/deploy/test_deployment.py (3)
- tests/fault_tolerance/deploy/parse_results.py (1): process_overflow_recovery_test (675-767)
- tests/fault_tolerance/deploy/scenarios.py (2): Load (96-110), TokenOverflowFailure (123-145)
- tests/fault_tolerance/deploy/parse_factory.py (1): parse_test_results (101-228)
tests/fault_tolerance/deploy/scenarios.py (1)
- tests/utils/managed_deployment.py (3): model (63-74), model (77-105), add_arg_to_service (319-370)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3930/merge) by tzulingk.
tests/fault_tolerance/deploy/test_deployment.py
[error] 290-290: Ruff: Local variable 'all_results' is assigned to but never used. (F841)
🪛 Ruff (0.14.1)
tests/utils/managed_deployment.py
330-330: Avoid specifying long messages outside the exception class
(TRY003)
tests/fault_tolerance/deploy/test_deployment.py
255-255: Do not catch blind exception: Exception
(BLE001)
256-256: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
290-290: Local variable all_results is assigned to but never used
Remove assignment to unused variable all_results
(F841)
297-297: Loop control variable base_name not used within loop body
(B007)
tests/fault_tolerance/deploy/scenarios.py
136-136: Unused method argument: duration
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: operator (amd64)
 - GitHub Check: trtllm (arm64)
 - GitHub Check: Build and Test - dynamo
 
🔇 Additional comments (7)
tests/fault_tolerance/deploy/scenarios.py (2)
106-111: Load extensions look good. Fields for mixed overflow/recovery are clear and self-contained.
518-653: CLI argument flags are correct; no changes required. All three backend CLI flags have been verified:
- vLLM uses --max-model-len ✓
- TensorRT-LLM uses --max_seq_len; the code specifies --max-seq-len (hyphenated), which is acceptable because argparse automatically converts dashes to underscores in optional arguments ✓
- SGLang uses --context-length ✓
The is_agg detection via substring matching is appropriate and requires no change.
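As an illustration of the per-backend differences listed above, a minimal sketch of a flag lookup. The MAX_LEN_FLAG mapping and max_len_arg helper are hypothetical names, not the structures actually used in scenarios.py.

```python
# Hypothetical mapping from backend name to the CLI flag that caps sequence
# length; the real scenarios.py may structure this differently.
MAX_LEN_FLAG = {
    "vllm": "--max-model-len",
    "trtllm": "--max-seq-len",   # hyphenated form, per the review note above
    "sglang": "--context-length",
}


def max_len_arg(backend: str, max_seq_len: int) -> list[str]:
    """Return the CLI tokens that limit the model's sequence length."""
    try:
        flag = MAX_LEN_FLAG[backend]
    except KeyError:
        raise ValueError(f"unsupported backend: {backend}") from None
    return [flag, str(max_seq_len)]


# Example: max_len_arg("vllm", 2048) -> ["--max-model-len", "2048"]
```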
tests/fault_tolerance/deploy/parse_results.py (4)

175-211: Helper to extract backend/deploy_type from dir name — solid. Pattern-based extraction for mixed-token tests is reasonable and isolated.
252-277: Fallback to decode worker for mixed tests — LGTM. Gracefully handles absence of failure_info by deriving the component path from the test dir.
406-457: AI-Perf parsing improvements — LGTM. Good fallbacks for zero records and ms→s conversions; preserves robustness on partial data.
577-665: Phase-aware processing: concise and clear. Nice separation of overflow vs recovery behavior with optional printing; no action needed.
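For context, a tiny sketch of what phase-aware assertions can look like: the overflow phase expects rejections, the recovery phase expects clean completions. The function and parameter names are assumptions, not the code in parse_results.py.

```python
def check_phase(phase: str, total: int, succeeded: int) -> None:
    """Assert the expected outcome for one phase of a mixed-token test."""
    if phase == "overflow":
        # Oversized prompts should all be rejected by the deployment.
        assert succeeded == 0, "oversized prompts should all be rejected"
    else:  # recovery
        # Normal-sized prompts should complete once the overflow load stops.
        assert succeeded == total, "deployment should serve normal requests again"
```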
tests/fault_tolerance/deploy/parse_factory.py (1)
101-108: print_output propagation — LGTM. New parameter is consistently threaded through aiperf/legacy paths while preserving defaults.
Also applies to: 185-206
Please fix the test failures:
Thanks a lot for this extensive test!
Can we also run some normal tests to make sure we have no regression?
Mostly minor comments about code restructuring and config-reading concerns.
Can we also update FT docs with the new test?

Done in commit 03f7e64.
tested on
Done in commit de91ba7.
Nice work with the test and assertions!
LGTM!
Nice. Checking both that bad requests get rejected and that the system actually bounces back. Nice work reusing WORKER_MAP and keeping it consistent across all three backends.
Some general coding comments:
- You've got all_metrics with 10+ fields and deployment_info getting passed around everywhere as plain dicts. These should be dataclasses - then you'll get autocomplete, type checking, and catch bugs before runtime instead of getting KeyErrors in production. Dict[str, Any] loses all the benefits of Python's type system (see the sketch after this list).
- The print() and logging mix is problematic; you should pick one. Test frameworks should use logging (info, debug, warning) so users can control verbosity. I see print(f"\n{'='*60}") in some places and logging.warning() in others - it makes the output harder to read and parse, for both the humans and the scripts that may consume it.
…ss_test_phase_results Signed-off-by: [email protected] <[email protected]>
          
Created https://linear.app/nvidia/issue/DIS-947/refctor-use-dataclass-for-passing-arguments to track this.
…-30T15:04:15 INFO root: being printed Signed-off-by: [email protected] <[email protected]>
@keivenchang Note that although the logs look a bit less clean after replacing print() with logging, I prefer logging.info() over print(). Logging is a more standardized approach for message output, and it also avoids the buffering issues between print and logging that can cause mixed or out-of-order logs.
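For reference, a minimal sketch of standardizing on logging as described above; the format string and logger name are assumptions, not the configuration actually used in the tests.

```python
import logging
import sys

# Route everything to stdout with one format so test output stays ordered
# and scripts can parse it without interleaved print/logging streams.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("fault_tolerance.overflow_test")

# Replaces ad-hoc separators like print(f"\n{'='*60}"):
logger.info("=" * 60)
logger.info("Overflow phase complete: %d/%d requests rejected as expected", 10, 10)
```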
    
Overview:
Adds fault tolerance tests for token overflow scenarios where client requests exceed max_seq_len. Tests verify that systems properly reject oversized requests and successfully recover to handle normal requests afterward.
Details:
Where should the reviewer start?
Related Issues:
DIS-872