feat: Add legacy client along with aiperf for FT tests #3415
Signed-off-by: Indrajit Bhosale <[email protected]>
Walkthrough
Introduces dual-client support for fault-tolerance tests by adding client and parse factories, a legacy client implementation, legacy result parsers, scenario config updates, and a pytest CLI option for client selection, and by refactoring test wiring to use the factories. README gains duplicated documentation content. No changes to exported APIs outside test modules.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Tester
participant Pytest as Pytest (conftest)
participant Scenario as Scenario Fixture
participant Test as test_deployment.py
participant Factory as client_factory
participant Client as aiperf/legacy client
Tester->>Pytest: Run with --client-type (optional)
Pytest-->>Scenario: client_type fixture
Scenario-->>Test: Load(client_type, retries, rate, ...)
Test->>Factory: get_client_function(Load.client_type)
Factory-->>Test: client_fn
loop Clients
Test->>Client: start client_fn(deployment, Load, idx)
Client-->>Test: per-request logs (JSONL/text)
end
Note over Test,Client: Clients run with retries/rate based on client_type
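The client-selection step in the diagram above can be sketched as a small lookup-based factory. This is a minimal illustration, not the PR's actual `client_factory.py`: only the name `get_client_function` and the two client types come from the PR; the two stand-in client functions and their return values are hypothetical.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for the real aiperf and legacy client entry points.
def aiperf_client(deployment: str, load: dict, index: int) -> str:
    return f"aiperf:{deployment}:{index}"


def legacy_client(deployment: str, load: dict, index: int) -> str:
    return f"legacy:{deployment}:{index}"


# Registry mapping the --client-type value to a client function.
_CLIENTS: Dict[str, Callable[..., str]] = {
    "aiperf": aiperf_client,
    "legacy": legacy_client,
}


def get_client_function(client_type: str) -> Callable[..., str]:
    """Return the client function registered for client_type."""
    try:
        return _CLIENTS[client_type]
    except KeyError:
        valid = ", ".join(sorted(_CLIENTS))
        raise ValueError(
            f"Unknown client type {client_type!r}; expected one of: {valid}"
        ) from None
```

A test would then call `get_client_function(load.client_type)` once and start one process per client index with the returned function.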
sequenceDiagram
autonumber
actor Tester
participant Parser as parse_factory
participant AIP as aiperf parser
participant LEG as legacy_parse_results
Tester->>Parser: parse_test_results(log_dir|log_paths, [force_parser])
alt force aiperf
Parser->>AIP: parse_results(...)
AIP-->>Parser: table/summary
else force legacy
Parser->>LEG: main/process_test_directory(...)
LEG-->>Parser: table/summary
else auto-detect
Parser->>Parser: detect_result_type(log_dir)
alt aiperf detected
Parser->>AIP: parse_results(...)
AIP-->>Parser: table/summary
else legacy detected
Parser->>LEG: main/process_test_directory(...)
LEG-->>Parser: table/summary
else none
Parser-->>Tester: warning / no results
end
end
Parser-->>Tester: parsed results or info
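The routing in the diagram above (forced parser first, then auto-detection, then a "no results" fallback) might look roughly like the sketch below. The function names `parse_test_results` and `detect_result_type` come from the PR; the marker file names and the string return values are assumptions made purely for illustration, since the real detection rules are not shown here.

```python
from pathlib import Path
from typing import Optional

# Assumed marker names; the real parsers may look for different files.
AIPERF_MARKER = "profile_export_aiperf.json"
LEGACY_GLOB = "client_*.log.txt"


def detect_result_type(log_dir: str) -> Optional[str]:
    """Guess which client produced the logs under log_dir."""
    root = Path(log_dir)
    if any(root.rglob(AIPERF_MARKER)):
        return "aiperf"
    if any(root.rglob(LEGACY_GLOB)):
        return "legacy"
    return None


def parse_test_results(log_dir: str, force_parser: Optional[str] = None) -> str:
    """Route to a parser: honor force_parser, otherwise auto-detect."""
    if force_parser not in (None, "aiperf", "legacy"):
        raise ValueError(f"Unknown parser {force_parser!r}")
    result_type = force_parser or detect_result_type(log_dir)
    if result_type is None:
        return "no results"
    # A real implementation would call the selected parser here.
    return f"parsed with {result_type} parser"
```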
sequenceDiagram
autonumber
participant Legacy as legacy_client
participant Deploy as ManagedDeployment
participant Pod as Target Pod
participant Svc as Frontend (port-forward)
participant API as HTTP Endpoint
Legacy->>Deploy: list pods / ensure forwards
loop Requests
Legacy->>Pod: select next ready pod (round-robin)
Legacy->>Svc: ensure port-forward
Legacy->>API: POST generate(payload)
API-->>Legacy: response / error
Legacy->>Legacy: retry if needed (max_retries, delay)
Legacy-->>Legacy: log JSONL entry + sleep for rate limit
end
Legacy->>Deploy: cleanup forwards
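The request loop in the legacy-client diagram (send, retry on failure, write one JSONL entry, sleep to respect the rate limit) can be sketched as follows. This is an assumption-laden illustration rather than the code in `legacy_client.py`: the function names, the record fields, and the injected `send` and `sleep` callables are all hypothetical, and pod selection and port-forwarding are left out.

```python
import io
import json
import time
from typing import Callable, Dict, Iterable, TextIO


def send_with_retry(send: Callable[[dict], int], payload: dict,
                    max_retries: int, retry_delay: float,
                    sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call send(payload) up to max_retries times and return a log record."""
    error = None
    for attempt in range(1, max_retries + 1):
        try:
            status = send(payload)
            return {"status": status, "attempts": attempt, "error": None}
        except ConnectionError as exc:
            error = str(exc)
            if attempt < max_retries:
                sleep(retry_delay)  # back off before the next attempt
    return {"status": None, "attempts": max_retries, "error": error}


def run_client(send: Callable[[dict], int], payloads: Iterable[dict],
               log: TextIO, max_request_rate: float, max_retries: int = 3,
               retry_delay: float = 0.1,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Send each payload, log one JSON line per request, and cap the rate."""
    min_interval = 1.0 / max_request_rate
    for payload in payloads:
        started = time.monotonic()
        record = send_with_retry(send, payload, max_retries, retry_delay, sleep)
        log.write(json.dumps(record) + "\n")
        # Sleep only for whatever is left of this request's time slot.
        remaining = min_interval - (time.monotonic() - started)
        if remaining > 0:
            sleep(remaining)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.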
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- tests/fault_tolerance/deploy/README.md (1 hunks)
- tests/fault_tolerance/deploy/client_factory.py (1 hunks)
- tests/fault_tolerance/deploy/conftest.py (2 hunks)
- tests/fault_tolerance/deploy/legacy_client.py (1 hunks)
- tests/fault_tolerance/deploy/legacy_parse_results.py (1 hunks)
- tests/fault_tolerance/deploy/parse_factory.py (1 hunks)
- tests/fault_tolerance/deploy/scenarios.py (2 hunks)
- tests/fault_tolerance/deploy/test_deployment.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
tests/fault_tolerance/deploy/client_factory.py (2)
- tests/fault_tolerance/deploy/conftest.py (1): client_type (42-44)
- tests/fault_tolerance/deploy/legacy_client.py (1): client (174-292)
tests/fault_tolerance/deploy/scenarios.py (1)
- tests/fault_tolerance/deploy/conftest.py (1): client_type (42-44)
tests/fault_tolerance/deploy/legacy_client.py (1)
- tests/utils/managed_deployment.py (3): ManagedDeployment (359-802), get_pods (572-598), port_forward (685-756)
tests/fault_tolerance/deploy/parse_factory.py (1)
- tests/fault_tolerance/deploy/legacy_parse_results.py (1): main (421-557)
tests/fault_tolerance/deploy/test_deployment.py (4)
- tests/fault_tolerance/deploy/client_factory.py (1): get_client_function (21-69)
- tests/fault_tolerance/deploy/parse_factory.py (1): parse_test_results (101-222)
- tests/fault_tolerance/deploy/scenarios.py (1): Load (96-104)
- tests/fault_tolerance/deploy/conftest.py (2): client_type (42-44), namespace (37-38)
🪛 markdownlint-cli2 (0.18.1)
tests/fault_tolerance/deploy/README.md
478-478: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.13.2)
tests/fault_tolerance/deploy/client_factory.py
66-69: Avoid specifying long messages outside the exception class
(TRY003)
120-123: Avoid specifying long messages outside the exception class
(TRY003)
tests/fault_tolerance/deploy/legacy_client.py
72-72: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
80-80: Unused function argument: logger
(ARG001)
282-282: Do not catch blind exception: Exception
(BLE001)
283-283: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
286-286: Loop control variable pf_name not used within loop body
Rename unused pf_name to _pf_name
(B007)
289-290: try-except-pass detected, consider logging the exception
(S110)
289-289: Do not catch blind exception: Exception
(BLE001)
tests/fault_tolerance/deploy/legacy_parse_results.py
315-315: Unused function argument: failure_type
(ARG001)
349-349: Loop control variable process not used within loop body
(B007)
350-350: Loop control variable replica not used within loop body
Rename unused replica to _replica
(B007)
tests/fault_tolerance/deploy/parse_factory.py
141-144: Avoid specifying long messages outside the exception class
(TRY003)
172-172: Avoid specifying long messages outside the exception class
(TRY003)
175-178: Avoid specifying long messages outside the exception class
(TRY003)
222-222: Avoid specifying long messages outside the exception class
(TRY003)
368-368: Do not catch blind exception: Exception
(BLE001)
369-369: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
tests/fault_tolerance/deploy/test_deployment.py
168-168: Do not catch blind exception: Exception
(BLE001)
169-169: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
193-193: Do not catch blind exception: Exception
(BLE001)
194-194: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
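Several of the Ruff findings above (BLE001 and TRY400) share the same remedy: catch a specific exception type and log with `logging.exception` so the traceback is kept. The sketch below shows the pattern on a hypothetical helper; `parse_int` and the logger name are illustrative and do not appear in the PR.

```python
import logging

logger = logging.getLogger("ft_tests")  # hypothetical logger name


def parse_int(value: str, default: int = 0) -> int:
    """Convert value to int, logging the failure with its traceback."""
    try:
        return int(value)
    except ValueError:
        # Narrow except clause instead of a blanket `except Exception` (BLE001).
        # logger.exception logs at ERROR level and appends the traceback (TRY400).
        logger.exception("Could not parse %r as an integer", value)
        return default
```

`logging.exception` is only meaningful inside an exception handler, which is where each flagged `logging.error` call sits.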
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
LGTM. Thanks for working on this.
/ok to test 3d2d13a
…into ibhosale_custom_client
/ok to test 32f13ec
/ok to test 2aede48
/ok to test 3c66b9a
/ok to test b58726a
/ok to test d9d584a
/ok to test d9d584a
Overview:
Adds dual client support for fault tolerance tests, enabling dynamic selection between AI-Perf and the legacy custom client via the command-line argument `--client-type`.

Details:
New Files:
- `legacy_client.py` - Original custom HTTP client implementation (JSONL logging)
- `client_factory.py` - Factory pattern for client selection
- `legacy_parse_results.py` - Parser for legacy JSONL result format
- `parse_factory.py` - Auto-detects result format and routes to appropriate parser

Modified Files:
- `scenarios.py` - Added `client_type` and `max_request_rate` fields to `Load` dataclass
- `test_deployment.py` - Integrated factory patterns for dynamic client/parser selection
- `conftest.py` - Added `--client-type` pytest command-line option
- `README.md` - Documented dual client usage and examples

Key Features:
- `--client-type aiperf` or `--client-type legacy`

Where should the reviewer start?
- `client_factory.py` - Review factory pattern implementation
- `parse_factory.py` - Check auto-detection logic
- `test_deployment.py` - Review scenario fixture integration
- `conftest.py` - Verify new pytest option
- `README.md` - Review documentation completeness

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
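For orientation, the `client_type` and `max_request_rate` fields added to the `Load` dataclass in `scenarios.py` might be shaped roughly like this. Only the class name and those two field names come from the PR description; the other fields, every default value, and the validation in `__post_init__` are assumptions for the sake of a runnable sketch.

```python
from dataclasses import dataclass

VALID_CLIENT_TYPES = ("aiperf", "legacy")


@dataclass
class Load:
    clients: int = 10                # hypothetical field
    requests_per_client: int = 100   # hypothetical field
    client_type: str = "aiperf"      # assumed default; set from --client-type
    max_request_rate: float = 1.0    # assumed unit: requests per second per client

    def __post_init__(self) -> None:
        # Fail early on a bad scenario instead of midway through a test run.
        if self.client_type not in VALID_CLIENT_TYPES:
            raise ValueError(f"Unknown client_type: {self.client_type!r}")
        if self.max_request_rate <= 0:
            raise ValueError("max_request_rate must be positive")
```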