[TRTLLM-8825][feat] Support Pytest Perf Results uploading to Database #8653
Conversation
Signed-off-by: Chenfei Zhang <[email protected]>
📝 Walkthrough

Introduces NVDataFlow integration for performance testing. A new `nvdf.py` module provides configuration assembly, GPU metadata collection, API operations (POST/GET with retry logic), and baseline data preparation. The `test_perf.py` module is updated to convert test configurations to NVDataFlow format, track per-test results, and upload data to the database after execution. Supporting changes include method signature updates and test list modifications.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Test Execution
    participant Utils as utils.run_ex()
    participant PerfTest as PerfTest Instance
    participant NVDF as nvdf Module
    participant DB as NVDataFlow DB
    Test->>Utils: run_ex(full_test_name, metric_type, ...)
    Utils->>Utils: Execute benchmark & compute perf_result
    Utils->>PerfTest: store_test_result(cmd_idx, metric_type, perf_result)
    PerfTest->>PerfTest: _test_results[cmd_idx][metric_type] = perf_result
    Utils->>Utils: _write_result()
    Test->>PerfTest: upload_test_results_to_database()
    PerfTest->>NVDF: prepare_baseline_data(new_data, model_groups, gpu_type)
    NVDF->>NVDF: match/aggregate metrics from history
    NVDF->>PerfTest: return baseline_data_dict
    PerfTest->>NVDF: post_data(baseline_data_dict, new_data_dict, model_groups, gpu_type)
    NVDF->>NVDF: per project: type_check_for_nvdf, _id()
    NVDF->>DB: POST validated data with retry logic
    DB->>NVDF: response
    NVDF->>PerfTest: return
```
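The "POST validated data with retry logic" step in the diagram can be sketched as a small, transport-agnostic helper. This is a minimal sketch, not the actual `nvdf.post_data` implementation: the function name, the injected `send` callable, and the field names are all hypothetical.

```python
import json
import time
from typing import Callable, Dict


def post_with_retry(send: Callable[[str], int],
                    payload: Dict,
                    retries: int = 5,
                    base_delay: float = 1.0) -> bool:
    """Serialize `payload` and send it, retrying on non-2xx status codes.

    `send` is injected so the transport (requests.post, a test stub, ...)
    is swappable; it takes the JSON body and returns an HTTP status code.
    """
    body = json.dumps(payload)
    for attempt in range(retries):
        if send(body) in (200, 201, 202):
            return True
        if attempt < retries - 1:
            # Exponential backoff between attempts, capped at 60 seconds.
            time.sleep(min(60.0, base_delay * 2 ** attempt))
    return False
```

Injecting the transport keeps the retry policy testable without a live NVDataFlow endpoint.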
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/integration/defs/perf/utils.py (1)

509-520: Make metric_type Optional and avoid mutable default for outputs

`run_ex` is called with `metric_type=None` for prepare-dataset; also `outputs={}` as a default is unsafe.

```diff
-    def run_ex(self,
-               full_test_name: str,
-               metric_type: PerfMetricType,
+    def run_ex(self,
+               full_test_name: str,
+               metric_type: Optional[PerfMetricType],
                venv: Optional[PythonVenvRunnerImpl],
@@
-               output_dir: str,
-               outputs: Dict[int, str] = {},
+               output_dir: str,
+               outputs: Optional[Dict[int, str]] = None,
@@
-        outputs = outputs.copy()
+        outputs = (outputs or {}).copy()
```

tests/integration/defs/perf/test_perf.py (1)
2293-2305: Avoid KeyError when removing failed cmd results

Use `pop(..., None)` to safely discard failed entries.

```diff
-        del self._test_results[self._current_cmd_idx]
+        self._test_results.pop(self._current_cmd_idx, None)
```

Also applies to: 2311-2312
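Both issues flagged above (the shared mutable default and the unguarded `del`) are easy to demonstrate in isolation. The function names below are hypothetical; only the patterns mirror the review suggestions.

```python
def run_bad(outputs={}):
    # The default dict is created once and shared across all calls.
    outputs[len(outputs)] = "result"
    return outputs


def run_good(outputs=None):
    # Fresh copy per call: no state leaks between invocations.
    outputs = ({} if outputs is None else outputs).copy()
    outputs[len(outputs)] = "result"
    return outputs


# Discarding a possibly-missing entry without raising KeyError:
results = {0: "ok"}
results.pop(99, None)  # no-op instead of KeyError
```

`run_bad` silently accumulates entries across calls, which is exactly why the review asks for `Optional[Dict[...]] = None`.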
🧹 Nitpick comments (7)
tests/integration/defs/perf/nvdf.py (5)
29-31: Avoid hard-coded NVDF endpoint; allow HTTPS and override via env

Make NVDF_BASE_URL configurable (env) and prefer HTTPS for transport security.

```diff
-NVDF_BASE_URL = "http://gpuwa.nvidia.com"
-PROJECT_ROOT = "sandbox-tmp-perf-test"
+NVDF_BASE_URL = os.getenv("NVDF_BASE_URL", "https://gpuwa.nvidia.com")
+PROJECT_ROOT = os.getenv("NVDF_PROJECT_ROOT", "sandbox-tmp-perf-test")
```
139-158: GPU name normalization comment vs behavior

The comment says "Replace spaces with hyphens" but the code removes spaces. Align the two; hyphens are easier to read.

```diff
-    # Replace spaces with hyphens
-    gpu_type = gpu_type.replace(" ", "")
+    # Replace spaces with hyphens for readability
+    gpu_type = gpu_type.replace(" ", "-")
```
315-339: GET with body is unusual; also add backoff jitter

OpenSearch supports GET-with-body but some proxies don't. Consider POST. Add a sleep between retries for stability.

```diff
     while retry_time:
         res = requests.get(url, data=json_data, headers=headers, timeout=10)
         if res.status_code in [200, 201, 202]:
             return res
@@
-        retry_time -= 1
+        retry_time -= 1
+        time.sleep(min(60, 2 ** (5 - retry_time)))
```

Optionally switch to `requests.post(url, ...)` if infra allows.
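The jittered backoff the review asks for can be factored into a standalone generator, which keeps the sleep schedule testable separately from the HTTP loop. This is a sketch; the function name and defaults are assumptions, not part of the PR.

```python
import random


def backoff_delays(retries=5, base=1.0, cap=60.0, jitter=0.5):
    """Yield one sleep interval per retry attempt.

    Delays grow exponentially (base * 2**attempt), are capped at `cap`,
    and carry up to `jitter * delay` of random noise so that many clients
    retrying at once don't hammer the server in lockstep.
    """
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        yield delay + random.uniform(0.0, jitter * delay)
```

A caller would simply `time.sleep(d)` for each yielded `d` between failed attempts.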
507-560: Simplify baseline checks and guard non-numeric values

Small cleanups for clarity; also avoid including None in min/max.

```diff
-        # Skip baseline data
-        if data.get("b_is_baseline") and data.get("b_is_baseline") == True:
+        # Skip baseline data
+        if data.get("b_is_baseline"):
             continue
-        if metric not in data:
+        if metric not in data or data[metric] is None:
             continue
         values.append(data.get(metric))
```
562-578: Simplify truthy check

Minor readability improvement.

```diff
 for data in history_data_list:
-    if data.get("b_is_baseline") and data.get("b_is_baseline") == True:
+    if data.get("b_is_baseline"):
         return data
```

tests/integration/defs/perf/utils.py (1)
607-609: Guard store_test_result and provide base-class default to prevent AttributeError

When extending `AbstractPerfScriptTestClass`, not all subclasses may implement `store_test_result`. Guard the call or add a no-op default in the base class.

```diff
-        # Store the test result
-        self.store_test_result(cmd_idx, metric_type, self._perf_result)
+        # Store the test result if supported
+        if metric_type is not None and hasattr(self, "store_test_result"):
+            self.store_test_result(cmd_idx, metric_type, self._perf_result)
```

Optionally add to `AbstractPerfScriptTestClass`:

```diff
+    def store_test_result(self, cmd_idx: int, metric_type, perf_result: float) -> None:
+        """No-op default; subclasses may override to persist per-metric results."""
+        return
```

tests/integration/defs/perf/test_perf.py (1)
575-600: Use actual model name and stable typing in ServerConfig NVDF payload

- Prefer grouping by the underlying `model_name` rather than the server config label for consistent project keys.
- `l_moe_max_num_tokens` should be an int; a default of `0` avoids type-check failures.

```diff
 def to_nvdf_data(self) -> dict:
     """Convert ServerConfig to NVDataFlow data"""
     return {
-        "s_model_name": self.name,
+        "s_model_name": self.model_name,
+        "s_server_name": self.name,
@@
-        "l_moe_max_num_tokens": self.moe_max_num_tokens,
+        "l_moe_max_num_tokens": int(self.moe_max_num_tokens or 0),
```

Note: Adding `s_server_name` is optional but helpful for tracing.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- tests/integration/defs/perf/nvdf.py (1 hunks)
- tests/integration/defs/perf/test_perf.py (10 hunks)
- tests/integration/defs/perf/utils.py (2 hunks)
- tests/integration/test_lists/test-db/perf_sanity_l0_dgx_b200.yml (0 hunks)
- tests/integration/test_lists/test-db/perf_sanity_l0_dgx_b300.yml (0 hunks)
💤 Files with no reviewable changes (2)
- tests/integration/test_lists/test-db/perf_sanity_l0_dgx_b300.yml
- tests/integration/test_lists/test-db/perf_sanity_l0_dgx_b200.yml
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
- tests/integration/defs/perf/utils.py
- tests/integration/defs/perf/nvdf.py
- tests/integration/defs/perf/test_perf.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
- tests/integration/defs/perf/utils.py
- tests/integration/defs/perf/nvdf.py
- tests/integration/defs/perf/test_perf.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
- tests/integration/defs/perf/utils.py
- tests/integration/defs/perf/nvdf.py
- tests/integration/defs/perf/test_perf.py
🧬 Code graph analysis (3)
tests/integration/defs/perf/utils.py (1)
tests/integration/defs/perf/test_perf.py (1)
store_test_result(2213-2220)
tests/integration/defs/perf/nvdf.py (1)
tests/integration/defs/trt_test_alternative.py (2)
- print_error (318-324)
- print_info (300-306)
tests/integration/defs/perf/test_perf.py (2)
tests/integration/defs/perf/nvdf.py (4)
- _id (229-233)
- get_nvdf_config (90-122)
- post_data (341-361)
- prepare_baseline_data (425-473)

tests/integration/defs/perf/utils.py (1)

- PerfMetricType (87-117)
🪛 Ruff (0.14.1)
tests/integration/defs/perf/nvdf.py
91-91: Found useless expression. Either assign it to a variable or remove it.
(B018)
91-91: Undefined name gpu_type
(F821)
91-91: Undefined name gpu_count
(F821)
91-91: Undefined name build_id
(F821)
91-91: Undefined name build_url
(F821)
91-91: Undefined name job_name
(F821)
91-91: Undefined name job_id
(F821)
91-91: Undefined name job_url
(F821)
92-92: Found useless expression. Either assign it to a variable or remove it.
(B018)
92-92: Undefined name branch
(F821)
92-92: Undefined name commit
(F821)
92-92: Undefined name is_post_merge
(F821)
92-92: Undefined name is_pr_job
(F821)
92-92: Undefined name trigger_mr_user
(F821)
103-103: Undefined name gpu_type
(F821)
104-104: Undefined name gpu_count
(F821)
106-106: Undefined name host_node_name
(F821)
109-109: Undefined name build_id
(F821)
110-110: Undefined name build_url
(F821)
111-111: Undefined name job_name
(F821)
112-112: Undefined name job_id
(F821)
113-113: Undefined name job_url
(F821)
114-114: Undefined name branch
(F821)
115-115: Undefined name commit
(F821)
116-116: Undefined name is_post_merge
(F821)
117-117: Undefined name is_pr_job
(F821)
118-118: Undefined name trigger_mr_user
(F821)
145-145: Starting a process with a partial executable path
(S607)
160-160: Do not catch blind exception: Exception
(BLE001)
167-167: Do not use bare except
(E722)
300-300: Probable use of requests call without timeout
(S113)
359-359: Do not catch blind exception: Exception
(BLE001)
405-405: Loop control variable data overrides iterable it iterates
(B020)
420-420: Do not catch blind exception: Exception
(BLE001)
538-538: Avoid equality comparisons to True; use `if data.get("b_is_baseline"):` for truth checks

Replace with `data.get("b_is_baseline")`

(E712)

551-551: Avoid equality comparisons to True; use `if data.get("b_is_baseline"):` for truth checks

Replace with `data.get("b_is_baseline")`

(E712)

575-575: Avoid equality comparisons to True; use `if data.get("b_is_baseline"):` for truth checks

Replace with `data.get("b_is_baseline")`

(E712)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tests/integration/defs/perf/nvdf.py (1)
287-313: HTTP POST without timeout/backoff jitter; misleading retry message

Add `timeout`, exponential backoff, and correct the retry count in the final log.

```diff
-    retry_time = 5
-    while retry_time:
-        res = requests.post(url, data=json_data, headers=headers)
+    retries = 5
+    attempt = 0
+    while attempt < retries:
+        res = requests.post(url, data=json_data, headers=headers, timeout=10)
         if res.status_code in [200, 201, 202]:
@@
-        retry_time -= 1
+        attempt += 1
+        time.sleep(min(60, 2 ** attempt))
-    print_error(
-        f"Fail to post to {project} after {retry_time} retry: {url}, json: {json_data}, error: {res.text}"
-    )
+    print_error(
+        f"Fail to post to {project} after {retries} retries: {url}, json: {json_data}, error: {getattr(res, 'text', '<no response>')}"
+    )
```

tests/integration/defs/perf/test_perf.py (2)
680-690: ClientConfig NVDF payload OK; relies on 'd_' numeric support in type checker

No change needed after expanding `type_check_for_nvdf` to accept `d_*`.
1453-1455: Result aggregation shape looks good

Storing results by `cmd_idx` and `PerfMetricType` is consistent with later upload usage.

Also applies to: 2213-2221
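The per-command, per-metric aggregation shape described here can be sketched in a few lines. The class name, metric strings, and the `l_cmd_idx` field below are hypothetical illustrations, not the PR's actual API.

```python
from collections import defaultdict


class PerfResultStore:
    """Accumulates per-command, per-metric results for a later bulk upload."""

    def __init__(self):
        # cmd_idx -> {metric_type: perf_result}
        self._test_results = defaultdict(dict)

    def store_test_result(self, cmd_idx, metric_type, perf_result):
        self._test_results[cmd_idx][metric_type] = perf_result

    def to_upload_payload(self):
        # One record per command, with its metrics flattened in.
        return [{"l_cmd_idx": idx, **metrics}
                for idx, metrics in sorted(self._test_results.items())]
```

Keeping the two keys separate lets a failed command's whole entry be dropped with a single `pop(cmd_idx, None)` before upload.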
tests/integration/defs/perf/utils.py (1)
2254-2264: Review comment is inconsistent with actual code state

The review comment assumes `metric_type` has been changed to `Optional[PerfMetricType]`, but the function definition at `utils.py:509-519` shows it remains `metric_type: PerfMetricType` (not Optional). The call site at `test_perf.py:2255` passes `None`, which violates this non-Optional type contract.

Either the function parameter type annotation needs to be updated to `Optional[PerfMetricType]`, or the call site needs to pass a valid `PerfMetricType` value instead of `None`.

Likely an incorrect or invalid review comment.
Signed-off-by: Chenfei Zhang <[email protected]>
Signed-off-by: Chenfei Zhang <[email protected]>
/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1,DGX_B300-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1"

PR_Github #22491 [ run ] triggered by Bot. Commit:

PR_Github #22491 [ run ] completed with state
Signed-off-by: Chenfei Zhang <[email protected]>
/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1,DGX_B300-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1"

PR_Github #22495 [ run ] triggered by Bot. Commit:

PR_Github #22495 [ run ] completed with state
Signed-off-by: Chenfei Zhang <[email protected]>
Signed-off-by: Chenfei Zhang <[email protected]>
/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1,DGX_B300-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1"

PR_Github #22498 [ run ] triggered by Bot. Commit:

PR_Github #22498 [ run ] completed with state
Signed-off-by: Chenfei Zhang <[email protected]>
/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1,DGX_B300-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1"

PR_Github #22502 [ run ] triggered by Bot. Commit:
Signed-off-by: Chenfei Zhang <[email protected]>
/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1,DGX_B300-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1"

PR_Github #22503 [ run ] triggered by Bot. Commit:

PR_Github #22502 [ run ] completed with state

PR_Github #22503 [ run ] completed with state
Signed-off-by: Chenfei Zhang <[email protected]>
/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1,DGX_B300-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1"

PR_Github #22524 [ run ] triggered by Bot. Commit:

PR_Github #22524 [ run ] completed with state
Description
This PR supports uploading pytest perf results (server-client benchmark only) to an OpenSearch database.
OpenSearch Database: https://gpuwa.nvidia.com/os-dashboards/app/data-explorer/discover#/view/8b942dd0-b166-11f0-990f-f7c929993cf6?_q=(filters:!(),query:(language:lucene,query:''))&_a=(discover:(columns:!(_source),isDirty:!f,savedSearch:'8b942dd0-b166-11f0-990f-f7c929993cf6',sort:!()),metadata:(indexPattern:'54473a20-b166-11f0-990f-f7c929993cf6',view:discover))&_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-6M,to:now))
OpenSearch Dashboard: https://gpuwa.nvidia.com/os-dashboards/app/dashboards#/view/a3f0e760-b166-11f0-990f-f7c929993cf6?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15d,to:now))&_a=(description:'',filters:!(),fullScreenMode:!f,options:(hidePanelTitles:!f,useMargins:!t),panels:!((embeddableConfig:(),gridData:(h:15,i:bf15b8e3-2ad6-486e-a141-789c02dbfb78,w:24,x:0,y:0),id:'16a64a70-b167-11f0-990f-f7c929993cf6',panelIndex:bf15b8e3-2ad6-486e-a141-789c02dbfb78,type:visualization,version:'2.15.0'),(embeddableConfig:(),gridData:(h:15,i:'12d009b4-ca34-4da5-9956-e49fe811a048',w:24,x:24,y:0),id:'70ab2ea0-b167-11f0-990f-f7c929993cf6',panelIndex:'12d009b4-ca34-4da5-9956-e49fe811a048',type:visualization,version:'2.15.0')),query:(language:kuery,query:''),timeRestore:!f,title:df-sandbox-temp-1-perf-test-nvidiagb200-deepseek_r1_0528_fp4,viewMode:edit)
Summary by CodeRabbit
New Features
Tests