Add functional test cases for published checkpoints on HF #455
base: main
Conversation
Signed-off-by: neoyy-mino <[email protected]>
Signed-off-by: neoyy-mino <[email protected]>
Walkthrough

Adds a test utility module for multi-backend model deployment and two new test suites: a GPT-OSS QAT end-to-end pipeline and parametric NVIDIA model deployment tests with HF cache cleanup and per-test environment teardown. Includes runtime/hardware checks, backend-specific deployment logic, and generated deployer configurations.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Function
    participant MDT as ModelDeployerList
    participant MD as ModelDeployer
    participant Backend as Backend (trtllm / vllm / sglang)
    participant GPU as GPU / nvidia-smi
    Test->>MDT: instantiate with params
    MDT->>MDT: generate Cartesian product of configs
    MDT-->>Test: yield ModelDeployer instances
    loop for each ModelDeployer
        Test->>MD: call run()
        MD->>MD: validate CUDA / SM / GPU count
        alt requirements not met
            MD-->>Test: skip test (pytest.skip)
        else
            MD->>Backend: import & initialize backend client
            Backend-->>MD: ready
            MD->>Backend: send COMMON_PROMPTS for inference
            Backend-->>MD: inference outputs
            MD->>GPU: run nvidia-smi
            GPU-->>MD: status output
            MD-->>Test: finish
        end
    end
```
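For readers skimming the walkthrough, here is a minimal sketch of the pattern described above. The real `ModelDeployerList`/`ModelDeployer` in `tests/_test_utils/deploy_utils.py` carry more fields (attention backend, base model, etc.) and real backend dispatch, so the constructor arguments and helper below are illustrative assumptions only:

```python
import itertools
from dataclasses import dataclass

import pytest
import torch


@dataclass
class ModelDeployer:
    """One deployment case; the real class has more fields and backend-specific logic."""

    model_id: str
    backend: str  # "trtllm" | "vllm" | "sglang"
    tensor_parallel_size: int = 1

    def run(self):
        # Hardware gate: skip (not fail) when the host cannot run this configuration.
        if not torch.cuda.is_available() or torch.cuda.device_count() < self.tensor_parallel_size:
            pytest.skip(f"Requires >= {self.tensor_parallel_size} GPUs")
        # The real implementation dispatches to _deploy_trtllm / _deploy_vllm / _deploy_sglang here
        # and sends COMMON_PROMPTS through the deployed model.
        print(f"Would deploy {self.model_id} on {self.backend} (tp={self.tensor_parallel_size})")


class ModelDeployerList:
    """Expand keyword lists into a Cartesian product of ModelDeployer cases."""

    def __init__(self, model_id=(), backend=("trtllm",), tensor_parallel_size=(1,)):
        self.cases = []
        for m, b, tp in itertools.product(model_id, backend, tensor_parallel_size):
            case = ModelDeployer(m, b, tp)
            case.test_id = f"{m}-{b}-tp{tp}"  # used as the pytest id for readable test names
            self.cases.append(case)

    def __iter__(self):
        return iter(self.cases)
```

A test module can then parametrize with `@pytest.mark.parametrize("deployer", list(deployers), ids=lambda d: d.test_id)` and call `deployer.run()` inside the test body.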
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Actionable comments posted: 4
🧹 Nitpick comments (7)
tests/examples/gpt_oss/test_gpt_oss_qat.py (2)
66-67: Isolate distributed ports to avoid CI collisions. `accelerate launch` can clash on MASTER_PORT in parallel CI. Use the helper's port setup.
```diff
-    run_example_command(cmd_parts, "gpt-oss")
+    run_example_command(cmd_parts, "gpt-oss", setup_free_port=True)
@@
-    run_example_command(cmd_parts, "gpt-oss")
+    run_example_command(cmd_parts, "gpt-oss", setup_free_port=True)
```

Also applies to: 115-116
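For context, `setup_free_port`-style helpers typically pick a port like this (a generic sketch, not the actual `run_example_command` implementation):

```python
import os
import socket


def _set_free_master_port() -> None:
    # Bind to port 0 so the OS assigns an unused port, then export it for torch.distributed.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        os.environ["MASTER_PORT"] = str(s.getsockname()[1])
```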
202-208: Consider marking this full E2E pipeline as slow/optional. This test downloads models and runs training/benchmarks; it's better opt-in (env marker) to keep CI green.
Example:
```python
pytestmark = pytest.mark.skipif(
    os.getenv("RUN_GPT_OSS_E2E") != "1", reason="Set RUN_GPT_OSS_E2E=1 to run heavy E2E"
)
```

tests/examples/llm_ptq/test_deploy.py (3)
33-48: Tighten HF cache cleanup pattern and add safety. Limit deletions to HuggingFace model dirs to avoid accidental removals of unrelated paths containing "nvidia".
```diff
-    for item in os.listdir(HF_CACHE_PATH):
+    for item in os.listdir(HF_CACHE_PATH):
         item_path = os.path.join(HF_CACHE_PATH, item)
-        if os.path.isdir(item_path) and "nvidia" in item:
+        if (
+            os.path.isdir(item_path)
+            and (item.startswith("models--nvidia--") or item.startswith("datasets--nvidia--"))
+        ):
             shutil.rmtree(item_path, ignore_errors=True)
             print(f"✓ Removed: {item}")
```
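Alternatively, if depending on `huggingface_hub`'s cache-scanning API is acceptable (a sketch under that assumption; the function name below is made up), the hub can resolve repo IDs instead of parsing directory names:

```python
from huggingface_hub import scan_cache_dir


def clear_nvidia_hf_cache() -> None:
    """Delete only cached repos owned by the `nvidia` org, using the hub's cache index."""
    cache_info = scan_cache_dir()
    revisions = [
        rev.commit_hash
        for repo in cache_info.repos
        if repo.repo_id.startswith("nvidia/")
        for rev in repo.revisions
    ]
    if revisions:
        cache_info.delete_revisions(*revisions).execute()
```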
50-55: Run cache cleanup once per session to cut test time. Per-test rmtree can add minutes across this matrix. Prefer session-scope, and also clear before the session starts.
```diff
-@pytest.fixture(autouse=True)
-def cleanup_after_test():
-    """Automatically clean up after each test."""
-    yield  # Run the test
-    clear_hf_cache()  # Clean up after test completes
+@pytest.fixture(scope="session", autouse=True)
+def cleanup_hf_cache_once():
+    """Clean HF cache before and after the test session."""
+    clear_hf_cache()
+    yield
+    clear_hf_cache()
```
57-96: Gate the huge test matrix behind env toggles to protect CI time. The Cartesian products across backends/models easily create dozens of heavy tests. Add an env/mark gate or shrink defaults to smoke coverage.
Example:
if os.getenv("RUN_HF_DEPLOY_TESTS") != "1": pytest.skip("Set RUN_HF_DEPLOY_TESTS=1 to enable HF deployment tests", allow_module_level=True)Also applies to: 97-171, 173-244, 246-263, 265-301, 303-336, 338-352, 354-381, 383-434
tests/_test_utils/deploy_utils.py (2)
80-86: Label says "Before Test" but it's printed after running. Either move it earlier or update the label.
- print("\n=== GPU Status Before Test ===") + print("\n=== GPU Status After Test ===")
174-191: Add a minimal assertion for SGLang output. Ensure the backend produces a non-empty text.
```diff
-    print(llm.generate(["What's the age of the earth? "]))
+    out = llm.generate(["What's the age of the earth? "])
+    assert isinstance(out, list) and len(out) > 0, "SGLang returned empty output"
+    print(out)
     llm.shutdown()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tests/_test_utils/deploy_utils.py (1 hunks)
- tests/examples/gpt_oss/test_gpt_oss_qat.py (1 hunks)
- tests/examples/llm_ptq/test_deploy.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/examples/gpt_oss/test_gpt_oss_qat.py (2)
tests/_test_utils/examples/run_command.py (1)
- run_example_command (35-43)

tests/_test_utils/torch_misc.py (1)

- minimum_gpu (51-55)
tests/examples/llm_ptq/test_deploy.py (1)
tests/_test_utils/deploy_utils.py (2)
- ModelDeployerList (193-225)
- run (59-86)
tests/_test_utils/deploy_utils.py (1)
modelopt/deploy/llm/generate.py (1)
- LLM (53-291)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (1)
tests/examples/gpt_oss/test_gpt_oss_qat.py (1)
98-115: No issues found; review comment is incorrect.

The output confirms that `sft.py` is the correct QAT entrypoint. The examples/gpt-oss/README.md explicitly documents "A real end-to-end example for this is in `sft.py`" for QAT training. The README shows the exact command structure used in the test: `sft.py` with `--quant_cfg MXFP4_MLP_WEIGHT_ONLY_CFG`. No separate QAT script exists; QAT is performed through the SFT trainer with quantization arguments passed via the parser. The test code correctly follows the documented pattern.

Likely an incorrect or invalid review comment.
```python
    def _deploy_vllm(self):
        """Deploy a model using vLLM."""
        try:
            from vllm import LLM, SamplingParams
        except ImportError:
            pytest.skip("vllm package not available")

        quantization_method = "modelopt"
        if "FP4" in self.model_id:
            quantization_method = "modelopt_fp4"
        llm = LLM(
            model=self.model_id,
            quantization=quantization_method,
            tensor_parallel_size=self.tensor_parallel_size,
            trust_remote_code=True,
        )
        sampling_params = SamplingParams(temperature=0.8, top_p=0.9)
        outputs = llm.generate(COMMON_PROMPTS, sampling_params)

        # Assertions and output
        assert len(outputs) == len(COMMON_PROMPTS), (
            f"Expected {len(COMMON_PROMPTS)} outputs, got {len(outputs)}"
        )

        for i, output in enumerate(outputs):
            assert output.prompt == COMMON_PROMPTS[i], f"Prompt mismatch at index {i}"
            assert hasattr(output, "outputs"), f"Output {i} missing 'outputs' attribute"
            assert len(output.outputs) > 0, f"Output {i} has no generated text"
            assert hasattr(output.outputs[0], "text"), f"Output {i} missing 'text' attribute"
            assert isinstance(output.outputs[0].text, str), f"Output {i} text is not a string"
            assert len(output.outputs[0].text) > 0, f"Output {i} generated empty text"

            print(f"Model: {self.model_id}")
            print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")
            print("-" * 50)
```
🧩 Analysis chain
vLLM quantization string may vary by version; confirm.
Some versions expect "modelopt" vs "modelopt_fp4" or different aliases. Please confirm the CI vLLM version supports these strings.
🌐 Web query:
What quantization strings does vLLM accept for NVIDIA Model Optimizer (ModelOpt) FP8/FP4 checkpoints?
💡 Result:
vLLM accepts the following ModelOpt quantization strings in the hf_quant_config.json "quant_algo" / "kv_cache_quant_algo" fields:
- "FP8" (for FP8 checkpoints; also allowed for kv_cache_quant_algo). [1]
- "NVFP4" (for ModelOpt NVFP4/FP4 checkpoints). [1][2]
Incorrect quantization string identifiers; use "FP8" and "NVFP4" instead.
vLLM accepts "FP8" for FP8 checkpoints and "NVFP4" for ModelOpt NVFP4/FP4 checkpoints (specified in hf_quant_config.json "quant_algo" fields). The code currently uses "modelopt" and "modelopt_fp4", which are not recognized by vLLM and will fail at runtime. Update lines 141–142 to use the correct identifiers:
- Replace `"modelopt"` with `"FP8"`
- Replace `"modelopt_fp4"` with `"NVFP4"`
🤖 Prompt for AI Agents
In tests/_test_utils/deploy_utils.py around lines 138 to 173, the quantization
string identifiers are incorrect: replace the current defaults so that
quantization_method is "FP8" instead of "modelopt", and when the model_id
contains "FP4" set quantization_method to "NVFP4" instead of "modelopt_fp4";
update the two assignment literals accordingly so vLLM recognizes the
quantization algorithm.
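As a follow-up thought: rather than inferring the algorithm from the model name, the test could read it from the checkpoint itself. A sketch, assuming the checkpoint ships an `hf_quant_config.json` exposing `quant_algo` either at the top level or under a `quantization` key (the exact schema and the helper name are assumptions):

```python
import json
import os


def read_quant_algo(checkpoint_dir: str):
    """Return the quant_algo declared in hf_quant_config.json, or None if absent."""
    cfg_path = os.path.join(checkpoint_dir, "hf_quant_config.json")
    if not os.path.isfile(cfg_path):
        return None
    with open(cfg_path) as f:
        cfg = json.load(f)
    # Handle both flat and nested layouts.
    return cfg.get("quant_algo") or cfg.get("quantization", {}).get("quant_algo")
```

A deployment test could then assert the value is one of the expected identifiers (e.g. `FP8`, `NVFP4`) before choosing the backend's `quantization=` argument.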
```python
    def deploy_gpt_oss_trtllm(self, tmp_path):
        """Deploy GPT-OSS model with TensorRT-LLM."""
        # Prepare benchmark data
        tensorrt_llm_workspace = "/app/tensorrt_llm"
        script = os.path.join(tensorrt_llm_workspace, "benchmarks", "cpp", "prepare_dataset.py")
        model_name = self.model_path.split("/")[-1]
        benchmark_file = f"{model_name}_synthetic_128_128.txt"

        if not os.path.exists(benchmark_file) or os.path.getsize(benchmark_file) == 0:
            print(f"Creating dataset file '{benchmark_file}'...")
            with open(benchmark_file, "w") as fp:
                subprocess.run(
                    f"python {script} --stdout --tokenizer={self.model_path} token-norm-dist --input-mean 128 \
                    --output-mean 128 --input-stdev 0 --output-stdev 0 --num-requests 1400",
                    shell=True,
                    check=True,
                    stdout=fp,
                )
        else:
            print(f"Dataset file '{benchmark_file}' already exists.")

        assert os.path.isfile(benchmark_file), f"Benchmark file '{benchmark_file}' should exist"

        cmd_parts = [
            "trtllm-bench",
            "--model",
            self.model_path,
            "throughput",
            "--backend",
            "pytorch",
            "--dataset",
            benchmark_file,
            "--kv_cache_free_gpu_mem_fraction",
            "0.9",
            "--report_json",
            str(tmp_path / "low_latency_throughput.json"),
        ]
        run_example_command(cmd_parts, "gpt-oss")
```
Make dataset generation and bench robust (path, existence, tool checks).
- Hard-coded "/app/tensorrt_llm" may not exist.
- Dataset file is created in the repo root but consumed from examples/gpt-oss (cwd mismatch).
- Missing checks for prepare_dataset.py and trtllm-bench binaries.
```diff
@@
- tensorrt_llm_workspace = "/app/tensorrt_llm"
- script = os.path.join(tensorrt_llm_workspace, "benchmarks", "cpp", "prepare_dataset.py")
+ import shutil
+ script = os.getenv(
+ "TLLM_PREPARE_DATASET",
+ os.path.join(os.getenv("TLLM_WORKSPACE", ""), "benchmarks", "cpp", "prepare_dataset.py"),
+ )
@@
- model_name = self.model_path.split("/")[-1]
- benchmark_file = f"{model_name}_synthetic_128_128.txt"
+ model_name = self.model_path.split("/")[-1]
+ benchmark_file = tmp_path / f"{model_name}_synthetic_128_128.txt"
@@
- if not os.path.exists(benchmark_file) or os.path.getsize(benchmark_file) == 0:
+ if not os.path.exists(script) or not os.path.isfile(script):
+ pytest.skip(f"prepare_dataset.py not found at: {script}")
+ if shutil.which("trtllm-bench") is None:
+ pytest.skip("trtllm-bench not found in PATH")
+ if not os.path.exists(benchmark_file) or os.path.getsize(benchmark_file) == 0:
@@
- with open(benchmark_file, "w") as fp:
+ with open(benchmark_file, "w") as fp:
subprocess.run(
- f"python {script} --stdout --tokenizer={self.model_path} token-norm-dist --input-mean 128 \
+ f"python {script} --stdout --tokenizer={self.model_path} token-norm-dist --input-mean 128 \
--output-mean 128 --input-stdev 0 --output-stdev 0 --num-requests 1400",
shell=True,
check=True,
stdout=fp,
)
@@
- assert os.path.isfile(benchmark_file), f"Benchmark file '{benchmark_file}' should exist"
+ assert os.path.isfile(benchmark_file) and os.path.getsize(benchmark_file) > 0, (
+ f"Benchmark file '{benchmark_file}' missing or empty"
+ )
@@
- "--dataset",
- benchmark_file,
+ "--dataset",
+            str(benchmark_file),
```

🤖 Prompt for AI Agents
tests/examples/gpt_oss/test_gpt_oss_qat.py around lines 162 to 199: the
deploy_gpt_oss_trtllm function assumes hard-coded workspace and writes/reads
dataset in repo root and does not verify required tools; change it to (1)
compute tensorrt_llm_workspace relative to the test file or an env var and
verify the directory exists, (2) build absolute paths for prepare_dataset.py and
the benchmark file inside the test tmp_path (or examples/gpt-oss working dir) so
creation and consumption use the same cwd, (3) check that prepare_dataset.py is
present and executable and that trtllm-bench is available on PATH before
running, and (4) run subprocesses with explicit cwd and capture/check return
codes, raising clear assertion errors if files/tools are missing.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@           Coverage Diff           @@
##             main     #455   +/-   ##
=======================================
  Coverage   73.40%   73.40%
=======================================
  Files         180      180
  Lines       18047    18047
=======================================
  Hits        13248    13248
  Misses       4799     4799
```

☔ View full report in Codecov by Sentry.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: noeyy-mino <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/_test_utils/deploy_utils.py (1)
147-149: Incorrect quantization string identifiers; use "FP8" and "NVFP4".

This critical issue from the previous review remains unaddressed. vLLM expects `"FP8"` for FP8 checkpoints and `"NVFP4"` for ModelOpt FP4 checkpoints. The current values `"modelopt"` and `"modelopt_fp4"` will fail at runtime.

Apply this diff:
```diff
-    quantization_method = "modelopt"
+    quantization_method = "FP8"
     if "FP4" in self.model_id:
-        quantization_method = "modelopt_fp4"
+        quantization_method = "NVFP4"
```
🧹 Nitpick comments (5)
tests/_test_utils/deploy_utils.py (5)
42-50: Complete the docstring with all parameters.

The docstring only documents 4 of the 7 parameters. Please add documentation for `attn_backend`, `base_model`, and `eagle3_one_model` to help users understand their purpose and expected values.
83-88: Misleading comment and timing of GPU status check.

The comment states "Before Test" (line 86), but this GPU status check executes after the deployment method has already run. If the intent is to capture GPU state for debugging failures, consider either:
- Moving this check before line 74 (before backend dispatch) and updating the comment
- Updating the comment to "After Test" if post-deployment state is the intent
Additionally, the `|| true` in the command (line 84) suppresses failures silently, which may hide useful diagnostic information.
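One way to keep the diagnostic without masking failures (a sketch; the helper name is hypothetical):

```python
import subprocess


def log_gpu_status(label: str) -> None:
    """Print nvidia-smi output for debugging without hiding non-zero exits behind `|| true`."""
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    except FileNotFoundError:
        print(f"\n=== GPU Status {label} ===\nnvidia-smi not found")
        return
    print(f"\n=== GPU Status {label} ===")
    print(result.stdout if result.returncode == 0 else f"nvidia-smi failed: {result.stderr}")
```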
132-138: Add output validation assertions for consistency.

Unlike `_deploy_vllm()` (lines 160-170), this method only prints outputs without validating them. Consider adding assertions to verify:
- Number of outputs matches number of prompts
- Each output has the expected structure
- Generated text is non-empty
This improves test reliability and maintains consistency across deployment methods.
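A shared helper along these lines could be reused by both paths (the name is hypothetical, and it assumes vLLM-style `RequestOutput` objects; attribute access would need adapting for other backends):

```python
def validate_outputs(outputs, prompts):
    """Minimal structural checks mirroring the vLLM assertions above."""
    assert len(outputs) == len(prompts), f"Expected {len(prompts)} outputs, got {len(outputs)}"
    for i, output in enumerate(outputs):
        assert output.prompt == prompts[i], f"Prompt mismatch at index {i}"
        text = output.outputs[0].text
        assert isinstance(text, str) and len(text) > 0, f"Output {i} generated empty text"
```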
191-191: Use COMMON_PROMPTS instead of hardcoded prompt.

This method uses a hardcoded prompt while other deployment methods use the `COMMON_PROMPTS` constant defined at line 23. For consistency and maintainability, consider using `COMMON_PROMPTS` here as well.

Apply this diff:
```diff
-    print(llm.generate(["What's the age of the earth? "]))
+    outputs = llm.generate(COMMON_PROMPTS)
+    print(outputs)
```
207-212: Consider adding `test_id` to `ModelDeployer.__init__`.

The code dynamically assigns a `test_id` attribute to deployer instances (line 211), but this attribute is not defined in `ModelDeployer.__init__`. For clarity and to avoid confusion, consider either:
- Adding `test_id` as a parameter in `ModelDeployer.__init__` with a default value
- Adding a docstring comment explaining that this attribute is set externally by `ModelDeployerList`
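For the first option, a trimmed sketch (only the relevant parameters shown; the real constructor takes more arguments):

```python
class ModelDeployer:
    def __init__(self, model_id: str, backend: str, test_id: str | None = None):
        self.model_id = model_id
        self.backend = backend
        # Normally set by ModelDeployerList when it expands the parameter grid,
        # but accepting it here makes the attribute's origin explicit.
        self.test_id = test_id or f"{model_id}-{backend}"
```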
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tests/_test_utils/deploy_utils.py (1 hunks)
🔇 Additional comments (1)
tests/_test_utils/deploy_utils.py (1)
182-184: Incorrect quantization identifiers for SGLang; the fix differs from the suggested diff. SGLang accepts "fp8" for FP8 checkpoints and "modelopt_fp4" for ModelOpt NV-FP4 checkpoints. The current code uses "modelopt" on line 182, which is incorrect.
The line 182 value should be changed to "fp8", but line 184 is already correct as "modelopt_fp4" and should not be changed to "NVFP4".
Apply this corrected diff:
```diff
-    quantization_method = "modelopt"
+    quantization_method = "fp8"
     if "FP4" in self.model_id:
         quantization_method = "modelopt_fp4"
```

Likely an incorrect or invalid review comment.
Signed-off-by: noeyy-mino <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/examples/gpt_oss/test_gpt_oss_qat.py (1)
165-202: Fix dataset/bench pathing and tool checks; current code will break (cwd mismatch, hard-coded workspace, no CLI checks).
- Writes the dataset in the test cwd but consumes it from examples/gpt-oss (via `run_example_command`), so `--dataset` won't be found.
- Hard-codes `/app/tensorrt_llm`; may not exist.
- No checks for `prepare_dataset.py` or `trtllm-bench`.
- Uses `shell=True` with an interpolated path.

This was flagged earlier and persists; please harden as below.
```diff
     def deploy_gpt_oss_trtllm(self, tmp_path):
         """Deploy GPT-OSS model with TensorRT-LLM."""
-        # Prepare benchmark data
-        tensorrt_llm_workspace = "/app/tensorrt_llm"
-        script = os.path.join(tensorrt_llm_workspace, "benchmarks", "cpp", "prepare_dataset.py")
-        model_name = self.model_path.split("/")[-1]
-        benchmark_file = f"{model_name}_synthetic_128_128.txt"
+        # Prepare benchmark data
+        script = os.getenv(
+            "TLLM_PREPARE_DATASET",
+            os.path.join(os.getenv("TLLM_WORKSPACE", ""), "benchmarks", "cpp", "prepare_dataset.py"),
+        )
+        model_name = self.model_path.split("/")[-1]
+        benchmark_file = tmp_path / f"{model_name}_synthetic_128_128.txt"

-        if not os.path.exists(benchmark_file) or os.path.getsize(benchmark_file) == 0:
-            print(f"Creating dataset file '{benchmark_file}'...")
-            with open(benchmark_file, "w") as fp:
-                subprocess.run(
-                    f"python {script} --stdout --tokenizer={self.model_path} token-norm-dist --input-mean 128 \
-                    --output-mean 128 --input-stdev 0 --output-stdev 0 --num-requests 1400",
-                    shell=True,
-                    check=True,
-                    stdout=fp,
-                )
+        # Preflight checks
+        if not script or not os.path.isfile(script):
+            pytest.skip(f"prepare_dataset.py not found at: {script}")
+        if shutil.which("trtllm-bench") is None:
+            pytest.skip("trtllm-bench not found in PATH")
+
+        if not benchmark_file.exists() or benchmark_file.stat().st_size == 0:
+            print(f"Creating dataset file '{benchmark_file}'...")
+            with open(benchmark_file, "w") as fp:
+                subprocess.run(
+                    [
+                        sys.executable,
+                        script,
+                        "--stdout",
+                        f"--tokenizer={self.model_path}",
+                        "token-norm-dist",
+                        "--input-mean",
+                        "128",
+                        "--output-mean",
+                        "128",
+                        "--input-stdev",
+                        "0",
+                        "--output-stdev",
+                        "0",
+                        "--num-requests",
+                        "1400",
+                    ],
+                    check=True,
+                    stdout=fp,
+                )
         else:
             print(f"Dataset file '{benchmark_file}' already exists.")

-        assert os.path.isfile(benchmark_file), f"Benchmark file '{benchmark_file}' should exist"
+        assert benchmark_file.is_file() and benchmark_file.stat().st_size > 0, (
+            f"Benchmark file '{benchmark_file}' missing or empty"
+        )

         cmd_parts = [
             "trtllm-bench",
             "--model",
             self.model_path,
             "throughput",
             "--backend",
             "pytorch",
             "--dataset",
-            benchmark_file,
+            str(benchmark_file),
             "--kv_cache_free_gpu_mem_fraction",
             "0.9",
             "--report_json",
             str(tmp_path / "low_latency_throughput.json"),
         ]
         run_example_command(cmd_parts, "gpt-oss")
```

```bash
#!/bin/bash
# Quick preflight: verify required tools and script location.
set -euo pipefail
echo "accelerate: $(command -v accelerate || echo 'MISSING')"
echo "trtllm-bench: $(command -v trtllm-bench || echo 'MISSING')"
echo "TLLM_WORKSPACE=${TLLM_WORKSPACE:-}"
# Try to find prepare_dataset.py under TLLM_WORKSPACE or repo
if [[ -n "${TLLM_WORKSPACE:-}" ]]; then
  test -f "${TLLM_WORKSPACE}/benchmarks/cpp/prepare_dataset.py" && echo "prepare_dataset.py found in TLLM_WORKSPACE" || echo "prepare_dataset.py NOT found under TLLM_WORKSPACE"
else
  fd -H -a -t f prepare_dataset.py 2>/dev/null || true
fi
```
🧹 Nitpick comments (6)
tests/examples/gpt_oss/test_gpt_oss_qat.py (6)
25-41: Docstring is stale (hard-codes 20B); make it model-agnostic.

The implementation is parameterized by `model_path`, but the docstring still refers to 20B and fixed input/output names.

```diff
 class GPTOSS:
-    """Test GPT-OSS-20B QAT (Quantization-Aware Training) pipeline.
-
-    This test suite covers the complete GPT-OSS-20B optimization pipeline:
-
-    Step 1: test_gpt_oss_sft_training - Supervised Fine-Tuning (SFT)
-    Input: openai/gpt-oss-20b
-    Output: gpt-oss-20b-sft
-
-    Step 2: test_gpt_oss_qat_training - Quantization-Aware Training (QAT)
-    Input: gpt-oss-20b-sft (from Step 1)
-    Output: gpt-oss-20b-qat
-
-    Step 3: test_gpt_oss_mxfp4_conversion - MXFP4 Weight-Only Conversion
-    Input: gpt-oss-20b-qat (from Step 2)
-    Output: gpt-oss-20b-qat-real-mxfp4
-
-    Each step can be run independently (with mock inputs) or as part of the full pipeline.
-    """
+    """Test GPT-OSS QAT (Quantization-Aware Training) pipeline.
+
+    Pipeline:
+    1) SFT training
+       Input: {model_path}
+       Output: <model_name>-sft
+    2) QAT training (MXFP4 weight-only)
+       Input: <model_name>-sft
+       Output: <model_name>-qat
+    3) MXFP4 conversion
+       Input: <model_name>-qat
+       Output: <model_name>-qat-real-mxfp4
+
+    Each step can run independently (creates minimal mock inputs if prior outputs are missing) or as part of the full pipeline.
+    """
```
16-22: Add missing imports used by the proposed fixes.

We'll need `sys` (for `sys.executable`) and `shutil` (for CLI checks).

```diff
 import os
 import subprocess
+import shutil
+import sys

 import pytest
 from _test_utils.examples.run_command import run_example_command
 from _test_utils.torch_misc import minimum_gpu
```
67-67: Avoid port collisions: allocate a free MASTER_PORT for accelerate.

Enable `setup_free_port` so parallel tests don't fight over the default port.

```diff
-    run_example_command(cmd_parts, "gpt-oss")
+    run_example_command(cmd_parts, "gpt-oss", setup_free_port=True)
-    run_example_command(cmd_parts, "gpt-oss")
+    run_example_command(cmd_parts, "gpt-oss", setup_free_port=True)
```

Also applies to: 117-117
151-158: Use the current Python interpreter instead of hardcoded 'python'. Prevents environment mismatches (e.g., venvs).
```diff
-    cmd_parts = [
-        "python",
+    cmd_parts = [
+        sys.executable,
         "convert_oai_mxfp4_weight_only.py",
         "--model_path",
         str(qat_dir),
         "--output_path",
         str(conversion_output_dir),
     ]
```
205-211: Gate the heavy end-to-end test (optional). To keep CI fast and predictable, consider marking as slow or guarding by an env var.
```diff
-@pytest.mark.parametrize(
+@pytest.mark.parametrize(
     "model_path",
     [
         pytest.param("openai/gpt-oss-20b", id="gpt-oss-20b", marks=minimum_gpu(2)),
         pytest.param("openai/gpt-oss-120b", id="gpt-oss-120b", marks=minimum_gpu(8)),
     ],
 )
+@pytest.mark.skipif(os.getenv("RUN_GPT_OSS_E2E") != "1", reason="Set RUN_GPT_OSS_E2E=1 to run full GPT-OSS pipeline")
```
165-202: Optional: reduce dataset size or make it configurable for CI. 1400 requests can be heavy; allow overriding via env for quicker functional runs.
- "--num-requests", - "1400", + "--num-requests", + os.getenv("GPT_OSS_NUM_REQUESTS", "200"),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tests/examples/gpt_oss/test_gpt_oss_qat.py (1 hunks)
- tests/examples/llm_ptq/test_deploy.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/examples/llm_ptq/test_deploy.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/examples/gpt_oss/test_gpt_oss_qat.py (2)
tests/_test_utils/examples/run_command.py (1)
- run_example_command (35-43)

tests/_test_utils/torch_misc.py (1)

- minimum_gpu (51-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: build-docs
- GitHub Check: code-quality
You need to sign your commits with an SSH key. Please take a look at https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md#%EF%B8%8F-signing-your-work
@noeyy-mino can any of these tests also be made part of nightly modelopt cicd? If we can use a dummy small model (e.g. 2 layers, smaller hidden size, etc.) and it can run in a few minutes, then we can enable these tests nightly to catch bugs sooner.
No, they are for QA release testing.
```python
HF_CACHE_PATH = os.getenv("HF_HUB_CACHE", os.path.expanduser("~/.cache/huggingface/hub"))


def clear_hf_cache():
```
Is this cache cleanup for saving space after tests?
Since this is in the tests/examples/llm_ptq directory, it will be automatically run when nightly cicd runs llm_ptq tests. Can you add some marker here so it's skipped nightly or only run during the QA cycle?
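For example, a module-level gate along these lines would keep the suite out of nightly runs (the env-variable name is a placeholder, not an existing convention in this repo):

```python
import os

import pytest

# Skip the whole module unless the QA release-test cycle explicitly opts in.
pytestmark = pytest.mark.skipif(
    os.getenv("MODELOPT_QA_TESTS") != "1",
    reason="Published-checkpoint deployment tests only run during QA release testing",
)
```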
Signed-off-by: noeyy-mino <[email protected]>
What does this PR do?
Type of change: new tests
Overview:
Add vLLM/SGLang/TRT LLM deployment tests for published checkpoints on HF.
Add e2e test case for gpt-oss
Usage
```sh
# pytest test_deploy.py -k "vllm"
```

Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit