
Conversation


@noeyy-mino noeyy-mino commented Oct 21, 2025

What does this PR do?

Type of change: new tests

Overview:
Add vLLM/SGLang/TRT-LLM deployment tests for published checkpoints on Hugging Face.
Add an end-to-end test case for gpt-oss.

Usage

# pytest test_deploy.py -k "vllm"

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • Tests
    • Added ModelDeployer and ModelDeployerList test utilities with COMMON_PROMPTS to run multi-backend deployment and inference scenarios.
    • Introduced an end-to-end GPT-OSS QAT pipeline test (SFT → QAT → MXFP4 conversion) with optional deployment/benchmarking.
    • Added a broad parametric LLM deployment test suite covering many models/backends with readable test IDs.
    • Added automatic Hugging Face cache cleanup after tests to keep runs isolated.

Contributor

coderabbitai bot commented Oct 21, 2025

Walkthrough

Adds a test utility module for multi-backend model deployment and two new test suites: a GPT-OSS QAT end-to-end pipeline and parametric NVIDIA model deployment tests with HF cache cleanup and per-test environment teardown. Includes runtime/hardware checks, backend-specific deployment logic, and generated deployer configurations.

Changes

Cohort / File(s): Summary

  • Test Deployment Utilities (tests/_test_utils/deploy_utils.py)
    New module introducing ModelDeployer (backends: trtllm, vllm, sglang) with initialization params, run() orchestration (CUDA/SM/GPU checks, runtime skips), backend-specific _deploy_* methods, ModelDeployerList to produce Cartesian-product deployer configurations (see the sketch after this table), and COMMON_PROMPTS.
  • GPT-OSS QAT Test Suite (tests/examples/gpt_oss/test_gpt_oss_qat.py)
    New test module adding a GPTOSS class implementing SFT training, QAT training, MXFP4 conversion, and a deploy_gpt_oss_trtllm step. Adds a parameterized test_gpt_oss_complete_pipeline for openai/gpt-oss-20b and openai/gpt-oss-120b, with directory assertions and mock/dataset fallbacks.
  • NVIDIA Model Deployment Tests (tests/examples/llm_ptq/test_deploy.py)
    New test suite using ModelDeployerList to run deployments across many models/backends. Adds clear_hf_cache() to remove HF cache entries, an autouse cleanup_after_test fixture, an HF_CACHE_PATH constant, a readable test-id helper, and many parametric deployment tests.
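To make the ModelDeployerList behavior above concrete, here is a minimal, hypothetical sketch of how a Cartesian-product expansion into per-test deployer configurations with readable ids could look; the class name, fields, and helper are illustrative assumptions, not the actual code in tests/_test_utils/deploy_utils.py.

# Illustrative sketch only; field names and the helper are hypothetical,
# not the ModelDeployer/ModelDeployerList implementation from this PR.
import itertools
from dataclasses import dataclass


@dataclass
class DeployerConfig:
    model_id: str
    backend: str                 # one of "trtllm", "vllm", "sglang"
    tensor_parallel_size: int
    test_id: str = ""


def make_deployer_configs(model_ids, backends, tp_sizes):
    """Expand the Cartesian product of parameters into per-test configs."""
    configs = []
    for model_id, backend, tp in itertools.product(model_ids, backends, tp_sizes):
        cfg = DeployerConfig(model_id, backend, tp)
        # Readable pytest id, e.g. "Llama-3.1-8B-Instruct-FP8-vllm-tp2"
        cfg.test_id = f"{model_id.split('/')[-1]}-{backend}-tp{tp}"
        configs.append(cfg)
    return configs


if __name__ == "__main__":
    for cfg in make_deployer_configs(
        ["nvidia/Llama-3.1-8B-Instruct-FP8"],  # hypothetical checkpoint name
        ["vllm", "trtllm", "sglang"],
        [1, 2],
    ):
        print(cfg.test_id)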

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Function
    participant MDT as ModelDeployerList
    participant MD as ModelDeployer
    participant Backend as Backend (trtllm / vllm / sglang)
    participant GPU as GPU / nvidia-smi

    Test->>MDT: instantiate with params
    MDT->>MDT: generate Cartesian product of configs
    MDT-->>Test: yield ModelDeployer instances

    loop for each ModelDeployer
        Test->>MD: call run()
        MD->>MD: validate CUDA / SM / GPU count
        alt requirements not met
            MD-->>Test: skip test (pytest.skip)
        else
            MD->>Backend: import & initialize backend client
            Backend-->>MD: ready
            MD->>Backend: send COMMON_PROMPTS for inference
            Backend-->>MD: inference outputs
            MD->>GPU: run nvidia-smi
            GPU-->>MD: status output
            MD-->>Test: finish
        end
    end
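The gating step shown in the diagram (validate CUDA / SM / GPU count, then skip or proceed) can be pictured with the minimal sketch below; the helper name, SM threshold, and the trtllm-bench check are assumptions for illustration, not the real run() implementation.

# Minimal sketch of the pre-deployment gating shown in the diagram.
# Thresholds and the helper name are illustrative assumptions.
import shutil

import pytest
import torch


def check_runtime(backend: str, min_gpus: int = 1, min_sm: int = 80) -> None:
    if not torch.cuda.is_available():
        pytest.skip("CUDA not available")
    if torch.cuda.device_count() < min_gpus:
        pytest.skip(f"Requires at least {min_gpus} GPU(s)")
    major, minor = torch.cuda.get_device_capability(0)
    if major * 10 + minor < min_sm:
        pytest.skip(f"Requires SM >= {min_sm}, found SM {major}{minor}")
    if backend == "trtllm" and shutil.which("trtllm-bench") is None:
        pytest.skip("trtllm-bench not found in PATH")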

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through prompts and GPU steam,
rolled deployers into one neat scheme.
vllm, trtllm, sglang on the line,
caches cleared — the tests align.
A joyful rabbit — CI will beam!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 36.84%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title "Add functional test cases for published checkpoints on HF" accurately reflects the main change: three new test files (deploy_utils.py, test_gpt_oss_qat.py, test_deploy.py) that implement functional tests for multi-backend deployment of published Hugging Face checkpoints. It is concise, clear, and specific enough for a teammate scanning the commit history to understand the change without additional context.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (7)
tests/examples/gpt_oss/test_gpt_oss_qat.py (2)

66-67: Isolate distributed ports to avoid CI collisions.

accelerate launch can clash on MASTER_PORT in parallel CI. Use the helper’s port setup.

-        run_example_command(cmd_parts, "gpt-oss")
+        run_example_command(cmd_parts, "gpt-oss", setup_free_port=True)
@@
-        run_example_command(cmd_parts, "gpt-oss")
+        run_example_command(cmd_parts, "gpt-oss", setup_free_port=True)

Also applies to: 115-116
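For context, a setup_free_port-style helper typically binds to port 0 so the OS assigns an unused port and then exports it as MASTER_PORT; the sketch below is a generic illustration and may differ from the actual helper in tests/_test_utils/examples/run_command.py.

# Generic illustration of free-port setup; the repo's helper may differ.
import os
import socket


def setup_free_master_port() -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
        port = sock.getsockname()[1]
    os.environ["MASTER_PORT"] = str(port)  # read by accelerate/torchrun
    return port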


202-208: Consider marking this full E2E pipeline as slow/optional.

This test downloads models and runs training/benchmarks; it’s better opt-in (env marker) to keep CI green.

Example:

pytestmark = pytest.mark.skipif(
    os.getenv("RUN_GPT_OSS_E2E") != "1", reason="Set RUN_GPT_OSS_E2E=1 to run heavy E2E"
)
tests/examples/llm_ptq/test_deploy.py (3)

33-48: Tighten HF cache cleanup pattern and add safety.

Limit deletions to HuggingFace model dirs to avoid accidental removals of unrelated paths containing “nvidia”.

-            for item in os.listdir(HF_CACHE_PATH):
+            for item in os.listdir(HF_CACHE_PATH):
                 item_path = os.path.join(HF_CACHE_PATH, item)
-                if os.path.isdir(item_path) and "nvidia" in item:
+                if (
+                    os.path.isdir(item_path)
+                    and (item.startswith("models--nvidia--") or item.startswith("datasets--nvidia--"))
+                ):
                     shutil.rmtree(item_path, ignore_errors=True)
                     print(f"✓ Removed: {item}")

50-55: Run cache cleanup once per session to cut test time.

Per-test rmtree can add minutes across this matrix. Prefer session-scope, and also clear before the session starts.

-@pytest.fixture(autouse=True)
-def cleanup_after_test():
-    """Automatically clean up after each test."""
-    yield  # Run the test
-    clear_hf_cache()  # Clean up after test completes
+@pytest.fixture(scope="session", autouse=True)
+def cleanup_hf_cache_once():
+    """Clean HF cache before and after the test session."""
+    clear_hf_cache()
+    yield
+    clear_hf_cache()

57-96: Gate the huge test matrix behind env toggles to protect CI time.

The Cartesian products across backends/models easily create dozens of heavy tests. Add an env/mark gate or shrink defaults to smoke coverage.

Example:

if os.getenv("RUN_HF_DEPLOY_TESTS") != "1":
    pytest.skip("Set RUN_HF_DEPLOY_TESTS=1 to enable HF deployment tests", allow_module_level=True)

Also applies to: 97-171, 173-244, 246-263, 265-301, 303-336, 338-352, 354-381, 383-434

tests/_test_utils/deploy_utils.py (2)

80-86: Label says “Before Test” but it’s printed after running.

Either move it earlier or update the label.

-        print("\n=== GPU Status Before Test ===")
+        print("\n=== GPU Status After Test ===")

174-191: Add a minimal assertion for SGLang output.

Ensure the backend produces a non-empty text.

-        print(llm.generate(["What's the age of the earth? "]))
+        out = llm.generate(["What's the age of the earth? "])
+        assert isinstance(out, list) and len(out) > 0, "SGLang returned empty output"
+        print(out)
         llm.shutdown()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4476f21 and 4c8261a.

📒 Files selected for processing (3)
  • tests/_test_utils/deploy_utils.py (1 hunks)
  • tests/examples/gpt_oss/test_gpt_oss_qat.py (1 hunks)
  • tests/examples/llm_ptq/test_deploy.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/examples/gpt_oss/test_gpt_oss_qat.py (2)
tests/_test_utils/examples/run_command.py (1)
  • run_example_command (35-43)
tests/_test_utils/torch_misc.py (1)
  • minimum_gpu (51-55)
tests/examples/llm_ptq/test_deploy.py (1)
tests/_test_utils/deploy_utils.py (2)
  • ModelDeployerList (193-225)
  • run (59-86)
tests/_test_utils/deploy_utils.py (1)
modelopt/deploy/llm/generate.py (1)
  • LLM (53-291)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (1)
tests/examples/gpt_oss/test_gpt_oss_qat.py (1)

98-115: No issues found; review comment is incorrect.

The output confirms that sft.py is the correct QAT entrypoint. The examples/gpt-oss/README.md explicitly documents "A real end-to-end example for this is in sft.py" for QAT training. The README shows the exact command structure used in the test: sft.py with --quant_cfg MXFP4_MLP_WEIGHT_ONLY_CFG. No separate QAT script exists; QAT is performed through the SFT trainer with quantization arguments passed via the parser. The test code correctly follows the documented pattern.

Likely an incorrect or invalid review comment.

Comment on lines +138 to +173
def _deploy_vllm(self):
    """Deploy a model using vLLM."""
    try:
        from vllm import LLM, SamplingParams
    except ImportError:
        pytest.skip("vllm package not available")

    quantization_method = "modelopt"
    if "FP4" in self.model_id:
        quantization_method = "modelopt_fp4"
    llm = LLM(
        model=self.model_id,
        quantization=quantization_method,
        tensor_parallel_size=self.tensor_parallel_size,
        trust_remote_code=True,
    )
    sampling_params = SamplingParams(temperature=0.8, top_p=0.9)
    outputs = llm.generate(COMMON_PROMPTS, sampling_params)

    # Assertions and output
    assert len(outputs) == len(COMMON_PROMPTS), (
        f"Expected {len(COMMON_PROMPTS)} outputs, got {len(outputs)}"
    )

    for i, output in enumerate(outputs):
        assert output.prompt == COMMON_PROMPTS[i], f"Prompt mismatch at index {i}"
        assert hasattr(output, "outputs"), f"Output {i} missing 'outputs' attribute"
        assert len(output.outputs) > 0, f"Output {i} has no generated text"
        assert hasattr(output.outputs[0], "text"), f"Output {i} missing 'text' attribute"
        assert isinstance(output.outputs[0].text, str), f"Output {i} text is not a string"
        assert len(output.outputs[0].text) > 0, f"Output {i} generated empty text"

        print(f"Model: {self.model_id}")
        print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")
        print("-" * 50)

Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

vLLM quantization string may vary by version; confirm.

Some versions expect "modelopt" vs "modelopt_fp4" or different aliases. Please confirm the CI vLLM version supports these strings.


🌐 Web query:

What quantization strings does vLLM accept for NVIDIA Model Optimizer (ModelOpt) FP8/FP4 checkpoints?

💡 Result:

vLLM accepts the following ModelOpt quantization strings in the hf_quant_config.json "quant_algo" / "kv_cache_quant_algo" fields:

  • "FP8" (for FP8 checkpoints; also allowed for kv_cache_quant_algo). [1]
  • "NVFP4" (for ModelOpt NVFP4/FP4 checkpoints). [1][2]

Incorrect quantization string identifiers; use "FP8" and "NVFP4" instead.

vLLM accepts "FP8" for FP8 checkpoints and "NVFP4" for ModelOpt NVFP4/FP4 checkpoints (specified in hf_quant_config.json "quant_algo" fields). The code currently uses "modelopt" and "modelopt_fp4", which are not recognized by vLLM and will fail at runtime. Update lines 141–142 to use the correct identifiers:

  • Replace "modelopt" with "FP8"
  • Replace "modelopt_fp4" with "NVFP4"
🤖 Prompt for AI Agents
In tests/_test_utils/deploy_utils.py around lines 138 to 173, the quantization
string identifiers are incorrect: replace the current defaults so that
quantization_method is "FP8" instead of "modelopt", and when the model_id
contains "FP4" set quantization_method to "NVFP4" instead of "modelopt_fp4";
update the two assignment literals accordingly so vLLM recognizes the
quantization algorithm.

Comment on lines 162 to 199
def deploy_gpt_oss_trtllm(self, tmp_path):
    """Deploy GPT-OSS model with TensorRT-LLM."""
    # Prepare benchmark data
    tensorrt_llm_workspace = "/app/tensorrt_llm"
    script = os.path.join(tensorrt_llm_workspace, "benchmarks", "cpp", "prepare_dataset.py")
    model_name = self.model_path.split("/")[-1]
    benchmark_file = f"{model_name}_synthetic_128_128.txt"

    if not os.path.exists(benchmark_file) or os.path.getsize(benchmark_file) == 0:
        print(f"Creating dataset file '{benchmark_file}'...")
        with open(benchmark_file, "w") as fp:
            subprocess.run(
                f"python {script} --stdout --tokenizer={self.model_path} token-norm-dist --input-mean 128 \
                --output-mean 128 --input-stdev 0 --output-stdev 0 --num-requests 1400",
                shell=True,
                check=True,
                stdout=fp,
            )
    else:
        print(f"Dataset file '{benchmark_file}' already exists.")

    assert os.path.isfile(benchmark_file), f"Benchmark file '{benchmark_file}' should exist"

    cmd_parts = [
        "trtllm-bench",
        "--model",
        self.model_path,
        "throughput",
        "--backend",
        "pytorch",
        "--dataset",
        benchmark_file,
        "--kv_cache_free_gpu_mem_fraction",
        "0.9",
        "--report_json",
        str(tmp_path / "low_latency_throughput.json"),
    ]
    run_example_command(cmd_parts, "gpt-oss")

Contributor

⚠️ Potential issue | 🟠 Major

Make dataset generation and bench robust (path, existence, tool checks).

  • Hard-coded "/app/tensorrt_llm" may not exist.
  • Dataset file is created in the repo root but consumed from examples/gpt-oss (cwd mismatch).
  • Missing checks for prepare_dataset.py and trtllm-bench binaries.
@@
-        tensorrt_llm_workspace = "/app/tensorrt_llm"
-        script = os.path.join(tensorrt_llm_workspace, "benchmarks", "cpp", "prepare_dataset.py")
+        import shutil
+        script = os.getenv(
+            "TLLM_PREPARE_DATASET",
+            os.path.join(os.getenv("TLLM_WORKSPACE", ""), "benchmarks", "cpp", "prepare_dataset.py"),
+        )
@@
-        model_name = self.model_path.split("/")[-1]
-        benchmark_file = f"{model_name}_synthetic_128_128.txt"
+        model_name = self.model_path.split("/")[-1]
+        benchmark_file = tmp_path / f"{model_name}_synthetic_128_128.txt"
@@
-        if not os.path.exists(benchmark_file) or os.path.getsize(benchmark_file) == 0:
+        if not os.path.exists(script) or not os.path.isfile(script):
+            pytest.skip(f"prepare_dataset.py not found at: {script}")
+        if shutil.which("trtllm-bench") is None:
+            pytest.skip("trtllm-bench not found in PATH")
+        if not os.path.exists(benchmark_file) or os.path.getsize(benchmark_file) == 0:
@@
-            with open(benchmark_file, "w") as fp:
+            with open(benchmark_file, "w") as fp:
                 subprocess.run(
-                    f"python {script} --stdout --tokenizer={self.model_path} token-norm-dist --input-mean 128 \
+                    f"python {script} --stdout --tokenizer={self.model_path} token-norm-dist --input-mean 128 \
                     --output-mean 128 --input-stdev 0 --output-stdev 0 --num-requests 1400",
                     shell=True,
                     check=True,
                     stdout=fp,
                 )
@@
-        assert os.path.isfile(benchmark_file), f"Benchmark file '{benchmark_file}' should exist"
+        assert os.path.isfile(benchmark_file) and os.path.getsize(benchmark_file) > 0, (
+            f"Benchmark file '{benchmark_file}' missing or empty"
+        )
@@
-            "--dataset",
-            benchmark_file,
+            "--dataset",
+            str(benchmark_file),
🤖 Prompt for AI Agents
tests/examples/gpt_oss/test_gpt_oss_qat.py around lines 162 to 199: the
deploy_gpt_oss_trtllm function assumes hard-coded workspace and writes/reads
dataset in repo root and does not verify required tools; change it to (1)
compute tensorrt_llm_workspace relative to the test file or an env var and
verify the directory exists, (2) build absolute paths for prepare_dataset.py and
the benchmark file inside the test tmp_path (or examples/gpt-oss working dir) so
creation and consumption use the same cwd, (3) check that prepare_dataset.py is
present and executable and that trtllm-bench is available on PATH before
running, and (4) run subprocesses with explicit cwd and capture/check return
codes, raising clear assertion errors if files/tools are missing.


codecov bot commented Oct 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.40%. Comparing base (2fd67cc) to head (12943bb).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #455   +/-   ##
=======================================
  Coverage   73.40%   73.40%           
=======================================
  Files         180      180           
  Lines       18047    18047           
=======================================
  Hits        13248    13248           
  Misses       4799     4799           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: noeyy-mino <[email protected]>

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/_test_utils/deploy_utils.py (1)

147-149: Incorrect quantization string identifiers; use "FP8" and "NVFP4".

This critical issue from the previous review remains unaddressed. vLLM expects "FP8" for FP8 checkpoints and "NVFP4" for ModelOpt FP4 checkpoints. The current values "modelopt" and "modelopt_fp4" will fail at runtime.

Apply this diff:

-        quantization_method = "modelopt"
+        quantization_method = "FP8"
         if "FP4" in self.model_id:
-            quantization_method = "modelopt_fp4"
+            quantization_method = "NVFP4"
🧹 Nitpick comments (5)
tests/_test_utils/deploy_utils.py (5)

42-50: Complete the docstring with all parameters.

The docstring only documents 4 of the 7 parameters. Please add documentation for attn_backend, base_model, and eagle3_one_model to help users understand their purpose and expected values.


83-88: Misleading comment and timing of GPU status check.

The comment states "Before Test" (line 86), but this GPU status check executes after the deployment method has already run. If the intent is to capture GPU state for debugging failures, consider either:

  1. Moving this check before line 74 (before backend dispatch) and updating the comment
  2. Updating the comment to "After Test" if post-deployment state is the intent

Additionally, the || true in the command (line 84) suppresses failures silently, which may hide useful diagnostic information.


132-138: Add output validation assertions for consistency.

Unlike _deploy_vllm() (lines 160-170), this method only prints outputs without validating them. Consider adding assertions to verify:

  • Number of outputs matches number of prompts
  • Each output has the expected structure
  • Generated text is non-empty

This improves test reliability and maintains consistency across deployment methods.
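A minimal version of the suggested checks could look like the sketch below, written under the assumption that the TRT-LLM path collects one generated string per prompt; the accessors would need to be adapted to the backend's actual return type.

# Hedged sketch of the suggested output validation (assumes a list of strings,
# one generated text per prompt; adjust to the real output objects).
def assert_outputs_valid(prompts, generated_texts):
    assert len(generated_texts) == len(prompts), (
        f"Expected {len(prompts)} outputs, got {len(generated_texts)}"
    )
    for i, text in enumerate(generated_texts):
        assert isinstance(text, str) and text.strip(), f"Output {i} is empty"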


191-191: Use COMMON_PROMPTS instead of hardcoded prompt.

This method uses a hardcoded prompt while other deployment methods use the COMMON_PROMPTS constant defined at line 23. For consistency and maintainability, consider using COMMON_PROMPTS here as well.

Apply this diff:

-        print(llm.generate(["What's the age of the earth? "]))
+        outputs = llm.generate(COMMON_PROMPTS)
+        print(outputs)

207-212: Consider adding test_id to ModelDeployer.__init__.

The code dynamically assigns a test_id attribute to deployer instances (line 211), but this attribute is not defined in ModelDeployer.__init__. For clarity and to avoid confusion, consider either:

  1. Adding test_id as a parameter in ModelDeployer.__init__ with a default value
  2. Adding a docstring comment explaining that this attribute is set externally by ModelDeployerList
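Option 1 in the list above could look roughly like the sketch below; the parameter list is abbreviated and the default id format is an assumption, not the actual constructor.

# Abbreviated sketch of option 1: declare test_id in the constructor so the
# attribute is part of ModelDeployer's documented interface (other parameters
# omitted for brevity; the default id format is an assumption).
class ModelDeployer:
    def __init__(self, model_id: str, backend: str,
                 tensor_parallel_size: int = 1, test_id: str = ""):
        self.model_id = model_id
        self.backend = backend
        self.tensor_parallel_size = tensor_parallel_size
        self.test_id = test_id or f"{model_id.split('/')[-1]}-{backend}-tp{tensor_parallel_size}"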
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c8261a and f922be5.

📒 Files selected for processing (1)
  • tests/_test_utils/deploy_utils.py (1 hunks)
🔇 Additional comments (1)
tests/_test_utils/deploy_utils.py (1)

182-184: Incorrect quantization identifiers for SGLang—fix differs from suggested diff.

SGLang accepts "fp8" for FP8 checkpoints and "modelopt_fp4" for ModelOpt NV-FP4 checkpoints. The current code uses "modelopt" on line 182, which is incorrect.

The line 182 value should be changed to "fp8", but line 184 is already correct as "modelopt_fp4" and should not be changed to "NVFP4".

Apply this corrected diff:

-        quantization_method = "modelopt"
+        quantization_method = "fp8"
         if "FP4" in self.model_id:
             quantization_method = "modelopt_fp4"

Likely an incorrect or invalid review comment.


copy-pr-bot bot commented Oct 23, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/examples/gpt_oss/test_gpt_oss_qat.py (1)

165-202: Fix dataset/bench pathing and tool checks; current code will break (cwd mismatch, hard-coded workspace, no CLI checks).

  • Writes the dataset in the test cwd but consumes it from examples/gpt-oss (via run_example_command), so --dataset won’t be found.
  • Hard-codes /app/tensorrt_llm; may not exist.
  • No checks for prepare_dataset.py or trtllm-bench.
  • Uses shell=True with an interpolated path.

This was flagged earlier and persists; please harden as below.

     def deploy_gpt_oss_trtllm(self, tmp_path):
         """Deploy GPT-OSS model with TensorRT-LLM."""
-        # Prepare benchmark data
-        tensorrt_llm_workspace = "/app/tensorrt_llm"
-        script = os.path.join(tensorrt_llm_workspace, "benchmarks", "cpp", "prepare_dataset.py")
-        model_name = self.model_path.split("/")[-1]
-        benchmark_file = f"{model_name}_synthetic_128_128.txt"
+        # Prepare benchmark data
+        script = os.getenv(
+            "TLLM_PREPARE_DATASET",
+            os.path.join(os.getenv("TLLM_WORKSPACE", ""), "benchmarks", "cpp", "prepare_dataset.py"),
+        )
+        model_name = self.model_path.split("/")[-1]
+        benchmark_file = tmp_path / f"{model_name}_synthetic_128_128.txt"
 
-        if not os.path.exists(benchmark_file) or os.path.getsize(benchmark_file) == 0:
-            print(f"Creating dataset file '{benchmark_file}'...")
-            with open(benchmark_file, "w") as fp:
-                subprocess.run(
-                    f"python {script} --stdout --tokenizer={self.model_path} token-norm-dist --input-mean 128 \
-                    --output-mean 128 --input-stdev 0 --output-stdev 0 --num-requests 1400",
-                    shell=True,
-                    check=True,
-                    stdout=fp,
-                )
+        # Preflight checks
+        if not script or not os.path.isfile(script):
+            pytest.skip(f"prepare_dataset.py not found at: {script}")
+        if shutil.which("trtllm-bench") is None:
+            pytest.skip("trtllm-bench not found in PATH")
+
+        if not benchmark_file.exists() or benchmark_file.stat().st_size == 0:
+            print(f"Creating dataset file '{benchmark_file}'...")
+            with open(benchmark_file, "w") as fp:
+                subprocess.run(
+                    [
+                        sys.executable,
+                        script,
+                        "--stdout",
+                        f"--tokenizer={self.model_path}",
+                        "token-norm-dist",
+                        "--input-mean",
+                        "128",
+                        "--output-mean",
+                        "128",
+                        "--input-stdev",
+                        "0",
+                        "--output-stdev",
+                        "0",
+                        "--num-requests",
+                        "1400",
+                    ],
+                    check=True,
+                    stdout=fp,
+                )
         else:
             print(f"Dataset file '{benchmark_file}' already exists.")
 
-        assert os.path.isfile(benchmark_file), f"Benchmark file '{benchmark_file}' should exist"
+        assert benchmark_file.is_file() and benchmark_file.stat().st_size > 0, (
+            f"Benchmark file '{benchmark_file}' missing or empty"
+        )
 
         cmd_parts = [
             "trtllm-bench",
             "--model",
             self.model_path,
             "throughput",
             "--backend",
             "pytorch",
             "--dataset",
-            benchmark_file,
+            str(benchmark_file),
             "--kv_cache_free_gpu_mem_fraction",
             "0.9",
             "--report_json",
             str(tmp_path / "low_latency_throughput.json"),
         ]
         run_example_command(cmd_parts, "gpt-oss")
#!/bin/bash
# Quick preflight: verify required tools and script location.
set -euo pipefail

echo "accelerate: $(command -v accelerate || echo 'MISSING')"
echo "trtllm-bench: $(command -v trtllm-bench || echo 'MISSING')"
echo "TLLM_WORKSPACE=${TLLM_WORKSPACE:-}"
# Try to find prepare_dataset.py under TLLM_WORKSPACE or repo
if [[ -n "${TLLM_WORKSPACE:-}" ]]; then
  test -f "${TLLM_WORKSPACE}/benchmarks/cpp/prepare_dataset.py" && echo "prepare_dataset.py found in TLLM_WORKSPACE" || echo "prepare_dataset.py NOT found under TLLM_WORKSPACE"
else
  fd -H -a -t f prepare_dataset.py 2>/dev/null || true
fi
🧹 Nitpick comments (6)
tests/examples/gpt_oss/test_gpt_oss_qat.py (6)

25-41: Docstring is stale (hard-codes 20B); make it model-agnostic.

The implementation is parameterized by model_path, but the docstring still refers to 20B and fixed input/output names.

 class GPTOSS:
-    """Test GPT-OSS-20B QAT (Quantization-Aware Training) pipeline.
-
-    This test suite covers the complete GPT-OSS-20B optimization pipeline:
-
-    Step 1: test_gpt_oss_sft_training - Supervised Fine-Tuning (SFT)
-           Input: openai/gpt-oss-20b
-           Output: gpt-oss-20b-sft
-
-    Step 2: test_gpt_oss_qat_training - Quantization-Aware Training (QAT)
-           Input: gpt-oss-20b-sft (from Step 1)
-           Output: gpt-oss-20b-qat
-
-    Step 3: test_gpt_oss_mxfp4_conversion - MXFP4 Weight-Only Conversion
-           Input: gpt-oss-20b-qat (from Step 2)
-           Output: gpt-oss-20b-qat-real-mxfp4
-
-    Each step can be run independently (with mock inputs) or as part of the full pipeline.
-    """
+    """Test GPT‑OSS QAT (Quantization‑Aware Training) pipeline.
+
+    Pipeline:
+      1) SFT training
+         Input: {model_path}
+         Output: <model_name>-sft
+      2) QAT training (MXFP4 weight‑only)
+         Input: <model_name>-sft
+         Output: <model_name>-qat
+      3) MXFP4 conversion
+         Input: <model_name>-qat
+         Output: <model_name>-qat-real-mxfp4
+
+    Each step can run independently (creates minimal mock inputs if prior outputs are missing) or as part of the full pipeline.
+    """

16-22: Add missing imports used by the proposed fixes.

We’ll need sys (for sys.executable) and shutil (for CLI checks).

 import os
 import subprocess
+import shutil
+import sys
 
 import pytest
 from _test_utils.examples.run_command import run_example_command
 from _test_utils.torch_misc import minimum_gpu

67-67: Avoid port collisions: allocate a free MASTER_PORT for accelerate.

Enable setup_free_port so parallel tests don’t fight over the default port.

-        run_example_command(cmd_parts, "gpt-oss")
+        run_example_command(cmd_parts, "gpt-oss", setup_free_port=True)
-        run_example_command(cmd_parts, "gpt-oss")
+        run_example_command(cmd_parts, "gpt-oss", setup_free_port=True)

Also applies to: 117-117


151-158: Use the current Python interpreter instead of hardcoded 'python'.

Prevents environment mismatches (e.g., venvs).

-        cmd_parts = [
-            "python",
+        cmd_parts = [
+            sys.executable,
             "convert_oai_mxfp4_weight_only.py",
             "--model_path",
             str(qat_dir),
             "--output_path",
             str(conversion_output_dir),
         ]

205-211: Gate the heavy end-to-end test (optional).

To keep CI fast and predictable, consider marking as slow or guarding by an env var.

-@pytest.mark.parametrize(
+@pytest.mark.parametrize(
     "model_path",
     [
         pytest.param("openai/gpt-oss-20b", id="gpt-oss-20b", marks=minimum_gpu(2)),
         pytest.param("openai/gpt-oss-120b", id="gpt-oss-120b", marks=minimum_gpu(8)),
     ],
 )
+@pytest.mark.skipif(os.getenv("RUN_GPT_OSS_E2E") != "1", reason="Set RUN_GPT_OSS_E2E=1 to run full GPT‑OSS pipeline")

165-202: Optional: reduce dataset size or make it configurable for CI.

1400 requests can be heavy; allow overriding via env for quicker functional runs.

-                        "--num-requests",
-                        "1400",
+                        "--num-requests",
+                        os.getenv("GPT_OSS_NUM_REQUESTS", "200"),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f922be5 and 12943bb.

📒 Files selected for processing (2)
  • tests/examples/gpt_oss/test_gpt_oss_qat.py (1 hunks)
  • tests/examples/llm_ptq/test_deploy.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/examples/llm_ptq/test_deploy.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/examples/gpt_oss/test_gpt_oss_qat.py (2)
tests/_test_utils/examples/run_command.py (1)
  • run_example_command (35-43)
tests/_test_utils/torch_misc.py (1)
  • minimum_gpu (51-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: build-docs
  • GitHub Check: code-quality

@kevalmorabia97
Collaborator

You need to sign your commits with an SSH key. Please take a look at https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md#%EF%B8%8F-signing-your-work

@kevalmorabia97
Collaborator

@noeyy-mino can any of these tests also be made part of the nightly modelopt CICD? If we can use a dummy small model (e.g. 2 layers, smaller hidden size, etc.) and it runs in a few minutes, then we can enable these tests nightly to catch bugs sooner.
If not, we can leave that for later.

@noeyy-mino
Author

No, they are for QA release testing.

HF_CACHE_PATH = os.getenv("HF_HUB_CACHE", os.path.expanduser("~/.cache/huggingface/hub"))


def clear_hf_cache():

Collaborator

Is this cache cleanup for saving space after tests?

Collaborator
Since this is in the tests/examples/llm_ptq directory, it will automatically run when the nightly CICD runs the llm_ptq tests. Can you add some marker here so it's skipped nightly or only run during the QA cycle?
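One possible shape for such a gate, sketched under the assumption of a custom "release" marker and an env flag (both names are placeholders, not an agreed convention):

# conftest.py (hypothetical): skip "release"-marked tests unless the QA flag is set.
import os

import pytest


def pytest_collection_modifyitems(config, items):
    if os.getenv("MODELOPT_QA_TESTS") == "1":
        return  # QA cycle: run everything
    skip_qa = pytest.mark.skip(reason="QA release test; set MODELOPT_QA_TESTS=1 to run")
    for item in items:
        if "release" in item.keywords:
            item.add_marker(skip_qa)


# In tests/examples/llm_ptq/test_deploy.py, the module would then opt in with:
# pytestmark = pytest.mark.release

Registering the marker (e.g. markers = release: QA release tests in pytest.ini) would avoid unknown-marker warnings.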

Signed-off-by: noeyy-mino <[email protected]>