
fix(frontend): Fix smg serve reject vLLM OpenAI args #832

Merged
slin1237 merged 6 commits into lightseekorg:main from YouNeedCryDear:fix/smg-serve-vllm
Apr 7, 2026

Conversation

@YouNeedCryDear
Contributor

@YouNeedCryDear YouNeedCryDear commented Mar 20, 2026

Description

Problem

Fixes #824

smg serve only registered vLLM engine args during pass-2 parsing. As a result, valid vLLM OpenAI/frontend flags
such as --enable-auto-tool-choice and --tool-call-parser were rejected before worker launch, even though the
HTTP/OpenAI vLLM entrypoint accepts them.

The existing parser shape also blurred the mode-specific CLI surface for vLLM: HTTP workers need frontend args,
while gRPC workers do not.

Solution

Split vLLM arg registration into two parts:

  • keep engine arg registration in _add_vllm_args()
  • add a dedicated _add_vllm_frontend_args() helper for OpenAI/frontend flags

Then, in parse_serve_args(), only add the frontend args when --backend vllm and --connection-mode http are
selected. This keeps the parser aligned with the actual worker entrypoint:

  • HTTP mode accepts frontend + engine args
  • gRPC mode rejects frontend-only args early
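A minimal, self-contained sketch of this split (function names follow the PR description; the engine and frontend flags are stubbed with representative arguments here, since the real helpers load them from vLLM):

```python
import argparse


def _add_vllm_args(parser: argparse.ArgumentParser) -> None:
    # In serve.py this loads vLLM engine CLI args (AsyncEngineArgs/EngineArgs);
    # stubbed here with one representative engine flag.
    parser.add_argument("--max-model-len", type=int)


def _add_vllm_frontend_args(parser: argparse.ArgumentParser) -> None:
    # In serve.py this registers vllm.entrypoints.openai.cli_args.FrontendArgs;
    # stubbed here with the two flags from the linked issue.
    parser.add_argument("--enable-auto-tool-choice", action="store_true")
    parser.add_argument("--tool-call-parser")


def parse_serve_args(argv=None):
    parser = argparse.ArgumentParser(prog="smg serve")
    parser.add_argument("--backend", default="sglang")
    parser.add_argument("--connection-mode", default="grpc", choices=["http", "grpc"])
    parser.add_argument("--model")

    # Pass 1: discover backend and connection mode, tolerating backend flags.
    known, _ = parser.parse_known_args(argv)

    if known.backend == "vllm":
        _add_vllm_args(parser)
        # Frontend flags only make sense for the HTTP/OpenAI entrypoint;
        # in gRPC mode SMG itself is the API server.
        if known.connection_mode == "http":
            _add_vllm_frontend_args(parser)

    # Pass 2: strict parse, so flags outside the selected mode still error.
    return parser.parse_args(argv)
```

With this shape, HTTP mode accepts the frontend flags while gRPC mode rejects them in the strict second pass, matching the regression tests described below.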

Changes

  • updated bindings/python/src/smg/serve.py so _add_vllm_args() only loads vLLM engine CLI args
  • added _add_vllm_frontend_args() to load vllm.entrypoints.openai.cli_args.FrontendArgs when available
  • gated frontend arg registration in parse_serve_args() on serve_router_args.connection_mode == "http"
  • added regression tests in bindings/python/tests/test_serve.py for:
    • HTTP mode accepting --enable-auto-tool-choice --tool-call-parser minimax_m2
    • gRPC mode rejecting those frontend-only flags

Test Plan

Before:

smg serve --backend vllm --connection-mode http --model /tmp/model --enable-auto-tool-choice --tool-call-parser
minimax_m2

Expected old behavior:

argparse exits with unrecognized arguments: --enable-auto-tool-choice --tool-call-parser minimax_m2

With this change, the same command parses successfully in HTTP mode.

Checklist

  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg (https://slack.lightseek.org) to discuss, review, and merge PRs

Summary by CodeRabbit

  • New Features

    • Support for vLLM OpenAI frontend arguments when serving with HTTP, enabling configuration of tool-choice and tool-call parsing.
  • Improvements

    • Argument parsing now detects and reports missing vLLM dependencies more clearly and adapts backend argument handling accordingly.
  • Tests

    • Added unit tests covering vLLM frontend argument behavior for HTTP and gRPC connection modes.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where smg serve was rejecting vLLM OpenAI API arguments. The core change involves enhancing the argument parsing logic for vLLM to properly handle these specific CLI flags. This ensures that users can leverage the full range of vLLM's OpenAI-compatible server features when using smg serve. Additionally, new tests have been added to validate this functionality and a warning is now issued when OpenAI arguments are provided in gRPC connection mode, clarifying expected behavior.

Highlights

  • vLLM Argument Parsing: Updated the smg serve command's vLLM argument parsing to correctly recognize and accept OpenAI-compatible server CLI flags, ensuring broader compatibility.
  • Two-Pass Parsing Behavior: Maintained the existing two-pass parsing behavior for smg serve while allowing vLLM worker arguments to pass through effectively.
  • Regression Tests Added: Introduced new regression tests within bindings/python/tests/test_serve.py to cover the vLLM argument parsing path, preventing future regressions.
  • gRPC Connection Mode Warning: Implemented a warning mechanism for the gRPC connection mode when OpenAI CLI arguments are present, informing users that these arguments will be ignored as SMG acts as the API server.
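The gRPC-mode warning described in the last highlight might look roughly like this (warn_ignored_openai_args is a hypothetical helper name; the PR only states that a warning is emitted when OpenAI CLI args appear in gRPC mode):

```python
import logging

logger = logging.getLogger("smg.serve")


def warn_ignored_openai_args(connection_mode: str, openai_args: list) -> None:
    # In gRPC mode SMG itself is the API server, so vLLM OpenAI frontend
    # flags never reach a vLLM HTTP entrypoint and are ignored.
    if connection_mode == "grpc" and openai_args:
        logger.warning(
            "vLLM OpenAI frontend args are ignored in gRPC mode: %s",
            " ".join(openai_args),
        )
```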

@coderabbitai

coderabbitai bot commented Mar 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Dynamically load vLLM engine arg classes, add optional vLLM OpenAI frontend CLI arg registration, and inject those frontend args into serve argument parsing when backend == "vllm" and connection_mode == "http"; tests validate HTTP acceptance and gRPC rejection of frontend flags.

Changes

  • vLLM serve changes — bindings/python/src/smg/serve.py: Prefer AsyncEngineArgs from vllm.engine.arg_utils with EngineArgs fallback; add _add_vllm_frontend_args to conditionally register OpenAI frontend CLI args; inject frontend args into the final parse when backend == "vllm" and connection_mode == "http"; raise a parser error if the vLLM args import fails.
  • Serve tests (vLLM frontend handling) — bindings/python/tests/test_serve.py: Add tests covering: HTTP mode accepts and preserves vLLM frontend flags in backend_args; gRPC mode rejects those flags (SystemExit code 2) and does not call frontend registration.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Suggested reviewers

  • key4ng
  • slin1237
  • gongwei-130

Poem

"I nibbled through the parser's hedge,
Found Async tucked behind the ledge,
HTTP welcomes frontend flair,
gRPC keeps those flags elsewhere,
A rabbit's hop — now arguments pair."

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 warning

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — docstring coverage is 36.36%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4)

  • Description Check — skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — the title clearly and specifically identifies the fix: resolving the rejection of vLLM OpenAI arguments in the smg serve command.
  • Linked Issues Check — all coding objectives from issue #824 are met: vLLM frontend args are now registered via _add_vllm_frontend_args(), HTTP mode accepts OpenAI flags, gRPC mode rejects them early, and regression tests validate both scenarios.
  • Out of Scope Changes Check — all changes are directly aligned with issue #824: modifications to serve.py handle vLLM arg registration, and new tests validate the fix without introducing unrelated alterations.


@github-actions github-actions bot added the python-bindings (Python bindings changes) and tests (Test changes) labels Mar 20, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e796a884d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the vLLM argument parsing in smg serve to correctly handle OpenAI-compatible server flags and adds corresponding regression tests. The changes to support additional vLLM arguments are well-implemented. However, I've found a critical bug in the argument parsing logic that could lead to a NameError or incorrect behavior. I've provided a specific comment with a suggested fix to address this issue.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bindings/python/src/smg/serve.py`:
- Around line 526-534: The control flow around backend argument parsing is
wrong: change the second `if` to `elif` so `args` set by
`parser.parse_known_args` for `backend == "trtllm"` isn't immediately
overwritten, and ensure `args` is always defined before being referenced;
specifically update the block that calls `parser.parse_known_args` and
`parser.parse_args` (symbols: backend, args, parser.parse_known_args,
parser.parse_args) so the condition using `getattr(args, "connection_mode",
"grpc")` and `logger.warning` only runs in the `elif backend == "vllm" ...`
branch, preserving the intended behavior for both `trtllm` and `vllm`.

In `@bindings/python/tests/test_serve.py`:
- Around line 289-320: Summary: parse_serve_args in serve.py has a control-flow
bug that leads to a NameError because parser/backend_args are referenced before
being set when handling backend-specific args (e.g., "vllm"). Fix: ensure the
ArgumentParser instance is created before any backend-specific call and that
_import_backend_args(backend, parser) is invoked in the branch that handles
backend-specific flags (or unconditionally after parser creation), then call
parser.parse_known_args() (or otherwise collect backend_args) and assign
backend_args in all code paths; update parse_serve_args so symbols parser, args,
and backend_args are always defined before they are returned (refer to
parse_serve_args and _import_backend_args to locate the change).
- Around line 322-351: The test fails with a NameError due to a control-flow bug
in serve.py's parse_serve_args where backend-specific CLI args (via
_import_backend_args / parser population) are not always imported/defined before
being referenced for the vllm + connection_mode == "grpc" path; update
parse_serve_args so it always constructs and populates the argparse parser
(calling _import_backend_args or the vllm branch) before any logic that reads
backend-specific arguments or calls logger.warning, and ensure any variables
referenced in the gRPC-warning branch (e.g., the parser, args, or
backend-specific flags) are initialized regardless of code path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e157cc9d-d627-4cc0-946a-d1f1b1967939

📥 Commits

Reviewing files that changed from the base of the PR and between 4ecd869 and 7e796a8.

📒 Files selected for processing (2)
  • bindings/python/src/smg/serve.py
  • bindings/python/tests/test_serve.py


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3cc48129fe



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5797d231be


@YouNeedCryDear YouNeedCryDear changed the title from "(fix): Fix smg serve reject vLLM OpenAI args" to "fix (frontend): Fix smg serve reject vLLM OpenAI args" Mar 20, 2026
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

Member

@CatherineSue CatherineSue left a comment


LGTM. Will merge after CI passes

@CatherineSue CatherineSue changed the title from "fix (frontend): Fix smg serve reject vLLM OpenAI args" to "fix(frontend): Fix smg serve reject vLLM OpenAI args" Mar 25, 2026
Signed-off-by: Arthur Cheng <arthur.cheng@oracle.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bindings/python/src/smg/serve.py`:
- Around line 380-393: The vLLM frontend args are being registered wholesale in
_add_vllm_frontend_args, which allows undesired transport flags (--host, --port,
--uds) that break SMG HTTP worker routing; update _add_vllm_frontend_args to
import FrontendArgs (if present) but either (a) call FrontendArgs.add_cli_args
with a filtered argparse.ArgumentParser that has the --host, --port, and --uds
options removed/disabled before registration or (b) register only the safe
subset of flags (e.g., enable-prompt-tokens-details, enable-auto-tool-choice)
instead of calling add_cli_args; ensure VllmWorkerLauncher.build_command still
injects SMG --host/--port and add a regression test that starts a vLLM worker in
HTTP mode with transport-layer flags present (including --uds) and verifies the
worker binds to SMG host:port (health checks succeed).
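Option (a) from this comment — registering the frontend args on a scratch parser and copying over only the non-transport flags — could be sketched like this (hypothetical helper names; fake_frontend_args stands in for FrontendArgs.add_cli_args so the sketch runs without vLLM installed):

```python
import argparse

# Transport-layer flags the SMG worker launcher must control itself.
BLOCKED_FRONTEND_FLAGS = {"--host", "--port", "--uds"}


def register_filtered_frontend_args(parser, add_all_frontend_args):
    """Register frontend flags on `parser`, skipping transport-layer ones.

    `add_all_frontend_args` registers the full frontend flag set on a
    scratch parser; only the safe subset is copied onto the real parser.
    """
    scratch = argparse.ArgumentParser(add_help=False)
    add_all_frontend_args(scratch)
    for action in scratch._actions:
        if any(opt in BLOCKED_FRONTEND_FLAGS for opt in action.option_strings):
            continue  # --host/--port/--uds stay under SMG's control
        parser._add_action(action)


def fake_frontend_args(p):
    # Stand-in for vLLM's frontend registration, for illustration only.
    p.add_argument("--host")
    p.add_argument("--port", type=int)
    p.add_argument("--uds")
    p.add_argument("--enable-auto-tool-choice", action="store_true")
    p.add_argument("--tool-call-parser")
```

This keeps VllmWorkerLauncher.build_command free to inject SMG's own --host/--port, since the parser never accepts user-supplied transport flags in the first place. Note the sketch reaches into argparse internals (_actions, _add_action), which is acceptable for illustration but worth hardening in real code.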

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f9ba4bf7-d5e2-4671-ab48-385d1148524b

📥 Commits

Reviewing files that changed from the base of the PR and between bfcfad8 and 249ffe9.

📒 Files selected for processing (2)
  • bindings/python/src/smg/serve.py
  • bindings/python/tests/test_serve.py

Signed-off-by: Arthur Cheng <arthur.cheng@oracle.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ea8c7b303


@slin1237 slin1237 merged commit 762600b into lightseekorg:main Apr 7, 2026
33 checks passed
TingtingZhou7 pushed a commit to TingtingZhou7/smg that referenced this pull request Apr 7, 2026

Labels

python-bindings (Python bindings changes), tests (Test changes)


Development

Successfully merging this pull request may close these issues.

[Bug]: smg serve --backend vllm rejects valid vLLM OpenAI server args like --enable-auto-tool-choice and --tool-call-parser

3 participants