fix(frontend): Fix smg serve reject vLLM OpenAI args (#832)
slin1237 merged 6 commits into lightseekorg:main
Conversation
📝 Walkthrough
Dynamically load vLLM engine arg classes, add optional vLLM OpenAI frontend CLI arg registration, and inject those frontend args into serve argument parsing when the vLLM backend runs in HTTP connection mode.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7e796a884d
Code Review
This pull request updates the vLLM argument parsing in smg serve to correctly handle OpenAI-compatible server flags and adds corresponding regression tests. The changes to support additional vLLM arguments are well-implemented. However, I've found a critical bug in the argument parsing logic that could lead to a NameError or incorrect behavior. I've provided a specific comment with a suggested fix to address this issue.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bindings/python/src/smg/serve.py`:
- Around lines 526-534: The control flow around backend argument parsing is wrong. Change the second `if` to `elif` so that the `args` set by `parser.parse_known_args` for `backend == "trtllm"` is not immediately overwritten, and ensure `args` is always defined before it is referenced. Specifically, update the block that calls `parser.parse_known_args` and `parser.parse_args` so that the condition using `getattr(args, "connection_mode", "grpc")` and the `logger.warning` call run only inside the `elif backend == "vllm"` branch, preserving the intended behavior for both `trtllm` and `vllm`.
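The suggested `elif` fix can be sketched as follows. This is a hypothetical reduction of `parse_serve_args` for illustration only; the real function registers many more flags:

```python
import argparse
import logging

logger = logging.getLogger(__name__)

def parse_backend_args(backend: str, argv: list[str]) -> argparse.Namespace:
    """Hypothetical reduction of parse_serve_args illustrating the fix.

    The bug was two independent `if` blocks: the second one re-parsed and
    overwrote the `args` produced for backend == "trtllm". Using `elif`
    keeps each branch's result and guarantees `args` is defined before
    the connection-mode check runs.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--connection-mode", default="grpc")
    if backend == "trtllm":
        # Unknown backend-specific flags are tolerated here.
        args, _extra = parser.parse_known_args(argv)
    elif backend == "vllm":
        args, _extra = parser.parse_known_args(argv)
        # The gRPC warning belongs only in the vllm branch.
        if getattr(args, "connection_mode", "grpc") == "grpc":
            logger.warning("vLLM frontend args are ignored in gRPC mode")
    else:
        args = parser.parse_args(argv)
    return args
```

With the original `if`/`if` shape, the `trtllm` result was clobbered and `args` could be read before assignment; the `elif` makes each path self-contained.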
In `@bindings/python/tests/test_serve.py`:
- Around lines 289-320: `parse_serve_args` in serve.py has a control-flow bug that leads to a `NameError` because `parser`/`backend_args` are referenced before being set when handling backend-specific args (e.g. `"vllm"`). Fix: create the `ArgumentParser` instance before any backend-specific call, invoke `_import_backend_args(backend, parser)` in the branch that handles backend-specific flags (or unconditionally after parser creation), then call `parser.parse_known_args()` (or otherwise collect `backend_args`) and assign `backend_args` on all code paths, so that `parser`, `args`, and `backend_args` are always defined before they are returned.
- Around lines 322-351: The test fails with a `NameError` due to the same control-flow bug in serve.py's `parse_serve_args`: backend-specific CLI args (via `_import_backend_args` / parser population) are not always registered before being referenced on the vllm + `connection_mode == "grpc"` path. Update `parse_serve_args` so it always constructs and populates the argparse parser (calling `_import_backend_args` or the vllm branch) before any logic that reads backend-specific arguments or calls `logger.warning`, and ensure any variables referenced in the gRPC-warning branch (the parser, args, or backend-specific flags) are initialized on every code path.
💡 Codex Review (reviewed commit: 3cc48129fe)
💡 Codex Review (reviewed commit: 5797d231be)
Codex usage limits have been reached for code reviews.
CatherineSue left a comment:
LGTM. Will merge after CI passes.
Signed-off-by: Arthur Cheng <arthur.cheng@oracle.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bindings/python/src/smg/serve.py`:
- Around lines 380-393: The vLLM frontend args are registered wholesale in `_add_vllm_frontend_args`, which allows undesired transport flags (`--host`, `--port`, `--uds`) that break SMG HTTP worker routing. Update `_add_vllm_frontend_args` to import `FrontendArgs` (if present) but either (a) call `FrontendArgs.add_cli_args` against a filtered `argparse.ArgumentParser` with the `--host`, `--port`, and `--uds` options removed before registration, or (b) register only the safe subset of flags (e.g. `enable-prompt-tokens-details`, `enable-auto-tool-choice`) instead of calling `add_cli_args`. Ensure `VllmWorkerLauncher.build_command` still injects the SMG `--host`/`--port`, and add a regression test that starts a vLLM worker in HTTP mode with transport-layer flags present (including `--uds`) and verifies the worker binds to the SMG host:port (health checks succeed).
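Option (a) could be sketched roughly as below. The `register` callable stands in for `FrontendArgs.add_cli_args` (named in this review but not shown here), and the action copying relies on argparse internals (`_actions`, `_add_action`), so treat this as illustrative rather than the implementation:

```python
import argparse

# Transport flags that SMG must own: the worker is expected to bind to
# the host/port that SMG injects, never to user-supplied transport flags.
UNSAFE_FLAGS = {"--host", "--port", "--uds"}

def add_filtered_cli_args(parser, register, unsafe=UNSAFE_FLAGS):
    """Register CLI args via `register`, minus the unsafe transport flags.

    `register` would be FrontendArgs.add_cli_args in serve.py; it is a
    parameter here so the sketch stays testable without vLLM installed.
    """
    scratch = argparse.ArgumentParser(add_help=False)
    register(scratch)
    for action in scratch._actions:
        if any(opt in unsafe for opt in action.option_strings):
            continue  # drop --host / --port / --uds style options
        # argparse-internal API; works for simple option actions.
        parser._add_action(action)
```

A caller would do `add_filtered_cli_args(parser, FrontendArgs.add_cli_args)`; the filtered parser then rejects `--host` while still accepting the safe frontend flags.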
💡 Codex Review (reviewed commit: 1ea8c7b303)
Commit history pulled in by the rebase:
- …seekorg#1041) Signed-off-by: Ziwen Zhao <zzw.mose@gmail.com>
- feat(metrics-ws): [3/4] add event-driven and polled collectors (lightseekorg#1027) Signed-off-by: key4ng <rukeyang@gmail.com>
- fix(responses): stop defaulting top_p for omitted requests (lightseekorg#1043) Signed-off-by: Ziwen Zhao <zzw.mose@gmail.com>
- fix(frontend): Fix smg serve reject vLLM OpenAI args (lightseekorg#832) Signed-off-by: Arthur Cheng <arthur.cheng@oracle.com>
- feat(realtime-api): WebRTC relay bridge (lightseekorg#733) Signed-off-by: yifeliu <yifengliu9@gmail.com>
- feat: align image-generation MCP behavior and response formatting
- fix(image): only treat wrapped payload with result as image payload
- refactor(image): remove unused fallback candidate in wrapped extractor
- fix(image): preserve image tool arguments during sanitization
- refactor(image): only inject default model in builtin sanitizer
- fix(image): force override builtin image model in sanitizer
- fix(image): force png output_compression to 100
- revert(image): restore minimal sanitizer and document fallback
- refactor(responses): update image tool serde attr and preserve raw non-image output strings
- refactor(image): rename generate_image to image_generation
- revert(image): restore generate_image builtin naming
- fix(image-context): retain image failure status and error in compact output
- Fix
(image and responses commits Signed-off-by: TingtingZhou7 <zhoutt96@gmail.com>)
Description

Problem

Fixes #824

`smg serve` only registered vLLM engine args during pass-2 parsing. That meant valid vLLM OpenAI/frontend flags such as `--enable-auto-tool-choice` and `--tool-call-parser` were rejected before worker launch, even though they are valid for the HTTP/OpenAI vLLM entrypoint.

The existing parser shape also blurred the mode-specific CLI surface for vLLM: HTTP workers need frontend args, while gRPC workers do not.

Solution

Split vLLM arg registration into two parts:
- `_add_vllm_args()` now loads only the vLLM engine args
- a new `_add_vllm_frontend_args()` helper loads the OpenAI/frontend flags

Then, in `parse_serve_args()`, only add the frontend args when `--backend vllm` and `--connection-mode http` are selected. This keeps the parser aligned with the actual worker entrypoint.
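Assuming the helper names from this description, the resulting parser wiring might be sketched like this (the vLLM import is guarded, so the sketch degrades gracefully when vLLM is not installed; `build_serve_parser` is a hypothetical stand-in for the real pass-2 setup):

```python
import argparse

def _add_vllm_frontend_args(parser: argparse.ArgumentParser) -> None:
    # Load vLLM's OpenAI frontend flags only when the package is present.
    try:
        from vllm.entrypoints.openai.cli_args import FrontendArgs
    except ImportError:
        return
    FrontendArgs.add_cli_args(parser)

def build_serve_parser(backend: str, connection_mode: str) -> argparse.ArgumentParser:
    """Sketch of the pass-2 parser: frontend flags exist only for vLLM + HTTP."""
    parser = argparse.ArgumentParser(prog="smg serve")
    parser.add_argument("--backend", default=backend)
    parser.add_argument("--connection-mode", default=connection_mode)
    # gRPC workers keep the engine-only CLI surface; HTTP workers gain
    # the OpenAI frontend flags (--enable-auto-tool-choice, ...).
    if backend == "vllm" and connection_mode == "http":
        _add_vllm_frontend_args(parser)
    return parser
```

The design point is that the conditional lives in the parser builder, not in post-parse validation, so `--enable-auto-tool-choice` is accepted exactly when the HTTP/OpenAI entrypoint would accept it.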
Changes
- `bindings/python/src/smg/serve.py`:
  - `_add_vllm_args()` now loads only the vLLM engine CLI args
  - new `_add_vllm_frontend_args()` loads `vllm.entrypoints.openai.cli_args.FrontendArgs` when available
  - `parse_serve_args()` registers the frontend args conditionally on `serve_router_args.connection_mode == "http"`
- `bindings/python/tests/test_serve.py`: regression tests for `--enable-auto-tool-choice --tool-call-parser minimax_m2`

Test Plan

Before: