feat(render): implement VllmRender gRPC service for GPU-less rendering #784

hyeongyun0916 wants to merge 35 commits into lightseekorg:main
Conversation
…ing RPCs Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings, and use the following commands to manage reviews.
📝 Walkthrough

Adds a new vllm_render protobuf and Python exports, implements a RenderGrpcServicer with request conversion and error mapping, provides proto↔Pydantic helpers and field transforms, updates packaging/dependencies, and includes unit tests and build updates for the new proto/service.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant RenderServicer as RenderGrpcServicer
    participant ProtoUtils as ProtoUtils
    participant Pydantic as PydanticModel
    participant Renderer as vLLM_Render
    Client->>RenderServicer: RenderChat(RenderChatRequest proto)
    RenderServicer->>ProtoUtils: from_proto(proto, transforms)
    ProtoUtils->>ProtoUtils: MessageToDict + _apply_transforms
    ProtoUtils->>Pydantic: construct request model
    Pydantic-->>RenderServicer: request instance
    RenderServicer->>Renderer: render_chat_request(request)
    Renderer-->>RenderServicer: GenerateRequest (Pydantic)
    RenderServicer->>ProtoUtils: pydantic_to_proto(GenerateRequest)
    ProtoUtils-->>RenderServicer: GenerateRequestProto
    RenderServicer-->>Client: RenderChatResponse(GenerateRequestProto)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (1 warning)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Code Review
This pull request introduces a new VllmRender gRPC service for GPU-less rendering. The implementation is well-structured, separating concerns into protobuf definitions, conversion utilities, and the service logic. The code is clean, robustly handles errors, and is accompanied by a comprehensive test suite. I have one minor suggestion to improve code clarity in the servicer implementation by removing some unreachable code. Overall, this is an excellent contribution.
```python
                grpc.StatusCode.UNIMPLEMENTED,
                "RenderChat is not configured on this server.",
            )
            return
```
The return statement here is unreachable because context.abort() raises a grpc.aio.AbortError exception, which terminates the method's execution. Removing this unreachable code improves clarity.
This same pattern of an unreachable return after context.abort() also occurs on lines 77, 95, and 109. All of them can be removed.
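The control flow can be seen in a small self-contained sketch. `FakeContext` and `handler` below are illustrative stand-ins, not the PR's classes; the stub `abort` raises the way `grpc.aio`'s does:

```python
import asyncio


class AbortError(Exception):
    """Stand-in for grpc.aio.AbortError."""


class FakeContext:
    async def abort(self, code, details):
        # Like the real context.abort, this never returns normally.
        raise AbortError(f"{code}: {details}")


async def handler(context):
    reached_return = False
    try:
        await context.abort("UNIMPLEMENTED", "RenderChat is not configured.")
        reached_return = True  # dead code, mirroring the unreachable `return`
    except AbortError:
        pass
    return reached_return


print(asyncio.run(handler(FakeContext())))  # → False
```

Since the line after `abort` can never execute, removing the trailing `return` changes nothing observable.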
Force-pushed from a7c5574 to 4da7a0b (compare)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a7c5574205
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```python
FIELD_TRANSFORMS: dict[str, tuple[str, Any]] = {
    "parameters_json": ("parameters", json.loads),
    "content_parts": ("content", None),
    "prompt": ("prompt", flatten_completion_prompt),
    "messages": ("messages", _ensure_message_content),
```
Parse structured tool_choice before building ChatCompletionRequest
RenderChatRequest documents that tool_choice may be JSON ("none"/"auto"/"required" or JSON), but the transform table never deserializes that field, so from_proto(...) forwards raw JSON text into ChatCompletionRequest instead of an object. Requests that force a specific tool (for example a named function choice) will therefore be interpreted as a plain string and fail validation or behave incorrectly, which breaks tool-calling render flows.
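A transform entry of the suggested kind could look like this sketch; `parse_tool_choice` is a hypothetical helper, with the accepted sentinels taken from the comment:

```python
import json


def parse_tool_choice(value):
    # Keep the documented string sentinels as-is; decode anything else as
    # JSON (e.g. a named function choice) before Pydantic validation.
    if value in ("none", "auto", "required"):
        return value
    return json.loads(value)


# Hypothetical addition to the transform table:
# "tool_choice": ("tool_choice", parse_tool_choice),
```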
```python
        except Exception as e:
            await context.abort(grpc.StatusCode.INTERNAL, str(e))
```
Map request-decoding failures to INVALID_ARGUMENT
The broad except Exception path converts all parsing/validation failures into INTERNAL, so malformed client payloads (for example bad JSON in parameters_json during proto→dict transforms) are reported as server faults instead of request errors. This misclassifies user input bugs as 500s, can trigger unnecessary retries/alerts, and makes debugging client-side request issues harder.
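One way to split the classification, sketched with plain exception types (real code would also catch pydantic's `ValidationError`; the function name is illustrative):

```python
import json


def classify_error(exc):
    # Malformed client input (bad JSON, failed field conversion) is the
    # caller's fault; everything else stays INTERNAL.
    if isinstance(exc, (json.JSONDecodeError, ValueError, KeyError, TypeError)):
        return "INVALID_ARGUMENT"
    return "INTERNAL"
```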
…AbortError handling Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
…ethods Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d86106a2db
```python
        return vllm_engine_pb2.GetModelInfoResponse(
            model_path=model_config.model,
            is_generation=model_config.runner_type == "generate",
            max_context_length=model_config.max_model_len,
            vocab_size=model_config.get_vocab_size(),
```
Populate served_model_name in render model info
GetModelInfo builds a vllm_engine_pb2.GetModelInfoResponse but never sets served_model_name, so this RPC always reports an empty alias even when the server is configured with a custom served model name. In deployments where served_model_name differs from model_path, downstream discovery/routing that prioritizes this label (for example model ID selection in model_gateway/src/core/steps/worker/local/create_worker.rs) can pick the wrong identifier and misroute traffic; this should mirror the engine servicer behavior by filling served_model_name from model config.
Already addressed in 9cd6eb2 — GetModelInfo now sets served_model_name from model_config.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/grpc_client/proto/vllm_render.proto`:
- Around line 43-49: The ContentPart message currently allows multiple payload
fields to be set simultaneously; change its definition to enforce exclusivity by
wrapping text, image_url, input_audio, and video_url inside a oneof (e.g., oneof
payload) so only one variant can be present at a time; update the ContentPart
message (and any generated/consuming code expectations) to use the oneof payload
for the fields referenced as text, image_url (ImageUrlContent), input_audio
(InputAudioContent), and video_url (VideoUrlContent) to match the Rust enum
semantics.
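As a sketch of the suggested change, the `oneof` could look like the following; the field numbering here is illustrative and the nested message names are taken from the comment:

```proto
message ContentPart {
  oneof payload {
    string text = 1;
    ImageUrlContent image_url = 2;
    InputAudioContent input_audio = 3;
    VideoUrlContent video_url = 4;
  }
}
```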
In `@grpc_servicer/pyproject.toml`:
- Around line 33-35: The dev extra is missing the vllm dependency which causes
pip install -e .[dev] to fail because tests import smg_grpc_servicer.vllm.*
(top-level imports from vllm like vllm.logger, vllm.outputs); update the dev
extra (the "dev" entry in pyproject.toml) to include vllm (e.g., add
"vllm>=0.16.0" to the list or reference the vllm extra via ".[vllm]") so
installing the dev extras pulls in vllm.
In `@grpc_servicer/smg_grpc_servicer/vllm/field_transforms.py`:
- Line 41: Replace the silent "return None" in the CompletionPrompt handling
code with an explicit ValueError so malformed prompt dicts fail fast; locate the
branch that checks/handles CompletionPrompt shapes in field_transforms.py (the
code that currently returns None for unknown prompt dicts) and raise
ValueError("Unsupported CompletionPrompt shape") (or a similarly descriptive
message including the offending value) instead of returning None so the caller
can map it to an INVALID_ARGUMENT error.
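A fail-fast version might look like this; the accepted shapes here (plain string, `{"texts": [...]}`) are assumptions for illustration, not the helper's full contract:

```python
def flatten_completion_prompt(prompt):
    # Sketch only: raise on unknown dict shapes instead of silently
    # returning None, so the caller can map it to INVALID_ARGUMENT.
    if isinstance(prompt, dict):
        texts = prompt.get("texts")
        if isinstance(texts, list):
            return texts
        raise ValueError(f"Unsupported CompletionPrompt shape: {prompt!r}")
    return prompt
```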
In `@grpc_servicer/smg_grpc_servicer/vllm/proto_utils.py`:
- Around line 47-50: The pydantic_to_proto function currently calls
ParseDict(..., ignore_unknown_fields=True) which silently drops unknown fields;
change it to fail-fast or explicitly whitelist fields: either remove
ignore_unknown_fields=True so ParseDict raises on unknown keys, or derive an
allowlist from the target proto (e.g., use message_class.DESCRIPTOR.fields to
get allowed field names) and filter the dict returned by
model.model_dump(mode="json", exclude_none=True) to only those keys before
calling ParseDict; reference the pydantic_to_proto function, the message_class
parameter, and ParseDict when making the change.
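An allowlist variant could be sketched as below; in the real helper the allowed names would come from `message_class.DESCRIPTOR.fields_by_name`, which is stubbed here as a plain set:

```python
def filter_for_proto(data, allowed_fields):
    # Fail fast on keys the target proto doesn't declare, rather than
    # passing ignore_unknown_fields=True to ParseDict and dropping them.
    unknown = set(data) - set(allowed_fields)
    if unknown:
        raise ValueError(f"Unknown fields for target proto: {sorted(unknown)}")
    return data
```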
In `@grpc_servicer/smg_grpc_servicer/vllm/render_servicer.py`:
- Around line 40-46: The GetModelInfoResponse / GetServerInfoResponse currently
rely on proto defaults for shared fields; explicitly set served_model_name,
active_requests, is_paused, kv_connector, and kv_role when constructing
responses in render_servicer.py (the GetModelInfoResponse return and the
analogous GetServerInfoResponse around lines 49-53) so consumers aren’t left
with ambiguous defaults—use the appropriate values from model_config or server
state (e.g., served model identifier from model_config, current active request
count, paused state flag, and KV connector/role info) and fall back to explicit
zero/empty values only if the source is absent, then run the
request_verification mentioned to ensure no consumer expects implicit defaults.
- Around line 86-87: Replace the direct exposure of internal exception text in
the except blocks that call await context.abort(grpc.StatusCode.INTERNAL,
str(e)) (occurrences around the context.abort calls at lines referenced) by
logging the full exception server-side (use logger.exception(...) or create
module logger = logging.getLogger(__name__) and call
logger.exception("render_servicer error")) and then aborting with a generic
message such as await context.abort(grpc.StatusCode.INTERNAL, "Internal server
error"); update both places that use str(e) (the except blocks referencing
variable e) to follow this pattern.
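The suggested pattern, sketched without grpc imports (the returned tuple stands in for the arguments to `context.abort`; names are illustrative):

```python
import logging

logger = logging.getLogger(__name__)


def internal_abort_args(exc):
    # Keep the exception detail server-side; hand the client only a
    # generic message so internals are not leaked over the wire.
    logger.error("render_servicer error: %r", exc)
    return ("INTERNAL", "Internal server error")
```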
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4f4cb66a-2f59-41a7-b57f-d246aa9f1b0f
📒 Files selected for processing (13)

- crates/grpc_client/proto/vllm_render.proto
- crates/grpc_client/python/pyproject.toml
- crates/grpc_client/python/smg_grpc_proto/__init__.py
- grpc_servicer/pyproject.toml
- grpc_servicer/smg_grpc_servicer/vllm/__init__.py
- grpc_servicer/smg_grpc_servicer/vllm/field_transforms.py
- grpc_servicer/smg_grpc_servicer/vllm/proto_utils.py
- grpc_servicer/smg_grpc_servicer/vllm/render_servicer.py
- grpc_servicer/tests/__init__.py
- grpc_servicer/tests/conftest.py
- grpc_servicer/tests/test_field_transforms.py
- grpc_servicer/tests/test_proto_utils.py
- grpc_servicer/tests/test_render_servicer.py
…ypes Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
…r empty dict and unknown keys Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
…s parameter Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
…de served_model_name and additional server info Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
…letion methods Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/grpc_client/proto/vllm_render.proto`:
- Around line 119-121: The TokenIdSequence.message and GenerateRequestProto use
different scalar types for token IDs (TokenIdSequence.token_ids is uint32 while
GenerateRequestProto.token_ids is int32); pick one consistent scalar (either
change TokenIdSequence.token_ids to int32 or change
GenerateRequestProto.token_ids to uint32), update both message definitions to
the chosen type, and then regenerate protobuf artifacts and update any code that
relies on TokenIdSequence or GenerateRequestProto to the unified type to avoid
type mismatches.
In `@grpc_servicer/smg_grpc_servicer/vllm/proto_utils.py`:
- Line 20: The FieldTransforms type alias currently uses Any for the transform
which weakens static checking; update FieldTransforms to use a callable type
such as Callable[[Any], Any] | None (e.g. FieldTransforms = dict[str, tuple[str,
Callable[[Any], Any] | None]]) and add the necessary import for Callable from
typing so editors and type-checkers can validate transform functions used by the
code.
In `@grpc_servicer/smg_grpc_servicer/vllm/render_servicer.py`:
- Around line 9-24: Move the module-level logger initialization so all imports
are grouped above it: relocate the line defining logger =
logging.getLogger(__name__) to after the import block (after the last import,
e.g. after the import of ErrorResponse) so no imports are interleaved with
module-level code in render_servicer.py; ensure any code relying on the logger
still references the same symbol name (logger).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 401e2da5-518d-4628-8277-b476a83f9857
📒 Files selected for processing (9)

- crates/grpc_client/proto/vllm_render.proto
- grpc_servicer/pyproject.toml
- grpc_servicer/smg_grpc_servicer/vllm/field_transforms.py
- grpc_servicer/smg_grpc_servicer/vllm/proto_utils.py
- grpc_servicer/smg_grpc_servicer/vllm/render_servicer.py
- grpc_servicer/tests/conftest.py
- grpc_servicer/tests/test_field_transforms.py
- grpc_servicer/tests/test_proto_utils.py
- grpc_servicer/tests/test_render_servicer.py
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9ee771ad64
```proto
  repeated ChatCompletionTool tools = 3;
  optional string tool_choice = 4;  // "none"/"auto"/"required" or JSON
  optional string chat_template = 5;
  bool add_generation_prompt = 6;
```
Make render bool overrides presence-aware
RenderChatRequest.add_generation_prompt (and similarly RenderCompletionRequest.add_special_tokens) is defined as a plain proto3 bool, but request decoding goes through MessageToDict in from_proto, which drops false-valued scalar fields without presence. That means a client explicitly sending false cannot be distinguished from “unset”, so downstream request defaults are used instead of the caller’s override; in render flows this can flip prompt templating/tokenization behavior for valid inputs that require disabling these flags.
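The presence problem can be reproduced without protobuf at all. This toy `to_dict_proto3_style` mimics `MessageToDict`'s default behavior of omitting non-presence scalar fields that sit at their proto3 default:

```python
def to_dict_proto3_style(fields):
    # MessageToDict (with default options) drops scalars without presence
    # whose value is the proto3 default: False, 0, "", 0.0.
    defaults = (False, 0, "", 0.0)
    return {k: v for k, v in fields.items() if v not in defaults}


rendered = to_dict_proto3_style({"add_generation_prompt": False, "model": "m"})
# The caller's explicit False vanished, so downstream defaults win.
```

Marking the field `optional bool` gives it explicit presence, so an intentional `false` survives the round trip.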
Changed add_generation_prompt, continue_final_message, and add_special_tokens to optional bool for presence awareness.
```python
            new_key, fn = transforms[key]
            result[new_key] = fn(value) if fn else value
```
Reject transformed-key collisions during proto mapping
_apply_transforms silently overwrites existing keys when a rename targets a key already present (for example content_parts -> content while content is also set in the same chat message). Because RenderChatRequest allows both fields (not a oneof), contradictory payloads are currently accepted with lossy, order-dependent data selection instead of returning INVALID_ARGUMENT, which can corrupt rendered prompts in malformed-but-possible client requests.
Current FIELD_TRANSFORMS mappings don't have source/target key overlap with proto field names, so collision can't occur. Will add a guard if the transform table grows.
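If the table does grow, a guard could be as small as this sketch (flat dicts only; the real `_apply_transforms` also recurses into nested messages):

```python
def apply_transforms(raw, transforms):
    # Flat-dict sketch of _apply_transforms with a collision guard.
    result = {}
    for key, value in raw.items():
        new_key, fn = transforms.get(key, (key, None))
        if new_key in result:
            raise ValueError(f"Conflicting fields both map to {new_key!r}")
        result[new_key] = fn(value) if fn else value
    return result
```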
…tProto Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
…Request to optional Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@grpc_servicer/smg_grpc_servicer/vllm/render_servicer.py`:
- Around line 28-40: Extend the _HTTP_TO_GRPC mapping to include less-common
upstream HTTP statuses by adding entries for 501 mapped to
grpc.StatusCode.UNIMPLEMENTED and 502 mapped to grpc.StatusCode.UNAVAILABLE so
that _http_to_grpc_status returns appropriate gRPC codes for backend "not
implemented" and gateway errors.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a1a31bcb-31f7-4b18-a904-0f53cacd9d6f
📒 Files selected for processing (2)
crates/grpc_client/proto/vllm_render.protogrpc_servicer/smg_grpc_servicer/vllm/render_servicer.py
@coderabbitai review

✅ Actions performed: review triggered.
@coderabbitai resolve

✅ Actions performed: comments resolved and changes approved.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92d9230b07
```proto
message RenderChatRequest {
  repeated ChatCompletionMessage messages = 1;
  string model = 2;
```
Preserve caller request_id across render requests
RenderChat/RenderCompletion deserialize these protos straight into vLLM request objects and then hand the returned GenerateRequestProto to the downstream generate path, but neither request message has a request_id field. In the render→generate flow that means vLLM must synthesize a new ID, so any caller that already assigned an ID for tracing or later cancellation loses it here; downstream Generate/Abort in grpc_servicer/smg_grpc_servicer/vllm/servicer.py key off request.request_id, so the rendered request can no longer be correlated or aborted by the original caller ID. The same omission exists on RenderCompletionRequest below.
```proto
  optional bool add_generation_prompt = 6;
  optional bool continue_final_message = 7;
  optional int32 max_tokens = 8;
  optional int32 max_completion_tokens = 9;
  optional int32 truncate_prompt_tokens = 10;
  optional bool add_special_tokens = 11;
```
Carry cache_salt through chat rendering
Fresh evidence beyond the earlier sampling-params discussion is that this file already treats cache_salt as part of the render contract for completions (RenderCompletionRequest has it on line 154) and the returned GenerateRequestProto also exposes cache_salt. Because RenderChat builds a ChatCompletionRequest directly from RenderChatRequest, omitting the field here means chat callers can never preserve their prefix-cache salt through the GPU-less render path; any deployment that relies on salted caching for tenant isolation or cache partitioning will silently send an unsalted generate request instead.
Hi @hyeongyun0916, this PR has merge conflicts that must be resolved before it can be merged. Please rebase your branch:

```shell
git fetch origin main
git rebase origin/main
# resolve any conflicts, then:
git push --force-with-lease
```
…m-render Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
…m-render Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
…m-render Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
…m-render Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
slin1237 left a comment:
Thanks for putting this together. A few concerns before we can move forward:
Integration testing: All 51 tests mock openai_serving_render. The existing VllmEngineServicer has e2e coverage in our CI — this servicer should have at least one roundtrip test against a real vLLM render instance to validate the full proto → Pydantic → vLLM → proto path. Mock-only coverage doesn't catch serialization mismatches between the proto schema and vLLM's actual Pydantic models, which is the riskiest part of this approach.
Rebase: There are merge conflicts with main and quite a few incremental fixup commits. Can you rebase onto current main and squash the fixups? It's hard to review the final state cleanly.
Release sequencing: See inline comment on the version bumps — I'd prefer we decouple the proto release from the servicer release.
New file (+175 lines):

```proto
syntax = "proto3";
```
What's our contract for backward compatibility on this proto? Once we release smg-grpc-proto 0.5.0 on PyPI, we own this surface. DarkLight1337 noted that the renderer implementation in vLLM is "still not stabilized" — if openai_serving_render changes its request/response shapes, who updates this proto and the field transforms?
I think we need at minimum:
- A version pinning strategy (which vLLM versions does this proto target?)
- Clarity on who owns keeping the proto in sync when vLLM's render internals change
- A deprecation/migration path if the proto needs breaking changes
Without this, we're signing up to maintain a moving-target compatibility layer.
```proto
  string request_id = 1;
  repeated uint32 token_ids = 2;
  google.protobuf.Struct sampling_params = 3;
  optional string model = 4;
```
google.protobuf.Struct for sampling_params, features, stream_options, and kv_transfer_params means these are untyped JSON blobs at the proto level. This loses all type safety — a misspelled field or wrong type silently passes through and only fails deep inside vLLM.
Can we type sampling_params properly? It's the most commonly used field here and has a well-defined schema. If full typing is too much churn right now, at minimum add a doc comment listing the expected fields and types so consumers aren't guessing.
New file (+150 lines):

```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
```
SPDX-FileCopyrightText: Copyright contributors to the vLLM project
This code is being contributed to the SMG repository under our license. Should this be Copyright contributors to the SMG project (or Lightseek), consistent with the rest of the codebase? If there's a CLA or licensing agreement for vLLM-originated code living in our repo, can you point to it?
```python
            served_model_name=model_config.served_model_name or model_config.model,
        )

    async def GetServerInfo(self, request, context):
```
The split try/except blocks here create an issue: if render_chat_request succeeds but pydantic_to_proto fails in the second try block (line 71-77), the client gets INTERNAL even though the render work completed successfully. This means the render side-effects happened but the caller thinks it failed and may retry.
Consider merging into a single try block, or at least catching serialization errors with a more specific message so the caller knows the render succeeded but response encoding failed.
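One possible restructuring, sketched with injected callables instead of the real servicer plumbing (`abort` returns here so the sketch is testable, whereas the real `context.abort` raises):

```python
import asyncio


async def render_chat(render, encode, abort):
    # `render`/`encode`/`abort` stand in for render_chat_request,
    # pydantic_to_proto, and context.abort respectively.
    try:
        result = await render()
    except ValueError as exc:
        return await abort("INVALID_ARGUMENT", str(exc))
    except Exception:
        return await abort("INTERNAL", "render failed")
    try:
        return encode(result)
    except Exception:
        # Render side-effects already happened; a distinct message stops
        # callers from blindly retrying the whole request.
        return await abort("INTERNAL", "render succeeded; response encoding failed")
```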
```python
def from_proto(
    message: Message,
    request_class: type,
```
_apply_transforms recursively walks the entire dict tree on every single RPC call. For large multimodal requests with many content parts (e.g. multiple images, each with nested structures), this could add non-trivial overhead on top of the already expensive proto → dict → Pydantic → vLLM → Pydantic → proto round-trip.
Have you profiled this path? Even rough numbers (e.g. latency of transform for a 20-message chat with tool calls and multimodal content) would help us understand the cost.
```diff
 [project]
 name = "smg-grpc-servicer"
-version = "0.5.1"
+version = "0.6.0"
```
Bumping both smg-grpc-proto to 0.5.0 and smg-grpc-servicer to 0.6.0 in one PR couples the proto definition release with the servicer implementation release.
Can we split this into two PRs?
- First PR: add `vllm_render.proto` to `smg-grpc-proto` 0.5.0 — this lets the vLLM side validate the proto contract independently
- Second PR: add `RenderGrpcServicer` to `smg-grpc-servicer` 0.6.0 — depends on the published 0.5.0 proto
This gives us a cleaner release sequence and lets us iterate on the proto without re-releasing the servicer each time.
slin1237 left a comment:
A few more things I noticed after comparing with the existing VllmEngineServicer pattern:
```python
    vllm_render_pb2_grpc,  # type: ignore[import-untyped]
)
from starlette.datastructures import State
from vllm.entrypoints.openai.chat_completion.protocol import ChatCompletionRequest
```
The existing VllmEngineServicer uses type annotations on all RPC methods:
```python
async def HealthCheck(
    self,
    request: vllm_engine_pb2.HealthCheckRequest,
    context: grpc.aio.ServicerContext,
) -> vllm_engine_pb2.HealthCheckResponse:
```

This servicer omits them on every method (`async def HealthCheck(self, request, context)`). Please add type annotations to match the existing pattern — it makes IDE navigation and static analysis work properly.
```python
from vllm.entrypoints.openai.engine.protocol import ErrorResponse

from smg_grpc_servicer.vllm.field_transforms import FIELD_TRANSFORMS
from smg_grpc_servicer.vllm.proto_utils import from_proto, pydantic_to_proto
```
The GetModelInfo implementation here is a simplified copy of VllmEngineServicer.GetModelInfo but drops several fields the Rust gateway relies on: tokenizer_path, model_type, architectures, eos_token_ids, pad_token_id, bos_token_id, max_req_input_len.
If SMG's gateway connects to a render node and calls GetModelInfo, it will get an incomplete response compared to an engine node. This will either break downstream logic that expects those fields, or force us to special-case render vs engine responses.
Can this reuse or delegate to the same logic, or at minimum populate the same fields?
```python
    422: grpc.StatusCode.INVALID_ARGUMENT,
    429: grpc.StatusCode.RESOURCE_EXHAUSTED,
    503: grpc.StatusCode.UNAVAILABLE,
}
```
GetServerInfo hardcodes active_requests=0 and is_paused=False. If the render node is actually handling concurrent requests, these will be permanently wrong and misleading for any monitoring or load-balancing logic that reads them.
The engine servicer delegates to self.engine for these. Is there an equivalent on the render side, or should these fields be omitted rather than returning hardcoded lies?
```python
import grpc
from smg_grpc_proto import (
    vllm_engine_pb2,  # type: ignore[import-untyped]
    vllm_render_pb2,  # type: ignore[import-untyped]
```
The __init__ takes a raw starlette.datastructures.State and accesses .openai_serving_render, .vllm_config.model_config etc. by convention. This couples the servicer to Starlette's internal state bag shape — if vLLM changes how it structures its app state (which they do), this breaks silently.
The existing VllmEngineServicer.__init__ takes a typed EngineClient — much more resilient. Can this take explicit typed dependencies instead of reaching into an untyped state bag?
The current constructor:

```python
def __init__(self, state: State, start_time: float):
    self.state = state
    self.start_time = start_time
```

could instead take explicit dependencies, e.g.:

```python
def __init__(self, serving_render, model_config, start_time: float):
```
from_proto(request, ChatCompletionRequest, FIELD_TRANSFORMS) constructs a ChatCompletionRequest directly from the proto dict. But ChatCompletionRequest is a Pydantic model with validators that may expect fields the proto doesn't carry (or carries differently).
Have you tested this against the full set of ChatCompletionRequest validators? For example:
- What happens when `messages` contains a tool-role message with `content_parts` instead of `content`?
- Does the `model` field pass validation when it's an empty string (the proto default for an unset string)?
- What about the `max_tokens` vs `max_completion_tokens` mutual-exclusion validation?
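The empty-string and mutual-exclusion concerns can be reproduced without vLLM or Pydantic at all. The validator below is a hypothetical stand-in that only mirrors the two checks mentioned above; the real validation lives in vLLM's ChatCompletionRequest.

```python
def validate_chat_request(payload: dict) -> dict:
    """Hypothetical stand-in for ChatCompletionRequest validation; these
    two checks only mirror the review concerns, not vLLM's actual rules."""
    if not payload.get("model"):
        # proto3 delivers "" for an unset string field, which fails here
        raise ValueError("model must be a non-empty string")
    if (payload.get("max_tokens") is not None
            and payload.get("max_completion_tokens") is not None):
        raise ValueError("max_tokens and max_completion_tokens are mutually exclusive")
    return payload


def passes(payload: dict) -> bool:
    try:
        validate_chat_request(payload)
        return True
    except ValueError:
        return False


# An unset proto string arrives as "" rather than "field missing":
empty_model_ok = passes({"model": "", "messages": []})
both_limits_ok = passes({"model": "m", "max_tokens": 1, "max_completion_tokens": 2})
valid_ok = passes({"model": "m", "messages": [], "max_tokens": 16})
```

If vLLM's model enforces either rule, a proto with the field unset would be rejected the same way.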
| raw = MessageToDict(message, preserving_proto_field_name=True) | ||
| if transforms: | ||
| return _apply_transforms(raw, transforms) | ||
| return raw |
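The `_apply_transforms` hook in this diff can be illustrated with a small transform table. The two rules below are assumed examples of the pattern, not the PR's actual FIELD_TRANSFORMS contents.

```python
import json

# Illustrative transform table: proto cannot express some Python-side
# shapes directly, so fields cross the wire in restricted forms and are
# rewritten before the Pydantic model sees them. Both rules are assumptions.
FIELD_TRANSFORMS = {
    # an arbitrary dict crosses the wire as a JSON string
    "chat_template_kwargs": json.loads,
    # a single string is widened to the list the Python model expects
    "stop": lambda v: v if isinstance(v, list) else [v],
}


def apply_transforms(raw: dict, transforms: dict) -> dict:
    out = dict(raw)
    for key, fn in transforms.items():
        if key in out:
            out[key] = fn(out[key])
    return out


converted = apply_transforms(
    {"chat_template_kwargs": '{"add_generation_prompt": true}', "stop": "###"},
    FIELD_TRANSFORMS,
)
```

Fields without a transform pass through untouched, so the table only has to name the mismatched ones.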
pydantic_to_proto uses model.model_dump(mode="json", exclude_none=True) then ParseDict. This silently drops any field that is None — but proto3 has defaults for unset fields (0 for ints, empty string for strings, false for bools). If the vLLM response intentionally sets a field to None to mean "not present", the proto consumer will see the default value instead, which has different semantics.
For example, GenerateRequestProto.priority — is None (don't set priority) different from 0 (priority zero)? With exclude_none=True, both become 0 on the wire.
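The None-vs-default collapse can be reproduced with plain dicts, no protobuf dependency needed. `PROTO3_DEFAULTS` below mimics what a proto3 reader sees for unset scalar fields; the function names are illustrative.

```python
PROTO3_DEFAULTS = {"priority": 0, "model": "", "stream": False}


def to_wire(payload: dict) -> dict:
    """Mimics model_dump(exclude_none=True) + ParseDict: Nones are dropped,
    then the proto reader sees proto3 defaults for anything unset."""
    dumped = {k: v for k, v in payload.items() if v is not None}
    return {**PROTO3_DEFAULTS, **dumped}


# "don't set priority" (None) and "priority zero" (0) collapse on the wire:
unset_priority = to_wire({"priority": None, "model": "m", "stream": None})
zero_priority = to_wire({"priority": 0, "model": "m", "stream": None})


def to_wire_with_presence(payload: dict) -> dict:
    # A presence-aware scheme (e.g. proto3 `optional` plus HasField on the
    # reading side) keeps "absent" distinguishable from "default"
    return {k: v for k, v in payload.items() if v is not None}
```

Marking such fields `optional` in the proto (which enables explicit presence tracking in proto3) would let the consumer tell the two cases apart.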
| message CompletionPromptTexts { | ||
| repeated string texts = 1; | ||
| } | ||
RenderChatRequest models a subset of ChatCompletionRequest fields. But the vLLM render endpoint also uses fields like frequency_penalty, presence_penalty, temperature, top_p, seed, response_format, guided_json, guided_regex — which affect tokenization and template rendering in some configurations.
How do you plan to handle these when they're needed? Adding them later is a proto-level breaking change (well, additive is OK, but the missing fields will silently get default values in the meantime). Would be good to document which ChatCompletionRequest fields are intentionally excluded and why.
| @pytest.fixture | ||
| def mock_grpc_context(): |
grpc.aio.AbortError("", "") — the constructor signature of AbortError is an internal implementation detail of grpcio. If grpcio updates the constructor (which has happened before), all tests break. The existing engine servicer tests don't mock context.abort this way.
Consider using a custom exception class for tests instead of depending on grpc.aio.AbortError's constructor.
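A test-local exception keeps the fixture independent of grpcio internals. This sketch uses only the stdlib `unittest.mock`; `AbortCalled` and `make_mock_context` are names invented for the example.

```python
from unittest import mock


class AbortCalled(Exception):
    """Test-local stand-in, decoupled from grpc.aio.AbortError's
    internal constructor signature."""

    def __init__(self, code, details):
        super().__init__(details)
        self.code = code
        self.details = details


def _abort(code, details):
    # grpc's context.abort never returns control to the caller;
    # raising here models that behavior in tests
    raise AbortCalled(code, details)


def make_mock_context():
    ctx = mock.Mock()
    ctx.abort.side_effect = _abort
    return ctx


ctx = make_mock_context()
try:
    ctx.abort("INVALID_ARGUMENT", "bad request")
    raised = None
except AbortCalled as exc:
    raised = exc
```

Assertions can then check the captured code and details directly, and a grpcio upgrade that changes AbortError's constructor cannot break the suite.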
Hi @hyeongyun0916, this PR has merge conflicts that must be resolved before it can be merged. Please rebase your branch:
git fetch origin main
git rebase origin/main
# resolve any conflicts, then:
git push --force-with-lease
Description
Context
This PR adds render gRPC support (VllmRender service) to smg-grpc-proto and smg-grpc-servicer, required by vllm-project/vllm#36102.
Per review feedback, the render servicer should live in this package rather than in the vllm repo, following the same pattern as VllmEngineServicer (#36169).
Problem
vLLM's disaggregated serving architecture requires a GPU-less render node that applies chat templates and tokenizes requests without running inference. Currently there is no gRPC interface for this render-only functionality, limiting communication between prefill/decode nodes and the render node to HTTP only.
Solution
Implement a new `VllmRender` gRPC service with management RPCs (HealthCheck, GetModelInfo, GetServerInfo) and rendering RPCs (RenderChat, RenderCompletion). The service converts protobuf messages to vLLM's Pydantic request models, delegates to `openai_serving_render`, and serializes responses back to proto.
Changes
- `vllm_render.proto` defining the `VllmRender` service, chat/completion rendering messages, and `GenerateRequestProto`
- `RenderGrpcServicer` implementing all VllmRender RPCs with proper gRPC status code error handling
- `proto_utils.py` with generic protobuf ↔ Pydantic/dict conversion utilities (`proto_to_dict`, `from_proto`, `pydantic_to_proto`)
- `field_transforms.py` with transform rules bridging proto field naming limitations to vLLM's OpenAI-compatible Python models
- `vllm_render_pb2` / `vllm_render_pb2_grpc` exported from the `smg-grpc-proto` package
- `smg-grpc-proto` bumped to 0.5.0 and `smg-grpc-servicer` to 0.6.0
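The proto → Pydantic → proto round trip described in the Solution can be sketched with plain dicts standing in for the protobuf and Pydantic layers. The renderer step is stubbed and every name here is illustrative, not the PR's code.

```python
def proto_to_dict(proto_msg: dict) -> dict:
    # Stands in for MessageToDict(..., preserving_proto_field_name=True)
    return dict(proto_msg)


def render_chat(proto_msg: dict) -> dict:
    # 1. proto -> request model (a plain dict here; ChatCompletionRequest in the PR)
    request = proto_to_dict(proto_msg)
    # 2. delegate to the renderer: apply the chat template and tokenize (stubbed)
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in request["messages"])
    token_ids = list(range(len(prompt.split())))
    # 3. serialize back into a GenerateRequest-shaped payload for the response
    return {"prompt": prompt, "token_ids": token_ids, "model": request["model"]}


rendered = render_chat(
    {"model": "m", "messages": [{"role": "user", "content": "hi"}]}
)
```

The real service performs the same three steps, with the template application and tokenization done by `openai_serving_render` and the result carried in `GenerateRequestProto`.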
Test Plan
- `pytest grpc_servicer/tests/ -v` — 51 passed
- `pip install -e crates/grpc_client/python/` builds `vllm_render_pb2` stubs successfully
Checklist
- `cargo +nightly fmt` passes
- `cargo clippy --all-targets --all-features -- -D warnings` passes
Summary by CodeRabbit