-
Notifications
You must be signed in to change notification settings - Fork 45
feat(grpc): implement continuous Watch streaming for health servicers #917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
V2arK
wants to merge
11
commits into
lightseekorg:main
Choose a base branch
from
V2arK:feat/grpc-watch-continuous-stream
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
e05c360
chore(grpc): add test infrastructure for grpc_servicer
V2arK e6c2a8d
test(grpc): add SGLang Watch continuous stream tests (red)
V2arK 64abddb
test(grpc): add vLLM Watch continuous stream tests (red)
V2arK 5ff6087
feat(grpc): add HealthWatchMixin for continuous Watch streaming
V2arK 2a9f82c
fix(grpc): add _watch_notified_shutdown flag to HealthWatchMixin
V2arK 06935a6
feat(grpc): integrate HealthWatchMixin into SGLangHealthServicer
V2arK 5c4b466
feat(grpc): integrate HealthWatchMixin into VllmHealthServicer
V2arK 40a23f0
style(grpc): fix import sorting in Watch test files
V2arK 9adb736
chore(grpc): bump smg-grpc-servicer to 0.6.0
V2arK a79ac91
fix(grpc): address review feedback on Watch mixin
V2arK 73c24f2
test(grpc): exercise CancelledError path via Task.cancel() in Watch t…
V2arK File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,119 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| """Shared Watch() continuous streaming for gRPC health servicers.""" | ||
|
|
||
| import asyncio | ||
| import inspect | ||
| import logging | ||
| from collections.abc import AsyncIterator | ||
|
|
||
| import grpc | ||
| from grpc_health.v1 import health_pb2 | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class HealthWatchMixin: | ||
| """Continuous Watch() streaming for gRPC health servicers. | ||
|
|
||
| Implements the gRPC Health Checking Protocol Watch RPC as a long-lived | ||
| server-streaming response that sends updates on status change. | ||
|
|
||
| Subclasses must: | ||
| 1. Call self._init_watch() in __init__ | ||
| 2. Implement _compute_watch_status(service_name) -> ServingStatus | ||
| 3. Implement _is_shutting_down() -> bool | ||
| 4. Call self._notify_shutdown() in set_not_serving() | ||
| """ | ||
|
|
||
| WATCH_POLL_INTERVAL_S = 5.0 | ||
|
|
||
| def _init_watch(self) -> None: | ||
| """Initialize Watch state. Must be called in subclass __init__. | ||
|
|
||
| Note: on Python 3.10-3.11, asyncio.Event() captures the running | ||
| event loop at construction time. Both SGLang and vLLM construct | ||
| their servicers within the async server context (verified against | ||
| smg sglang/server.py and vllm grpc_server.py), so this is safe. | ||
| Python 3.12+ removed the loop binding entirely. | ||
| """ | ||
| self._watch_shutdown_event = asyncio.Event() | ||
| self._watch_notified_shutdown = False | ||
|
|
||
| def _notify_shutdown(self) -> None: | ||
| """Wake all Watch streams to detect shutdown immediately. | ||
| Must be called in subclass set_not_serving(). Sets | ||
| _watch_notified_shutdown so _is_shutting_down() implementations | ||
| can check it alongside their own shutdown flags.""" | ||
| self._watch_notified_shutdown = True | ||
| self._watch_shutdown_event.set() | ||
|
|
||
| def _compute_watch_status(self, service_name: str) -> int: | ||
| """Compute current health status for the given service. | ||
|
|
||
| Must not call context.set_code() -- that would pollute the | ||
| streaming response. Return a ServingStatus enum value instead. | ||
|
|
||
| May be overridden as async def for servicers that need I/O | ||
| (e.g., VllmHealthServicer calls await async_llm.check_health()). | ||
| """ | ||
| raise NotImplementedError(f"{type(self).__name__} must implement _compute_watch_status()") | ||
|
|
||
| def _is_shutting_down(self) -> bool: | ||
| """Return True if the server is shutting down.""" | ||
| raise NotImplementedError(f"{type(self).__name__} must implement _is_shutting_down()") | ||
|
|
||
| async def _resolve_watch_status(self, service_name: str) -> int: | ||
| """Call _compute_watch_status, handling both sync and async impls.""" | ||
| result = self._compute_watch_status(service_name) | ||
| if inspect.isawaitable(result): | ||
| return await result | ||
| return result | ||
|
|
||
| async def Watch( | ||
| self, | ||
| request: health_pb2.HealthCheckRequest, | ||
| context: grpc.aio.ServicerContext, | ||
| ) -> AsyncIterator[health_pb2.HealthCheckResponse]: | ||
| """gRPC Health Watch -- continuous streaming implementation. | ||
|
|
||
| Behavior per gRPC Health Checking Protocol: | ||
| - Immediately sends the current serving status | ||
| - Sends a new message whenever status changes | ||
| - Stream ends on server shutdown or client cancellation | ||
|
|
||
| Deviation from spec: for unknown services, the stream sends | ||
| SERVICE_UNKNOWN once then exits. The spec says to keep the stream | ||
| open for dynamic service registration, but smg services are | ||
| statically defined and never registered at runtime. | ||
| """ | ||
| service_name = request.service | ||
| logger.debug("Health watch request for service: '%s'", service_name) | ||
|
|
||
| last_status = None | ||
| try: | ||
| while True: | ||
| status = await self._resolve_watch_status(service_name) | ||
|
|
||
| if status != last_status: | ||
| yield health_pb2.HealthCheckResponse(status=status) | ||
| last_status = status | ||
|
|
||
| if self._is_shutting_down(): | ||
| return | ||
|
|
||
| if status == health_pb2.HealthCheckResponse.SERVICE_UNKNOWN: | ||
| return | ||
|
|
||
| try: | ||
| await asyncio.wait_for( | ||
| self._watch_shutdown_event.wait(), | ||
| timeout=self.WATCH_POLL_INTERVAL_S, | ||
| ) | ||
| except asyncio.TimeoutError: | ||
| pass | ||
|
|
||
| except asyncio.CancelledError: | ||
| logger.debug( | ||
| "Health watch cancelled by client for service: '%s'", | ||
| service_name, | ||
| ) | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Empty file.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This early return can drop a real status transition during shutdown: if
set_not_serving()runs afterstatuswas computed for the current iteration but before this check executes, the stream exits immediately without sending the finalNOT_SERVINGupdate. In that race window, Watch clients only observe EOF and miss the health-state change event they rely on for routing decisions.Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Evaluated -- pushing back on this one.
The race requires
set_not_serving()to fire between_resolve_watch_status()returning and_is_shutting_down()executing. In asyncio's single-threaded cooperative model:_compute_watch_status()is sync -- no yield point, race is impossible.await check_health(), but the window between the return of_resolve_watch_status()and the_is_shutting_down()check is a single Python statement with noawait-- no coroutine switch can happen there.Even in the theoretical case where shutdown lands between
_resolve_watch_statusyielding control (duringcheck_health()) and the shutdown check: the client would seeSERVINGthen EOF. Any conformant Watch client treats EOF as "server unavailable" -- the gRPC transport-level disconnect is the primary shutdown signal, not an in-bandNOT_SERVINGmessage.Not worth the added complexity.