ec connector stats #164
base: v0.11.0
Signed-off-by: wuhang <[email protected]>
Pull request overview
This PR adds initial support for EC (Encoder Cache) Connector statistics collection and reporting. The implementation introduces a metrics infrastructure similar to the existing KV Connector stats, enabling monitoring of encoder cache load/save operations and their performance characteristics.
- Adds `ECConnectorStats` base class and `MooncakeECConnectorStats` implementation for tracking load/save metrics
- Integrates stats collection into the EC connector flow from worker to scheduler to stats reporting
- Updates the scheduler to include EC connector stats alongside existing KV connector stats
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| vllm/distributed/ec_transfer/ec_connector/metrics.py | New file defining ECConnectorStats base class with abstract methods for reset, aggregate, reduce, and is_empty |
| vllm/distributed/ec_transfer/ec_connector/base.py | Adds abstract get_stats() method to ECConnectorBase |
| vllm/distributed/ec_transfer/ec_connector/mooncake_storage_connector.py | Implements MooncakeECConnectorStats with load timing tracking and stats aggregation; wraps load operations with timing context manager |
| vllm/v1/outputs.py | Adds ec_connector_stats field to ECConnectorOutput; reformats EMPTY_MODEL_RUNNER_OUTPUT for readability |
| vllm/v1/worker/ec_connector_model_runner_mixin.py | Collects stats from EC connector after model execution |
| vllm/v1/core/sched/scheduler.py | Threads EC connector stats through scheduler's update_from_output() and make_stats() methods |
| vllm/v1/metrics/stats.py | Adds ec_connector_stats field to SchedulerStats |
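The stats interface summarized in the table (reset, aggregate, reduce, is_empty, plus a timing context manager around load operations) can be sketched roughly as follows. This is a minimal illustration and not the PR's code: the class name `LoadStats`, the method `time_load`, and the reduced key `avg_load_time_ms` are hypothetical, and only the `load_time_ms` and `num_loads` keys are taken from the snippets quoted in the review.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Union


def _zeroed() -> dict:
    # Keys mirror those quoted in the review; the layout is an assumption.
    return {"load_time_ms": 0.0, "num_loads": 0}


@dataclass
class LoadStats:  # hypothetical stand-in for a connector stats class
    data: dict = field(default_factory=_zeroed)

    def reset(self) -> None:
        self.data = _zeroed()

    def is_empty(self) -> bool:
        return self.data["num_loads"] == 0

    def aggregate(self, other: "LoadStats") -> "LoadStats":
        # Merge another interval's observations into this one.
        if not other.is_empty():
            self.data["load_time_ms"] += other.data["load_time_ms"]
            self.data["num_loads"] += other.data["num_loads"]
        return self

    def reduce(self) -> dict[str, Union[int, float]]:
        # Collapse the interval to representative values (count and mean).
        n = self.data["num_loads"]
        avg = self.data["load_time_ms"] / n if n else 0.0
        return {"num_loads": n, "avg_load_time_ms": avg}

    @contextmanager
    def time_load(self):
        # Wrap a load operation and record its wall-clock duration.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.data["load_time_ms"] += elapsed_ms
            self.data["num_loads"] += 1
```

A worker could wrap each load in `with stats.time_load(): ...`, hand the stats object to the scheduler, and call `reset()` once the interval has been reported.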
    @abstractmethod
    def get_stats(self) -> Any:
        """
        Get the statistics of the connector.
        Returns:
            Statistics object.
        """
        pass
Copilot AI, Nov 28, 2025
The get_stats method is marked as @abstractmethod but ECSharedStorageConnector (in shared_storage_connector.py) does not implement this method. This will cause instantiation failures for ECSharedStorageConnector. Either remove the @abstractmethod decorator to make it optional, or ensure all subclasses implement this method.
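The failure mode described here is standard `abc` behavior and can be reproduced in isolation. The sketch below uses hypothetical stand-in classes (`ConnectorBase`, `StorageConnector`, `OptionalStatsBase`, `OptionalStatsConnector`) rather than the real vLLM connectors, and shows both the problem and the "make it optional" alternative.

```python
from abc import ABC, abstractmethod
from typing import Any


class ConnectorBase(ABC):  # hypothetical stand-in for ECConnectorBase
    @abstractmethod
    def get_stats(self) -> Any:
        """Return a statistics object for this connector."""


class StorageConnector(ConnectorBase):  # subclass that never defines get_stats()
    pass


class OptionalStatsBase(ABC):  # alternative: a non-abstract default
    def get_stats(self) -> Any:
        # Connectors that do not collect stats simply report nothing.
        return None


class OptionalStatsConnector(OptionalStatsBase):
    pass


def try_instantiate(cls: type) -> str:
    """Return "ok" if cls() succeeds, otherwise the TypeError message."""
    try:
        cls()
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"
```

Calling `try_instantiate(StorageConnector)` reports a `TypeError`, because Python refuses to instantiate a class with an unimplemented abstract method, while `try_instantiate(OptionalStatsConnector)` succeeds.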
    @dataclass
    class ECConnectorStats:
        """
        Base class for EC Connector Stats, a container for transfer performance
Copilot AI, Nov 28, 2025
Trailing whitespace at the end of the line. Consider removing it for consistent code style.
Suggested change (trailing whitespace removed):
        Base class for EC Connector Stats, a container for transfer performance
    class ECConnectorStats:
        """
        Base class for EC Connector Stats, a container for transfer performance
        metrics or otherwise important telemetry from the connector.
Copilot AI, Nov 28, 2025
Trailing whitespace at the end of the line. Consider removing it for consistent code style.
Suggested change (trailing whitespace removed):
        metrics or otherwise important telemetry from the connector.
    def reduce(self) -> dict[str, Union[int, float]]:
        """
        Reduce the observations collected during a time interval to one or
Copilot AI, Nov 28, 2025
Trailing whitespace at the end of the line. Consider removing it for consistent code style.
Suggested change (trailing whitespace removed):
        Reduce the observations collected during a time interval to one or
    def reduce(self) -> dict[str, Union[int, float]]:
        """
        Reduce the observations collected during a time interval to one or
        more representative values (eg avg/median/sum of the series).
Copilot AI, Nov 28, 2025
Trailing whitespace at the end of the line. Consider removing it for consistent code style.
Suggested change (trailing whitespace removed):
        more representative values (eg avg/median/sum of the series).
    def aggregate(self, other: ECConnectorStats) -> ECConnectorStats:
        if not other.is_empty():
            self.data["load_time_ms"] += other.data["load_time_ms"]
            self.data["save_time_ms"] += other.data["save_time_ms"]
            self.data["num_loads"] += other.data["num_loads"]
            self.data["num_saves"] += other.data["num_saves"]
        return self
Copilot AI, Nov 28, 2025
The aggregate method accepts ECConnectorStats but directly accesses keys specific to MooncakeECConnectorStats (e.g., other.data["load_time_ms"]). If a different ECConnectorStats subclass is passed, this will raise a KeyError. Consider either:
- Type-checking with `isinstance(other, MooncakeECConnectorStats)` before accessing the keys, or
- Changing the parameter type to `"MooncakeECConnectorStats"` to make the expectation explicit.
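One way the first option could look is sketched below. The class bodies are simplified assumptions for illustration (the real `ECConnectorStats` and `MooncakeECConnectorStats` in the PR may differ). Only the four `data` keys come from the quoted diff, and raising `TypeError` on a mismatch is one possible policy, not necessarily the one the maintainers would choose.

```python
from dataclasses import dataclass, field

_KEYS = ("load_time_ms", "save_time_ms", "num_loads", "num_saves")


@dataclass
class ECConnectorStats:  # simplified stand-in for the base class
    data: dict = field(default_factory=dict)

    def is_empty(self) -> bool:
        return not self.data


@dataclass
class MooncakeECConnectorStats(ECConnectorStats):  # simplified stand-in
    def __post_init__(self) -> None:
        if not self.data:
            self.reset()

    def reset(self) -> None:
        self.data = {
            "load_time_ms": 0.0,
            "save_time_ms": 0.0,
            "num_loads": 0,
            "num_saves": 0,
        }

    def is_empty(self) -> bool:
        return self.data["num_loads"] == 0 and self.data["num_saves"] == 0

    def aggregate(self, other: ECConnectorStats) -> "MooncakeECConnectorStats":
        # Reject foreign stats types up front instead of failing with KeyError.
        if not isinstance(other, MooncakeECConnectorStats):
            raise TypeError(
                f"cannot aggregate {type(other).__name__} into "
                "MooncakeECConnectorStats"
            )
        if not other.is_empty():
            for key in _KEYS:
                self.data[key] += other.data[key]
        return self
```

With this guard, passing a different `ECConnectorStats` subclass fails with an explicit `TypeError` naming the offending type rather than a `KeyError` raised from inside the key lookups.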
Purpose
Initial support for EC Connector stats.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.