@chickeyton chickeyton commented Nov 24, 2025

Add balancer and the corresponding platform connectors

Add new balancer and connectors from llm-balancer

  • Balancer [tested]: The high-level balancer interface, providing configuration and SLO management plus request load-balancing functions

  • Request Routers:

    • Random router [tested]
    • Round-robin (RR) router [tested]
    • Queue Length router [tested]
    • Prefill (i.e. TTFT) router [tested]
    • Encode router
    • Decode router [tested]
    • KV-aware router
  • Dynamic P/D [tested]: dynamic P/D role switching based on the configured SLO and real-time statistics

  • Connectors:

    • vLLM integration for KV cache awareness and instance configuration
    • LMCache integration for KV cache awareness
  • Batch Routing: Routes a collection of tasks at once, using a greedy local search optimization algorithm to maximize the load-balancing effect; supported by all routers listed above

  • vLLM plugin: A vLLM plugin package required for the vLLM integration for KV cache awareness
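
To make the simpler routing policies in the list above concrete, here is a minimal sketch of random, round-robin, and queue-length routing. It is not taken from the PR: the `Endpoint` dataclass, the `route` method name, and the `queue_length` field are all hypothetical stand-ins, and the real routers in llm-balancer may be structured quite differently.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    # Hypothetical stand-in for an LLM serving instance (not the PR's Endpoint class).
    name: str
    queue_length: int = 0  # number of requests currently waiting on this instance


class RandomRouter:
    """Pick an endpoint uniformly at random."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def route(self, endpoints):
        return self._rng.choice(endpoints)


class RoundRobinRouter:
    """Cycle through the endpoints in order, one per request."""

    def __init__(self):
        self._counter = itertools.count()

    def route(self, endpoints):
        return endpoints[next(self._counter) % len(endpoints)]


class QueueLengthRouter:
    """Pick the endpoint with the fewest waiting requests."""

    def route(self, endpoints):
        return min(endpoints, key=lambda e: e.queue_length)
```

For example, with endpoints `a`, `b`, `c` holding 3, 1, and 2 queued requests, `QueueLengthRouter().route(...)` would return `b`, while successive `RoundRobinRouter` calls would return `a`, `b`, `c`, `a`.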

Usage examples:

How to create and configure a Balancer and start an HTTP proxy server for requests
https://github.com/chickeyton/llm-balancer/blob/workload_router/llm_balancer/api/http/app.py
https://github.com/chickeyton/llm-balancer/tree/workload_router/examples/http/p_d (configs for P/D Disagg)
https://github.com/chickeyton/llm-balancer/tree/workload_router/examples/http/pd (configs for Mixed Mode)
https://github.com/chickeyton/llm-balancer/tree/workload_router/examples/http/dynamic_pd (configs for Dynamic P/D)
https://github.com/chickeyton/llm-balancer/tree/workload_router/examples/http/batched_p_d (configs for Batch Routing)
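
The description says Batch Routing uses a greedy local search to balance load across a collection of tasks. The sketch below illustrates that general idea only, under assumptions of my own: each task has a single scalar cost, each endpoint has a scalar load, and the objective is to lower the most-loaded endpoint. The function name `route_batch` and its parameters are hypothetical and do not come from the PR.

```python
def route_batch(task_costs, endpoint_loads):
    """Assign every task to an endpoint index; returns (assignment, final_loads).

    task_costs: list of non-negative numbers (assumed scalar cost per task).
    endpoint_loads: list of current loads, one per endpoint.
    """
    loads = list(endpoint_loads)
    assignment = [None] * len(task_costs)

    # Greedy construction: place the most expensive tasks first,
    # each on the endpoint that is currently least loaded.
    order = sorted(range(len(task_costs)), key=lambda i: -task_costs[i])
    for i in order:
        target = min(range(len(loads)), key=loads.__getitem__)
        assignment[i] = target
        loads[target] += task_costs[i]

    # Local search: repeatedly move one task off the most-loaded endpoint
    # whenever the receiving endpoint stays strictly below that load.
    improved = True
    while improved:
        improved = False
        src = max(range(len(loads)), key=loads.__getitem__)
        for i, ep in enumerate(assignment):
            if ep != src or task_costs[i] <= 0:
                continue
            for dst in range(len(loads)):
                if dst != src and loads[dst] + task_costs[i] < loads[src]:
                    loads[src] -= task_costs[i]
                    loads[dst] += task_costs[i]
                    assignment[i] = dst
                    improved = True
                    break
            if improved:
                break
    return assignment, loads
```

Each accepted move strictly reduces the sum of squared loads, so the loop terminates; the result is a local optimum, not necessarily the best possible assignment.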

How to handle chat/completions requests with Balancer in P/D disaggregated mode (with Dynamic P/D and Batch Routing) and in Mixed mode (with Batch Routing)
https://github.com/chickeyton/llm-balancer/blob/workload_router/llm_balancer/api/http/pipeline/p_d.py
https://github.com/chickeyton/llm-balancer/blob/workload_router/llm_balancer/api/http/pipeline/pd.py
https://github.com/chickeyton/llm-balancer/blob/workload_router/llm_balancer/api/http/pipeline/utils.py
https://github.com/chickeyton/llm-balancer/blob/workload_router/llm_balancer/api/http/pipeline/pipeline.py

How to implement custom LLM service discovery
https://github.com/chickeyton/llm-balancer/blob/workload_router/examples/balancer/basic_usage.py (RedisEndpointTracker)
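
The linked example uses a `RedisEndpointTracker`; its actual interface is not shown in this conversation. As a rough illustration of what a custom service-discovery tracker might do, the sketch below keeps an in-memory heartbeat table and treats an endpoint as live while its last heartbeat is within a TTL. The class name, method names, and TTL mechanism are all assumptions for illustration, not the PR's `EndpointTracker` API.

```python
import time


class HeartbeatEndpointTracker:
    """Hypothetical tracker: an endpoint counts as up while its heartbeat is fresh."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the tracker can be tested without sleeping
        self._last_seen = {}  # endpoint URL -> time of the most recent heartbeat

    def heartbeat(self, url):
        """Record that the endpoint at `url` reported in just now."""
        self._last_seen[url] = self._clock()

    def live_endpoints(self):
        """Return the URLs whose last heartbeat is no older than the TTL, sorted."""
        now = self._clock()
        return sorted(u for u, t in self._last_seen.items() if now - t <= self._ttl)
```

A Redis-backed variant could presumably store the same heartbeat timestamps in a shared store so that several balancer processes see the same endpoint set, but that is a design guess rather than a description of the linked example.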

How to implement custom KV cache awareness
https://github.com/chickeyton/llm-balancer/blob/workload_router/llm_balancer/connectors/vllm/kv_connector.py
https://github.com/chickeyton/llm-balancer/blob/workload_router/llm_balancer/connectors/vllm/kv_cache_tracker.py
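
One common way to approximate KV cache awareness is to hash the prompt in fixed-size token blocks, chain the hashes so each block depends on its prefix, and route to the endpoint that already holds the longest matching prefix. The sketch below shows that idea only; the block size, the `PrefixCacheIndex` class, and its methods are hypothetical, and the linked `kv_connector.py` and `kv_cache_tracker.py` may work differently.

```python
import hashlib

BLOCK_SIZE = 4  # assumed block size in tokens, chosen only for illustration


def block_hashes(token_ids, block_size=BLOCK_SIZE):
    """Hash each full block of tokens, chaining in the previous block's hash."""
    hashes = []
    prev = b""
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        block = list(token_ids[start:start + block_size])
        digest = hashlib.sha256(prev + repr(block).encode()).digest()
        hashes.append(digest)
        prev = digest
    return hashes


class PrefixCacheIndex:
    """Hypothetical index of which endpoint has cached which prefix blocks."""

    def __init__(self):
        self._blocks = {}  # endpoint name -> set of chained block hashes

    def record(self, endpoint, token_ids):
        """Note that `endpoint` has processed (and so may have cached) this prompt."""
        self._blocks.setdefault(endpoint, set()).update(block_hashes(token_ids))

    def matched_blocks(self, endpoint, token_ids):
        """Count leading blocks of the prompt already recorded for `endpoint`."""
        cached = self._blocks.get(endpoint, set())
        count = 0
        for h in block_hashes(token_ids):
            if h not in cached:
                break
            count += 1
        return count

    def best_endpoint(self, endpoints, token_ids):
        """Pick the endpoint with the longest cached prefix (first one wins ties)."""
        return max(endpoints, key=lambda e: self.matched_blocks(e, token_ids))
```

Because the hashes are chained, a block only matches when every block before it matches too, so the count reflects a true shared prefix rather than coincidentally equal blocks.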

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Test improvements
  • CI/CD improvements

Related Issues

Changes Made

Testing

  • Existing tests pass
  • New tests added (if applicable)
  • Manual testing performed

Test Coverage

Documentation

  • Documentation updated (if needed)
  • Code comments added/updated
  • API documentation updated (if applicable)

Checklist

  • I have read the CONTRIBUTING guidelines
  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published
  • I have signed off my commits (DCO)

Screenshots/Output

Additional Notes

Reviewer Checklist

  • Code quality and style
  • Test coverage adequate
  • Documentation updated
  • Performance considerations reviewed
  • Security implications considered
  • Breaking changes documented

Signed-off-by: chickeyton <[email protected]>
self.is_endpoint_up = True

def __init__(self, tracker: EndpointTracker, preserve_down_records: bool = False):
    super(Thread, self).__init__()

Check failure

Code scanning / CodeQL

First argument to super() is not enclosing class Error

First argument to super() should be VllmKvCacheTracker.
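
For context on the alert above: `super(Thread, self).__init__()` starts the method lookup after `Thread` in the MRO, so `Thread.__init__` itself is skipped, which in CPython typically leaves the thread object uninitialized. A minimal sketch of the fix follows. It assumes `VllmKvCacheTracker` subclasses `threading.Thread` (inferred from the flagged call, not confirmed here); the attribute names and the `run` body are placeholders.

```python
from threading import Thread


class VllmKvCacheTracker(Thread):
    # Sketch only: the real class body in the PR is not shown in this conversation.
    def __init__(self, tracker, preserve_down_records=False):
        # Zero-argument super() resolves to the enclosing class, so
        # Thread.__init__ runs; super(VllmKvCacheTracker, self) is equivalent.
        super().__init__()
        self.tracker = tracker
        self.preserve_down_records = preserve_down_records
        self.ran = False

    def run(self):
        # Placeholder body so the example can be started and joined.
        self.ran = True
```

With this form, `VllmKvCacheTracker(tracker).start()` followed by `join()` runs `run()` on a properly initialized thread.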
@@ -0,0 +1,26 @@
from dataclasses import dataclass

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'dataclass' is not used.
from dataclasses import dataclass

from ...balancer import EndpointConfig, Endpoint
from openai import AsyncOpenAI, OpenAI

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'OpenAI' is not used.
Signed-off-by: chickeyton <[email protected]>
@chickeyton chickeyton changed the title from "add balancer" to "[Feature][Algo] Add Balancer" on Nov 25, 2025
Signed-off-by: chickeyton <[email protected]>
Signed-off-by: chickeyton <[email protected]>
@github-actions
This pull request has been automatically marked as stale because it has not had recent activity.
It will be closed if no further activity occurs. Thank you for your contributions.
