feat: add LiteLLM as AI gateway backend by RheagalFire · Pull Request #18 · fluxions-ai/vui

RheagalFire · 2026-06-15T19:24:25Z

Summary

Add LiteLLM as a third LLM backend alongside Ollama and vLLM, enabling access to 100+ cloud LLM providers (OpenAI, Anthropic, Azure, Bedrock, Vertex, Groq, etc.) through the LiteLLM proxy.

Currently, vui supports only local inference (Ollama, vLLM). This PR adds cloud provider support via VUI_LLM_BACKEND=litellm.

Changes

src/vui/serving/stream/llm_backend.py
- Added LiteLLMBackend class with stream(), complete(), list_models(), set_model()
- Uses OpenAI-compatible /v1/chat/completions endpoint
- Only sends provider-safe params (temperature, max_tokens) to avoid cross-provider rejection
- Auto-discovers available models via /v1/models
- Added litellm branch in make_backend() factory
- Config: VUI_LLM_BACKEND=litellm, VUI_LITELLM_URL, VUI_LITELLM_MODEL
tests/test_litellm_backend.py
- 5 unit tests
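Because the backend speaks the standard OpenAI wire format, the two responses it parses have well-known shapes. The following is a minimal, stdlib-only sketch of extracting model ids from `GET /v1/models` and text deltas from a streamed `/v1/chat/completions` response. It assumes the proxy emits ordinary server-sent events terminated by `data: [DONE]`. The function names are hypothetical and do not come from vui.

```python
import json
from collections.abc import Iterable, Iterator


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style GET /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]


def iter_stream_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from a streamed /v1/chat/completions response."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:  # role-only and finish chunks carry no text
                yield content
```

Joined with `|`, the tokens from a two-chunk stream of `"Hi"` and `"!"` render as `Hi|!`, the same form as the streaming line in the E2E output.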

Tests

Unit tests (5/5 pass):

test_make_backend_litellm PASSED
test_litellm_body_includes_drop_params PASSED
test_litellm_body_with_tools PASSED
test_litellm_default_model PASSED
test_litellm_set_model PASSED

Live E2E (LiteLLM proxy -> Claude Sonnet via Azure AI Foundry):

Response: 4
Usage: {'prompt': 18, 'completion': 5, 'ctx_used': 23, 'ctx_max': 0}
Stream: Hi|!
Models: ['anthropic/claude-sonnet-4-6']

Complete, streaming, and model discovery all verified end-to-end.
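The `Usage` line looks like the OpenAI `usage` object re-keyed, with `ctx_used` equal to prompt plus completion tokens (18 + 5 = 23). Below is a hypothetical helper that reproduces the printed dict under that assumption. Reading `ctx_max: 0` as "context size unknown" is also an assumption.

```python
def to_vui_usage(openai_usage: dict, ctx_max: int = 0) -> dict:
    """Map an OpenAI-style `usage` object onto the dict printed in the E2E run.

    Hypothetical helper: the output keys come from the E2E output; computing
    ctx_used as prompt + completion is an assumption that fits 18 + 5 = 23.
    """
    prompt = int(openai_usage.get("prompt_tokens", 0))
    completion = int(openai_usage.get("completion_tokens", 0))
    return {
        "prompt": prompt,
        "completion": completion,
        "ctx_used": prompt + completion,
        "ctx_max": ctx_max,  # assumed: 0 means no context size is configured
    }
```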

Example usage

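```bash
# Start the LiteLLM proxy (listens on http://localhost:4000 by default)
pip install 'litellm[proxy]'
export ANTHROPIC_API_KEY="your-key-here"   # provider key, read by LiteLLM
litellm --model anthropic/claude-sonnet-4-6

# Run vui with the LiteLLM backend (in another shell)
VUI_LLM_BACKEND=litellm VUI_LITELLM_MODEL=anthropic/claude-sonnet-4-6 vui serve
```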

```python
import asyncio

from vui.serving.stream.llm_backend import make_backend


async def main() -> None:
    backend = make_backend("litellm", model="anthropic/claude-sonnet-4-6")

    # Non-streaming: the reply text is under result["content"]
    result = await backend.complete(
        [{"role": "user", "content": "Hello!"}],
        max_tokens=100,
    )
    print(result["content"])

    # Streaming: print tokens as they arrive
    async for token in backend.stream(
        [{"role": "user", "content": "Tell me a story"}],
        max_tokens=200,
    ):
        print(token, end="", flush=True)
    print()


asyncio.run(main())
```


mogwai

Thanks for this — nicely tested and it works end-to-end. Before merge I'd like to reshape it a bit, because a LiteLLM proxy is just an OpenAI-compatible /v1/chat/completions endpoint, and vui already has an OpenAI-compatible client: VLLMBackend. As written, LiteLLMBackend re-implements ~110 lines of it (stream, complete, list_models, _record_stats are near byte-identical), which means future fixes have to be made in two places.

There are only three genuine differences between vLLM-local and a cloud gateway, so this can subclass VLLMBackend and override just those:

```python
# Module-level imports this snippet relies on
import os

import httpx


class LiteLLMBackend(VLLMBackend):
    """OpenAI-compatible cloud gateway (LiteLLM proxy -> 100+ providers).

    Same wire protocol as vLLM; differs only in dropping vLLM-only body
    knobs, skipping prefill (no local KV to warm), and supplying auth.
    Run the proxy with `drop_params: true` to let LiteLLM strip params
    individual providers don't support.
    """

    name = "litellm"

    def __init__(
        self,
        model: str = "openai/gpt-4o-mini",
        base_url: str = "http://localhost:4000",
        *,
        sampling: dict | None = None,
    ):
        super().__init__(model=model, base_url=base_url, sampling=sampling)

    def _client_inst(self) -> httpx.AsyncClient:
        client = super()._client_inst()
        key = os.environ.get("VUI_LITELLM_KEY")
        if key:
            client.headers["Authorization"] = f"Bearer {key}"
        return client

    def _body(self, messages, **kw) -> dict:
        body = super()._body(messages, **kw)
        body.pop("top_k", None)                  # not OpenAI-standard
        body.pop("chat_template_kwargs", None)   # vLLM-specific
        return body

    async def prefill(self, messages) -> None:
        return  # remote provider: no local KV to warm; a real call would just bill

    async def set_model(self, name: str) -> None:
        # The proxy routes each request by its `model` field; just record it.
        self.model = name
```

ctx_max then inherits vLLM's 8192 default, which keeps the context-fill gauge meaningful.

The substantive issues this addresses:

1. (blocker) prefill bills the cloud provider on a hot path. The inherited base prefill does complete(max_tokens=1), which is a real round-trip. It's called in llm.py, voice_turn.py and thoughts.py, and per the comment on _client_inst, spec-prefill fires "every few hundred ms during user speech." For Ollama/vLLM that's free local KV warming; against a cloud provider it's a billed request every few hundred ms with no benefit (there's no warmable KV behind a proxy). Needs the no-op override above.

2. (blocker) No auth. _client_inst() sends no Authorization header and there's no key env var, so the localhost:4000 default only works against an unauthenticated local proxy. Any real setup — a proxy with master_key/virtual keys, a remote proxy, or a provider directly — 401s. The VUI_LITELLM_KEY override above fixes this; please also document it (the README/docstring currently mention only VUI_LITELLM_URL and VUI_LITELLM_MODEL).

3. Provider param compatibility belongs in the proxy. Rather than hand-omitting params on the client (and the _resolve_sampling() call currently computes top_k/top_p/presence_penalty then discards them), drop only the two non-OpenAI knobs (top_k, chat_template_kwargs) and let LiteLLM's drop_params: true strip the rest per-provider — that's what it's for.

Minor: test_litellm_body_includes_drop_params is misleadingly named — there's no drop_params in the body; it actually asserts presence_penalty is absent. The subclass keeps your existing make_backend branch and tests working as-is (the body assertions still hold).

RheagalFire added 2 commits June 16, 2026 00:44

feat: add LiteLLM as AI gateway backend

734b778

fix: strip provider-unsupported sampling params for cross-provider compat

6f6dd4a

mogwai reviewed Jun 16, 2026

RheagalFire wants to merge 2 commits into fluxions-ai:main from RheagalFire:feat/add-litellm-provider

RheagalFire commented Jun 15, 2026


mogwai left a comment
