Skip to content

feat: centralized provider router, call_llm API, unified /model command#1003

Merged
teknium1 merged 12 commits intomainfrom
hermes/hermes-cf9f7d54
Mar 12, 2026
Merged

feat: centralized provider router, call_llm API, unified /model command#1003
teknium1 merged 12 commits intomainfrom
hermes/hermes-cf9f7d54

Conversation

@teknium1
Copy link
Contributor

@teknium1 teknium1 commented Mar 12, 2026

Summary

Complete provider routing infrastructure overhaul. Every LLM call now flows through a centralized router that handles auth, request formatting, and provider-specific quirks in one place.

Core Architecture

resolve_provider_client(provider, model, async_mode, raw_codex)

Central entry point for creating LLM clients. Handles:

  • Auth lookup (env vars for API-key providers, OAuth tokens for Nous/Codex)
  • Base URL resolution with env var overrides
  • API format (Chat Completions vs Responses API adapter for Codex)
  • Provider-specific headers (OpenRouter attribution, Kimi User-Agent)
  • raw_codex mode for main agent (direct responses.stream access)

call_llm(task, provider, model, messages, ...) / async_call_llm(...)

Full request lifecycle:

  1. Resolve provider+model from task config or explicit args
  2. Get/create cached client via router
  3. Build request kwargs (max_tokens handling, provider extra_body)
  4. Make API call with max_tokens/max_completion_tokens retry
  5. Return response

Config slots (config.yaml v7)

Every auxiliary task has a provider:model config slot:

auxiliary:
  compression: {provider: auto, model: ""}
  vision: {provider: auto, model: ""}
  web_extract: {provider: auto, model: ""}
  session_search: {provider: auto, model: ""}
  skills_hub: {provider: auto, model: ""}
  mcp: {provider: auto, model: ""}
  flush_memories: {provider: auto, model: ""}

Changes by Area

Auxiliary Consumers Migrated (all use call_llm/async_call_llm)

  • context_compressor.py — compression summaries
  • vision_tools.py — image analysis (fixed Codex bypass)
  • web_tools.py — page summarization
  • session_search_tool.py — session summarization
  • browser_tool.py — snapshot summarization + browser vision
  • mcp_tool.py — MCP sampling
  • skills_guard.py — skill audit
  • run_agent.py flush_memories — memory flush
  • trajectory_compressor.py — trajectory summarization
  • mini_swe_runner.py — SWE benchmark runner
  • openrouter_client.py — shared OpenRouter client

Main Agent (run_agent.py)

  • __init__: Uses router when no explicit creds provided
  • _try_activate_fallback: Replaced duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS dicts with single resolve_provider_client() call
  • Removed _resolve_fallback_credentials method

Provider Fixes

  • Nous Portal: Don't send OpenRouter provider preferences (only, ignore, etc.) — caused 404. Don't send reasoning: {enabled: false} — Nous requires reasoning enabled
  • Codex vision bypass: vision_tools.py was constructing raw AsyncOpenAI bypassing Codex adapter → crash. Fixed with get_async_vision_auxiliary_client()
  • Vision errors: Clear message when model doesn't support vision

Config & Auth

  • Removed LLM_MODEL env var — config.yaml is sole source of truth. Avoids conflicts in multi-agent setups. Removed from cli.py, auth.py, setup.py, gateway, cron
  • Removed nous-api provider — Nous Portal is OAuth only

UX

  • Unified /model and /provider — both show same view with all authenticated providers, their models, current selection, and switch examples
  • Added curated model lists for Nous Portal and OpenAI Codex

Test Results

3251 passed, 2 pre-existing unrelated failures

Live Testing

  • Tested hermes model switching between OpenRouter, Nous Portal, OpenAI Codex
  • Tested /model provider:model mid-conversation switching (history preserved)
  • Tested tool calls, vision, web search across all providers
  • Tested auxiliary routing (vision/compression/web_extract) with each primary provider

…error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886
Route all remaining ad-hoc auxiliary LLM call sites through
resolve_provider_client() so auth, headers, and API format (Chat
Completions vs Responses API) are handled consistently in one place.

Files changed:

- tools/openrouter_client.py: Replace manual AsyncOpenAI construction
  with resolve_provider_client('openrouter', async_mode=True). The
  shared client module now delegates entirely to the router.

- tools/skills_guard.py: Replace inline OpenAI client construction
  (hardcoded OpenRouter base_url, manual api_key lookup, manual
  headers) with resolve_provider_client('openrouter'). Remove unused
  OPENROUTER_BASE_URL import.

- trajectory_compressor.py: Add _detect_provider() to map config
  base_url to a provider name, then route through
  resolve_provider_client. Falls back to raw construction for
  unrecognized custom endpoints.

- mini_swe_runner.py: Route default case (no explicit api_key/base_url)
  through resolve_provider_client('openrouter') with auto-detection
  fallback. Preserves direct construction when explicit creds are
  passed via CLI args.

- agent/auxiliary_client.py: Fix stale module docstring — vision auto
  mode now correctly documents that Codex and custom endpoints are
  tried (not skipped).
@teknium1 teknium1 changed the title feat: centralized provider router + fix Codex vision bypass + vision error handling feat: centralized provider router + route all auxiliary LLM calls through it Mar 12, 2026
Nous Portal only supports OAuth authentication. Remove the 'nous-api'
provider which allowed direct API key access via NOUS_API_KEY env var.

Removed from:
- hermes_cli/auth.py: PROVIDER_REGISTRY entry + aliases
- hermes_cli/config.py: OPTIONAL_ENV_VARS entry
- hermes_cli/setup.py: setup wizard option + model selection handler
  (reindexed remaining provider choices)
- agent/auxiliary_client.py: docstring references
- tests/test_runtime_provider_resolution.py: nous-api test
- tests/integration/test_web_tools.py: renamed dict key
Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).
Update 14 test files to use the new call_llm/async_call_llm mock
patterns instead of the old get_text_auxiliary_client/
get_vision_auxiliary_client tuple returns.

- vision_tools tests: mock async_call_llm instead of _aux_async_client
- browser tests: mock call_llm instead of _aux_vision_client
- flush_memories tests: mock call_llm instead of get_text_auxiliary_client
- session_search tests: mock async_call_llm with RuntimeError
- mcp_tool tests: fix whitelist model config, use side_effect for
  multi-response tests
- auxiliary_config_bridge: update for model=None (resolved in router)

3251 passed, 2 pre-existing unrelated failures.
Phase 2 of the provider router migration — route the main agent's
client construction and fallback activation through
resolve_provider_client() instead of duplicated ad-hoc logic.

run_agent.py:
- __init__: When no explicit api_key/base_url, use
  resolve_provider_client(provider, raw_codex=True) for client
  construction. Explicit creds (from CLI/gateway runtime provider)
  still construct directly.
- _try_activate_fallback: Replace _resolve_fallback_credentials and
  its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS
  dicts with a single resolve_provider_client() call. The router
  handles all provider types (API-key, OAuth, Codex) centrally.
- Remove _resolve_fallback_credentials method and both fallback dicts.

agent/auxiliary_client.py:
- Add raw_codex parameter to resolve_provider_client(). When True,
  returns the raw OpenAI client for Codex providers instead of wrapping
  in CodexAuxiliaryClient. The main agent needs this for direct
  responses.stream() access.

3251 passed, 2 pre-existing unrelated failures.
…ource of truth

Model selection now comes exclusively from config.yaml (set via
'hermes model' or 'hermes setup'). The LLM_MODEL env var is no longer
read or written anywhere in production code.

Why: env vars are per-process/per-user and would conflict in
multi-agent or multi-tenant setups. Config.yaml is file-based and
can be scoped per-user or eventually per-session.

Changes:
- cli.py: Read model from CLI_CONFIG only, not LLM_MODEL/OPENAI_MODEL
- hermes_cli/auth.py: _save_model_choice() no longer writes LLM_MODEL
  to .env
- hermes_cli/setup.py: Remove 12 save_env_value('LLM_MODEL', ...)
  calls from all provider setup flows
- gateway/run.py: Remove LLM_MODEL fallback (HERMES_MODEL still works
  for gateway process runtime)
- cron/scheduler.py: Same
- agent/auxiliary_client.py: Remove LLM_MODEL from custom endpoint
  model detection
Two bugs in _build_api_kwargs that broke Nous Portal:

1. Provider preferences (only, ignore, order, sort) are OpenRouter-
   specific routing features. They were being sent in extra_body to ALL
   providers, including Nous Portal. When the config had
   providers_only=['google-vertex'], Nous Portal returned 404 'Inference
   host not found' because it doesn't have a google-vertex backend.

   Fix: Only include provider preferences when _is_openrouter is True.

2. Reasoning config with enabled=false was being sent to Nous Portal,
   which requires reasoning and returns 400 'Reasoning is mandatory for
   this endpoint and cannot be disabled.'

   Fix: Omit the reasoning parameter for Nous when enabled=false.

Root cause found via HERMES_DUMP_REQUESTS=1 which showed the exact
request payload being sent to Nous Portal's inference API.
Nous Portal backend will become a transparent proxy for OpenRouter-
specific parameters (provider preferences, etc.), so keep sending them
to all providers. The reasoning disabled fix is kept (that's a real
constraint of the Nous endpoint).
Both /model and /provider now show the same unified display:

  Current: anthropic/claude-opus-4.6 via OpenRouter

  Authenticated providers & models:
    [openrouter] ← active
      anthropic/claude-opus-4.6 ← current
      anthropic/claude-sonnet-4.5
      ...
    [nous]
      claude-opus-4-6
      gemini-3-flash
      ...
    [openai-codex]
      gpt-5.2-codex
      gpt-5.1-codex-mini
      ...

  Not configured: Z.AI / GLM, Kimi / Moonshot, ...

  Switch model:    /model <model-name>
  Switch provider: /model <provider>:<model-name>
  Example: /model nous:claude-opus-4-6

Users can see all authenticated providers and their models at a glance,
making it easy to switch mid-conversation.

Also added curated model lists for Nous Portal and OpenAI Codex to
hermes_cli/models.py.
@teknium1 teknium1 changed the title feat: centralized provider router + route all auxiliary LLM calls through it feat: centralized provider router, call_llm API, unified /model command Mar 12, 2026
- Custom endpoints can serve any model, so skip validation for
  provider='custom' in validate_requested_model(). Previously it
  would reject any model name since there's no static catalog or
  live API to check against.
- Show clear setup instructions when switching to custom endpoint
  without OPENAI_BASE_URL/OPENAI_API_KEY configured.
- Added curated model lists for Nous Portal and OpenAI Codex to
  _PROVIDER_MODELS so /model shows their available models.
- gateway/run.py: Take main's _resolve_gateway_model() helper
- hermes_cli/setup.py: Re-apply nous-api removal after merge brought
  it back. Fix provider_idx offset (Custom is now index 3, not 4).
- tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)
@teknium1 teknium1 merged commit 9cb9d1a into main Mar 12, 2026
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant