
v3 smart router: complexity-, cost/latency-, and hardware-aware routing #1

@ShockShoot


ARC v2 is intentionally narrow and practical: intent/action-first routing, configured topic-to-model mapping, and final-model-aware signatures.

This issue tracks the broader v3 direction: a smart-router layer that can choose the best-suited configured model/provider using more signals than topic keywords alone.

Candidate signals

  • task/action type
  • subject/domain
  • prompt complexity
  • required reasoning depth
  • expected tool/subagent use
  • latency preference
  • cost preference
  • privacy/local-vs-cloud preference
  • available local hardware and runtimes
  • provider availability / fallback state
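As a rough illustration, the signals above could be gathered into one feature record per request. This is a hypothetical sketch: `RequestFeatures`, its field names, and the keyword heuristic in `estimate_complexity` are assumptions, not existing ARC or Hermes code. The heuristic is only the simplest local option; an LLM-based or external scorer could fill the same `complexity` field.

```python
from dataclasses import dataclass, field


@dataclass
class RequestFeatures:
    """Hypothetical record of the candidate routing signals for one request."""
    action: str                       # task/action type, e.g. "code", "summarize"
    subject: str                      # subject/domain, e.g. "math", "general"
    complexity: float                 # prompt complexity, 0.0 (trivial) to 1.0 (hard)
    reasoning_depth: int = 0          # required reasoning depth, 0 = none
    expects_tools: bool = False       # expected tool/subagent use
    latency_sensitive: bool = False   # latency preference
    cost_sensitive: bool = False      # cost preference
    local_only: bool = False          # privacy: keep the request on local hardware
    available_runtimes: list[str] = field(default_factory=list)   # local hardware/runtimes
    unavailable_providers: set[str] = field(default_factory=set)  # fallback state


# Hypothetical keyword list; a real scorer might be local, LLM-based, or external.
HARD_MARKERS = ("prove", "derive", "refactor", "debug", "optimize", "step by step")


def estimate_complexity(prompt: str) -> float:
    """Naive local heuristic: longer prompts and 'hard' keywords score higher."""
    text = prompt.lower()
    length_score = min(len(text.split()) / 400, 1.0)  # saturates at 400 words
    marker_score = min(sum(m in text for m in HARD_MARKERS) / 3, 1.0)
    return round(0.5 * length_score + 0.5 * marker_score, 3)
```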

Candidate architecture

request context
  → feature extraction
      - action / intent
      - subject / topic
      - complexity score
      - privacy / locality signals
      - resource inventory
  → routing policy
      - user preferences
      - model capability registry
      - cost / latency budget
      - fallback and availability state
  → runtime override
      - model
      - provider
      - base_url / api_mode
  → final-model-aware signature
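The pipeline above could be sketched end to end as follows. Everything here is hypothetical: the `ModelEntry` registry format, the tier names and thresholds, the `route` function, the `api_mode` value, and the signature format are assumptions chosen to show how capability, cost/latency budget, privacy, and availability state might combine into a runtime override (`model`, `provider`, `base_url`, `api_mode`) plus a final-model-aware signature. The registry entries and URLs are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

TIERS = {"small": 0, "medium": 1, "large": 2}


@dataclass(frozen=True)
class ModelEntry:
    """One row of a hypothetical model capability registry."""
    model: str
    provider: str
    base_url: str
    api_mode: str
    tier: str             # "small" | "medium" | "large"
    local: bool
    cost_per_mtok: float  # assumed unit: USD per million tokens
    p50_latency_s: float  # assumed typical latency in seconds


@dataclass(frozen=True)
class RuntimeOverride:
    model: str
    provider: str
    base_url: str
    api_mode: str


def required_tier(complexity: float) -> str:
    # Assumed thresholds; in practice these would come from user policy.
    if complexity < 0.3:
        return "small"
    return "medium" if complexity < 0.7 else "large"


def route(complexity: float, registry: list[ModelEntry], *, private: bool = False,
          max_cost: float = float("inf"), max_latency_s: float = float("inf"),
          unavailable: frozenset[str] = frozenset()) -> Optional[RuntimeOverride]:
    need = TIERS[required_tier(complexity)]
    candidates = [
        m for m in registry
        if TIERS[m.tier] >= need                 # capability registry
        and m.provider not in unavailable        # fallback / availability state
        and (m.local or not private)             # privacy / locality signal
        and m.cost_per_mtok <= max_cost          # cost budget
        and m.p50_latency_s <= max_latency_s     # latency budget
    ]
    if not candidates:
        return None  # caller decides: relax budgets, ask the user, or fail
    # Prefer the smallest capable tier, then the cheaper, then the faster model.
    best = min(candidates, key=lambda m: (TIERS[m.tier], m.cost_per_mtok, m.p50_latency_s))
    return RuntimeOverride(best.model, best.provider, best.base_url, best.api_mode)


def signature(override: RuntimeOverride) -> str:
    # Final-model-aware: names the model that actually produced the reply.
    return f"-- {override.model} via {override.provider}"


REGISTRY = [  # placeholder entries
    ModelEntry("local-small", "local-llamacpp", "http://localhost:8080/v1",
               "chat_completions", "small", True, 0.0, 2.0),
    ModelEntry("remote-large", "example-cloud", "https://api.example.com/v1",
               "chat_completions", "large", False, 5.0, 4.0),
]
```

With this registry, `route(0.1, REGISTRY)` picks `local-small`, while `route(0.9, REGISTRY, private=True)` returns `None` because the only large-tier entry is remote.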

Hardware/resource awareness

A future Hermes-native router could query:

  • CPU / RAM
  • GPU / VRAM
  • installed local runtimes such as llama.cpp, vLLM, Ollama, and LM Studio
  • configured remote providers
  • rough latency / cost / rate-limit availability
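A minimal local inventory probe could look like the following, using only the Python standard library. Assumptions: the executable names in `RUNTIME_BINARIES` (`llama-server`, `vllm`, `ollama`, `lms`) are the usual CLI names for those runtimes but may differ per install; RAM detection uses POSIX `sysconf` and returns `None` elsewhere (e.g. Windows); GPU detection only covers NVIDIA cards via `nvidia-smi`. Remote providers and rate-limit state are left out of this sketch.

```python
import os
import shutil
import subprocess
from dataclasses import dataclass, field
from typing import Optional

# Assumed executable names for common local runtimes; adjust per install.
RUNTIME_BINARIES = {
    "llama.cpp": "llama-server",
    "vLLM": "vllm",
    "Ollama": "ollama",
    "LM Studio": "lms",
}


@dataclass
class ResourceInventory:
    cpu_count: int
    ram_bytes: Optional[int]                                 # None if undetectable
    gpu_vram_mib: list[int] = field(default_factory=list)   # one entry per NVIDIA GPU
    runtimes: list[str] = field(default_factory=list)       # runtimes found on PATH


def total_ram_bytes() -> Optional[int]:
    """Physical RAM via POSIX sysconf; None where unsupported (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def nvidia_vram_mib() -> list[int]:
    """Total VRAM per NVIDIA GPU via nvidia-smi; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=5, check=True,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []
    return [int(line) for line in out.split() if line.strip().isdigit()]


def inventory() -> ResourceInventory:
    return ResourceInventory(
        cpu_count=os.cpu_count() or 1,
        ram_bytes=total_ram_bytes(),
        gpu_vram_mib=nvidia_vram_mib(),
        runtimes=[name for name, exe in RUNTIME_BINARIES.items() if shutil.which(exe)],
    )
```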

This enables policies such as:

  • simple private task → local small model if available
  • hard coding/math/reasoning → stronger configured model
  • cheap background task → lower-cost/free provider
  • no suitable local hardware → remote fallback
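The example policies above can be read as ordered rules. A minimal sketch, assuming a hypothetical `Request` record with four fields, assumed complexity thresholds of 0.4 and 0.7, and placeholder target labels that a real router would resolve through its capability registry:

```python
from dataclasses import dataclass


@dataclass
class Request:
    complexity: float   # 0.0-1.0, from whichever scorer is configured
    private: bool       # privacy-sensitive content
    background: bool    # cheap background task
    hard_domain: bool   # coding / math / reasoning


def choose_target(req: Request, local_small_available: bool) -> str:
    """Apply the example policies in order and return a hypothetical target label."""
    if req.private and req.complexity < 0.4:
        # Simple private task -> local small model if the hardware allows it.
        if local_small_available:
            return "local-small"
        return "remote-fallback"      # no suitable local hardware
    if req.hard_domain or req.complexity >= 0.7:
        return "strong-configured"    # hard coding/math/reasoning
    if req.background:
        return "low-cost"             # cheap background task
    return "local-small" if local_small_available else "remote-fallback"
```

Rule order is itself policy here: a hard private task goes to the strong configured model, and a privacy-strict user might prefer to refuse rather than fall back to a remote provider.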

External router integration

Manifest-style systems can provide complexity or intelligence-level estimates. ARC should treat those as optional signals, not as a replacement for user-configured policy.

Possible shape:

external_router.score(prompt, context) -> {
  "complexity": 0.0-1.0,
  "recommended_tier": "small|medium|large",
  "latency_sensitive": bool,
  "privacy_sensitive": bool,
}
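The shape above could be pinned down as a typed Python interface, with the external score used only as advisory input. `RouterScore`, `ExternalRouter`, `advisory_tier`, and `FixedRouter` are hypothetical names, and the rule that the hint is ignored when the router errors or recommends a tier the user has not allowed is an assumption about how ARC might consume such a signal, not an existing API.

```python
from typing import Any, Literal, Optional, Protocol, TypedDict

Tier = Literal["small", "medium", "large"]


class RouterScore(TypedDict):
    complexity: float          # 0.0 (trivial) to 1.0 (hardest)
    recommended_tier: Tier
    latency_sensitive: bool
    privacy_sensitive: bool


class ExternalRouter(Protocol):
    def score(self, prompt: str, context: dict[str, Any]) -> RouterScore: ...


def advisory_tier(policy_tier: Tier, allowed: set[str],
                  router: Optional[ExternalRouter],
                  prompt: str, context: dict[str, Any]) -> str:
    """Take the external tier as a hint; user-configured policy keeps the final say."""
    if router is None:
        return policy_tier
    try:
        hint = router.score(prompt, context)
    except Exception:
        return policy_tier      # router unavailable or broken: drop the signal
    tier = hint.get("recommended_tier")
    if tier not in allowed:
        return policy_tier      # missing, or outside what the user permits
    return tier


class FixedRouter:
    """Stand-in router for testing; returns a canned score."""

    def __init__(self, result: RouterScore) -> None:
        self.result = result

    def score(self, prompt: str, context: dict[str, Any]) -> RouterScore:
        return self.result
```

The `latency_sensitive` and `privacy_sensitive` flags could feed policy the same way, as inputs that user rules may accept or ignore.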

Non-goals for v2

Do not turn v2 into the full smart router. v2 should remain a stable reference implementation for:

  • runtime model override
  • action-first routing
  • final-model-aware signatures
  • patch-free migration after upstream runtime override support lands

Open questions

  • Should complexity scoring be local, LLM-based, or external-router-based?
  • How should users describe model capability and cost metadata?
  • Should routing policy be declarative YAML, Python plugin code, or both?
  • How should router decisions be exposed in logs/signatures without making replies noisy?
  • How should provider rate limits and credential-pool state influence routing?
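On the last question, one option is to fold rate-limit responses into the same availability state the router already filters on. The `ProviderHealth` class below is a hypothetical sketch: the cooldown defaults, the exponential backoff, and keying by provider name (the key could equally be a `provider/credential` pair for credential pools) are assumptions. It honours an HTTP `Retry-After` value when the caller has one, and the clock is injectable so the behaviour can be tested.

```python
import time
from typing import Callable, Optional


class ProviderHealth:
    """Hypothetical tracker: providers that hit a rate limit sit out a cooldown."""

    def __init__(self, base_cooldown_s: float = 30.0, max_cooldown_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.base = base_cooldown_s
        self.max = max_cooldown_s
        self.clock = clock
        self._until: dict[str, float] = {}    # provider -> time it becomes usable again
        self._strikes: dict[str, int] = {}    # consecutive rate-limit hits

    def record_rate_limit(self, provider: str, retry_after_s: Optional[float] = None) -> None:
        strikes = self._strikes.get(provider, 0) + 1
        self._strikes[provider] = strikes
        # Use the server's Retry-After if given, else exponential backoff from the base.
        wait = retry_after_s if retry_after_s is not None else self.base * 2 ** (strikes - 1)
        self._until[provider] = self.clock() + min(wait, self.max)

    def record_success(self, provider: str) -> None:
        # A successful call clears both the cooldown and the strike count.
        self._strikes.pop(provider, None)
        self._until.pop(provider, None)

    def unavailable(self) -> frozenset[str]:
        """Providers still cooling down; a router can exclude these as candidates."""
        now = self.clock()
        return frozenset(p for p, t in self._until.items() if t > now)
```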

Related upstream discussion: NousResearch/hermes-agent#21827
Related runtime override PR: NousResearch/hermes-agent#23898

Design note in this repo: docs/V3_SMART_ROUTER.md

Labels: enhancement (New feature or request)
