v3 smart-router: complexity, cost/latency, hardware-aware routing

ARC v2 is intentionally narrow and practical: intent/action-first routing, configured topic-to-model mapping, and final-model-aware signatures.

This issue tracks the broader v3 direction: a smart-router layer that can choose the best configured model/provider using more than just topic keywords.

## Candidate signals

- task/action type
- subject/domain
- prompt complexity
- required reasoning depth
- expected tool/subagent use
- latency preference
- cost preference
- privacy/local-vs-cloud preference
- available local hardware and runtimes
- provider availability / fallback state
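
One way to make these signals concrete is a single typed record that feature extraction fills in and the routing policy reads. The sketch below is hypothetical: the class name `RoutingSignals`, its field names, the level strings, and the defaults are illustrative assumptions, not an existing ARC or Hermes API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoutingSignals:
    """Hypothetical container for the candidate signals listed above."""

    action: str                           # task/action type, e.g. "code" or "summarize"
    subject: Optional[str] = None         # subject/domain, if one was detected
    complexity: float = 0.0               # prompt complexity, assumed to lie in 0.0-1.0
    reasoning_depth: str = "shallow"      # assumed levels: "shallow" or "deep"
    expects_tools: bool = False           # expected tool/subagent use
    latency_preference: str = "balanced"  # assumed levels: "fast", "balanced", "relaxed"
    cost_preference: str = "balanced"     # assumed levels: "cheap", "balanced", "quality"
    require_local: bool = False           # privacy/local-vs-cloud preference
    local_runtimes: frozenset = frozenset()         # runtimes detected on this machine
    unavailable_providers: frozenset = frozenset()  # providers currently failing over


# Example: a hard coding request with no privacy constraint.
signals = RoutingSignals(action="code", complexity=0.8, reasoning_depth="deep")
```

Keeping the record immutable means a routing decision can be logged alongside the exact signals that produced it.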

## Candidate architecture

```text
request context
  → feature extraction
      - action / intent
      - subject / topic
      - complexity score
      - privacy / locality signals
      - resource inventory
  → routing policy
      - user preferences
      - model capability registry
      - cost / latency budget
      - fallback and availability state
  → runtime override
      - model
      - provider
      - base_url / api_mode
  → final-model-aware signature
```
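
The four stages above can be sketched end to end in a few functions. This is a minimal illustration under stated assumptions, not the planned implementation: `ModelEntry`, `REGISTRY`, the model and provider names, the localhost URL, the keyword heuristics, the tier scale, and the signature format are all invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical capability-registry row."""
    model: str
    provider: str
    tier: int                       # assumed scale: 1 = small, 2 = medium, 3 = large
    local: bool
    base_url: Optional[str] = None
    api_mode: Optional[str] = None


# Placeholder registry; every name and URL here is made up for illustration.
REGISTRY = [
    ModelEntry("small-local", "local-runtime", 1, True, "http://localhost:8080/v1"),
    ModelEntry("medium-remote", "remote-a", 2, False),
    ModelEntry("large-remote", "remote-b", 3, False),
]


def extract_features(prompt: str) -> dict:
    """Toy feature extraction: keyword intent plus a length-based complexity score."""
    text = prompt.lower()
    action = "code" if any(w in text for w in ("bug", "function", "refactor")) else "chat"
    complexity = min(len(prompt.split()) / 200.0, 1.0)
    return {"action": action, "complexity": complexity, "private": "private" in text}


def choose_route(features: dict, registry: list, unavailable: frozenset = frozenset()) -> ModelEntry:
    """Routing policy: smallest available model that meets the needed tier."""
    needed = 3 if (features["action"] == "code" or features["complexity"] > 0.6) else 1
    candidates = [m for m in registry if m.provider not in unavailable]
    if features["private"]:
        candidates = [m for m in candidates if m.local]  # privacy overrides capability
    if not candidates:
        raise LookupError("no configured model satisfies the routing constraints")
    strong_enough = [m for m in candidates if m.tier >= needed]
    if strong_enough:
        return min(strong_enough, key=lambda m: m.tier)
    return max(candidates, key=lambda m: m.tier)  # fallback: strongest model left


def runtime_override(entry: ModelEntry) -> dict:
    """Build the override fields named in the diagram."""
    return {"model": entry.model, "provider": entry.provider,
            "base_url": entry.base_url, "api_mode": entry.api_mode}


def signature(entry: ModelEntry) -> str:
    """Sign with the model that was finally selected, not the one requested."""
    return f"[routed: {entry.provider}/{entry.model}]"


features = extract_features("Please fix the bug in this function")
entry = choose_route(features, REGISTRY)
print(runtime_override(entry), signature(entry))
```

Because the signature is computed from the selected `ModelEntry`, it stays accurate when a fallback changes the route.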

## Hardware/resource awareness

A future Hermes-native router could query:

- CPU / RAM
- GPU / VRAM
- installed local runtimes such as llama.cpp, vLLM, Ollama, LM Studio, etc.
- configured remote providers
- rough latency / cost / rate-limit availability
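
A rough inventory probe can start with the standard library alone. The sketch below is an assumption-laden illustration: `probe_inventory` and `RUNTIME_EXECUTABLES` are hypothetical names, and the executable names are guesses at how each runtime is typically installed, so they may need adjusting. RAM and GPU/VRAM are left out because the standard library has no portable call for them; a third-party library such as `psutil` or a vendor tool would be needed.

```python
import os
import shutil

# Executable names are assumptions; adjust them to how each runtime is installed.
RUNTIME_EXECUTABLES = {
    "llama.cpp": "llama-server",
    "vLLM": "vllm",
    "Ollama": "ollama",
}


def probe_inventory(executables: dict = RUNTIME_EXECUTABLES) -> dict:
    """Collect a rough, stdlib-only snapshot of local resources."""
    return {
        # os.cpu_count() may return None if the count cannot be determined.
        "cpu_count": os.cpu_count(),
        # A runtime counts as installed if its executable is found on PATH.
        "local_runtimes": sorted(
            name for name, exe in executables.items() if shutil.which(exe) is not None
        ),
    }


inventory = probe_inventory()
```

Finding an executable on `PATH` only shows that a runtime is installed, not that a server is running or that a model is loaded, so a real router would likely need a health check as well.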

Such an inventory would enable policies such as:

- simple private task → local small model if available
- hard coding/math/reasoning → stronger configured model
- cheap background task → lower-cost/free provider
- no suitable local hardware → remote fallback
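
These example policies can be written as a small ordered rule check. The sketch is hypothetical throughout: `Task`, `pick_target`, and the target labels are invented, and the rule order (hard tasks first, then background, then private) is an assumption, since the list above does not say how conflicts such as a hard private task should resolve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    private: bool = False      # should stay on local hardware if possible
    hard: bool = False         # coding/math/reasoning
    background: bool = False   # cheap, not latency-sensitive


def pick_target(task: Task, local_model_available: bool) -> str:
    """Return a hypothetical target label; the first matching rule wins."""
    if task.hard:
        return "strong-configured-model"
    if task.background:
        return "low-cost-provider"
    if task.private:
        # Local small model if available, otherwise fall back to remote.
        return "local-small-model" if local_model_available else "remote-fallback"
    # Assumed default when no rule applies.
    return "default-configured-model"


print(pick_target(Task(private=True), local_model_available=True))
```

Whether a private task may ever fall back to a remote provider is itself a policy decision; a stricter variant could raise an error instead of returning `"remote-fallback"`.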

## External router integration

Manifest-style systems can provide complexity or intelligence-level estimates. ARC should treat those estimates as optional signals, not as a replacement for user-configured policy.

Possible shape:

```text
external_router.score(prompt, context) -> {
  "complexity": 0.0-1.0,
  "recommended_tier": "small|medium|large",
  "latency_sensitive": bool,
  "privacy_sensitive": bool,
}
```
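
A typed version of that shape makes the "optional signal" rule easy to enforce. In the sketch below, `ExternalScore` mirrors the fields above, while `resolve_tier` and its `allowed_tiers` parameter are hypothetical: they assume user policy supplies both a configured tier and the set of tiers an external hint is permitted to select.

```python
from typing import Optional, TypedDict


class ExternalScore(TypedDict):
    complexity: float          # 0.0-1.0
    recommended_tier: str      # "small", "medium", or "large"
    latency_sensitive: bool
    privacy_sensitive: bool


TIERS = ("small", "medium", "large")


def resolve_tier(configured_tier: str, allowed_tiers: set,
                 score: Optional[ExternalScore]) -> str:
    """Use the external recommendation only when user policy allows that tier."""
    if score is None:
        return configured_tier             # no external router: policy is unchanged
    hint = score.get("recommended_tier")
    if hint in TIERS and hint in allowed_tiers:
        return hint                        # hint accepted within user-set bounds
    return configured_tier                 # unknown or disallowed hint is ignored


score: ExternalScore = {
    "complexity": 0.9,
    "recommended_tier": "large",
    "latency_sensitive": False,
    "privacy_sensitive": False,
}
print(resolve_tier("medium", {"small", "medium"}, score))
```

Here the external router recommends `large`, but the user has only allowed `small` and `medium`, so the configured tier is kept.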

## Non-goals for v2

Do not turn v2 into the full smart router. v2 should remain stable as a reference implementation for:

- runtime model override
- action-first routing
- final-model-aware signatures
- patch-free migration after upstream runtime override support lands

## Open questions

- Should complexity scoring use local heuristics, an LLM, or an external router?
- How should users describe model capability and cost metadata?
- Should routing policy be declarative YAML, Python plugin code, or both?
- How should router decisions be exposed in logs/signatures without making replies noisy?
- How should provider rate limits and credential-pool state influence routing?

Related upstream discussion: https://github.com/NousResearch/hermes-agent/issues/21827
Related runtime override PR: https://github.com/NousResearch/hermes-agent/pull/23898

Design note in this repo: `docs/V3_SMART_ROUTER.md`

