Motor is an LLM routing engine that dynamically selects the best language model for each task. Instead of routing every request to the most expensive model, Motor analyzes incoming prompts, classifies their complexity and requirements, and dispatches them to the cheapest model capable of handling the job.
Motor runs every prompt through a four-stage pipeline:
Prompt → Analyzer → Router → Executor → Evaluator
- Analyzer — Scores prompt complexity (0.0–1.0) using reasoning keywords, tool hints, multi-step markers, and token length.
- Router — Maps the analysis to a routing decision: model tier, specific model, and token limits.
- Executor — Calls the selected model with streaming and tool-call loop support.
- Evaluator — Scores the response confidence and flags issues (truncation, refusals, length mismatches).
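As a rough illustration of the Analyzer stage, the sketch below combines the four signals listed above into a score clamped to 0.0–1.0. The keyword lists, the weights, and the names `Analysis` and `analyze_prompt` are assumptions made for this example; they are not Motor's actual heuristics.

```python
import re
from dataclasses import dataclass, field

# Hypothetical signal vocabularies; Motor's real lists may differ.
REASONING_KEYWORDS = ("design", "debug", "prove", "analyze", "optimize")
TOOL_HINTS = ("search", "browse", "run code", "calculate")
MULTI_STEP_MARKERS = ("first", "then", "finally", "step 1", "step 2")


@dataclass
class Analysis:
    complexity_score: float
    is_multi_step: bool
    tool_hints: list = field(default_factory=list)
    reasoning_keywords: list = field(default_factory=list)


def analyze_prompt(prompt: str) -> Analysis:
    """Score a prompt from 0.0 to 1.0 using illustrative, made-up weights."""
    text = prompt.lower()
    words = re.findall(r"[a-z0-9']+", text)
    keywords = [k for k in REASONING_KEYWORDS if k in words]
    tools = [t for t in TOOL_HINTS if t in text]
    # Treat a prompt as multi-step when at least two markers appear.
    multi_step = sum(m in text for m in MULTI_STEP_MARKERS) >= 2

    score = 0.0
    score += min(len(keywords) * 0.25, 0.5)  # reasoning keywords
    score += 0.15 if tools else 0.0          # tool hints
    score += 0.2 if multi_step else 0.0      # multi-step markers
    score += min(len(words) / 400, 0.15)     # crude length proxy (words, not tokens)
    return Analysis(round(min(score, 1.0), 2), multi_step, tools, keywords)
```

With these made-up weights, a short factual question scores close to zero, while a prompt that mixes reasoning keywords with sequencing words lands near the top of the range.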
Models are grouped into three tiers based on capability and cost:
| Tier | Purpose | Models |
|---|---|---|
| token_safe | Simple, factual, short tasks | claude-haiku-4-5, gpt-4o-mini |
| balanced | Most everyday tasks | claude-sonnet-4-6, gpt-4o, o3-mini |
| performance | Complex reasoning, architecture | claude-opus-4-6, o1 |
Tier selection is driven by complexity score thresholds (configurable in src/config/settings.py):
- Score < 0.25 → token_safe
- Score 0.25–0.70 → balanced
- Score ≥ 0.70 → performance
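The threshold mapping can be sketched as a small function. The function name `tier_for_score` and the module-level constants are hypothetical; only the cut-off values 0.25 and 0.70 come from the list above.

```python
LOW_THRESHOLD = 0.25   # mirrors complexity_threshold_low
HIGH_THRESHOLD = 0.70  # mirrors complexity_threshold_high


def tier_for_score(score: float) -> str:
    """Return the tier name for a complexity score in [0.0, 1.0]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < LOW_THRESHOLD:
        return "token_safe"
    if score < HIGH_THRESHOLD:
        return "balanced"
    return "performance"
```

A score of exactly 0.25 falls into balanced and a score of exactly 0.70 falls into performance, matching the boundaries as listed.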
| Mode | Behavior |
|---|---|
| token_safe | Always routes to cheapest capable model; hard cap of 1024 output tokens |
| balanced | Adaptive routing by complexity; default mode |
| performance | Always routes to the highest-tier model; no output limits |
Pass --mode at the CLI or set default_mode in settings to switch modes.
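One way to picture the per-mode output limits is a helper that resolves the effective cap. This is a sketch under assumptions: the name `effective_max_tokens` is hypothetical, and the behavior for balanced (pass the caller's value through) and performance (ignore any requested limit) is a guess at what "adaptive" and "no output limits" mean in practice.

```python
from typing import Optional

TOKEN_SAFE_CAP = 1024  # hard output cap stated for token_safe mode


def effective_max_tokens(mode: str, requested: Optional[int]) -> Optional[int]:
    """Return the output-token limit to apply, or None for no limit."""
    if mode == "token_safe":
        # Never exceed the cap, even if the caller asks for more.
        return TOKEN_SAFE_CAP if requested is None else min(requested, TOKEN_SAFE_CAP)
    if mode == "performance":
        # Assumption: "no output limits" means any requested cap is dropped.
        return None
    if mode == "balanced":
        # Assumption: the caller's value is used unchanged.
        return requested
    raise ValueError(f"unknown mode: {mode!r}")
```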
Motor tracks per-model success rates and latency in memory. A model with an error rate ≥ 35% (after at least 3 calls) is automatically demoted — the router skips it and picks the next healthy candidate. Models with high average latency get a cost penalty that pushes the router toward faster alternatives. Recovery is automatic when error rates improve.
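The demotion rule can be sketched with a few in-memory counters. The class and method names below (`HealthStore`, `record`, `is_healthy`, `first_healthy`) are hypothetical, and the latency-based cost penalty is left out to keep the example short; only the 35% threshold and the 3-call minimum come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

ERROR_RATE_LIMIT = 0.35  # demote at or above this error rate
MIN_CALLS = 3            # but only once this many calls were recorded


@dataclass
class ModelStats:
    calls: int = 0
    errors: int = 0
    total_latency_ms: float = 0.0

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0


class HealthStore:
    """In-memory per-model counters (hypothetical class name)."""

    def __init__(self) -> None:
        self._stats = defaultdict(ModelStats)

    def record(self, model_id: str, ok: bool, latency_ms: float) -> None:
        stats = self._stats[model_id]
        stats.calls += 1
        stats.errors += 0 if ok else 1
        stats.total_latency_ms += latency_ms

    def is_healthy(self, model_id: str) -> bool:
        stats = self._stats[model_id]
        demoted = stats.calls >= MIN_CALLS and stats.error_rate >= ERROR_RATE_LIMIT
        return not demoted

    def first_healthy(self, candidates: list) -> Optional[str]:
        """Skip demoted models and return the next healthy candidate."""
        return next((m for m in candidates if self.is_healthy(m)), None)
```

Because health is recomputed from the counters on every query, a demoted model becomes eligible again as soon as enough successful calls pull its error rate back under the limit.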
Requirements: Python ≥ 3.11
git clone https://github.com/VladOS95-cyber/motor.git
cd motor
# Using uv (recommended)
uv sync
# With LangGraph integration
uv sync --extra langgraph
# Or using pip
pip install -r requirements.txt
pip install "motor[langgraph]" # with LangGraph integration

Set your API keys:
export ANTHROPIC_API_KEY=your_key_here
export OPENAI_API_KEY=your_key_here

Single prompt:
python main.py "Explain how transformers work"

With mode and verbose output:
python main.py --mode performance -v "Design a distributed caching system"

Disable streaming:
python main.py --no-stream "Write a regex to match email addresses"

Interactive REPL (blank line to submit, Ctrl-C to exit):
python main.py

positional:
prompt Prompt text (omit to enter REPL mode)
options:
--mode MODE Routing mode: token_safe | balanced | performance (default: balanced)
--no-stream Disable streaming output
-v, --verbose Show full routing and evaluation details
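For readers who want to script against the same flags, the usage text above could be declared with `argparse` roughly as follows. This is a sketch of an equivalent parser, not the code in main.py; the `build_parser` name is hypothetical.

```python
import argparse

MODES = ("token_safe", "balanced", "performance")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    # Optional positional: when omitted, the caller would start the REPL.
    parser.add_argument("prompt", nargs="?", default=None,
                        help="Prompt text (omit to enter REPL mode)")
    parser.add_argument("--mode", choices=MODES, default="balanced",
                        help="Routing mode (default: balanced)")
    parser.add_argument("--no-stream", action="store_true",
                        help="Disable streaming output")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Show full routing and evaluation details")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--mode", "performance", "-v", "Design a cache"])
print(args.mode, args.verbose, args.no_stream, args.prompt)
```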
Motor exposes a FastAPI server for programmatic access. All routing logic and health-aware selection work identically to the CLI.
Start the server:
uvicorn src.api.app:app --reload

Interactive docs are available at http://localhost:8000/docs once the server is running.
| Method | Path | Description |
|---|---|---|
| POST | /analyze | Complexity analysis only: returns the signals used by the router, no LLM call |
| POST | /route | Analyze and select the best model: returns tier, reason, and cost info, no LLM call |
| POST | /execute | Full pipeline: analyze → route → execute → evaluate |
| GET | /models | All models in the registry, sorted by tier then cost |
| GET | /models/tier/{tier} | Models filtered to a single tier |
| GET | /health | Live model health snapshot: error rates, latency, availability |
POST /analyze, /route:
{
"prompt": "Your prompt here",
"mode": "balanced"
}

POST /execute additionally accepts:
{
"prompt": "Your prompt here",
"mode": "balanced",
"system": "Optional system prompt",
"max_tokens": null
}

mode defaults to "balanced". Valid values: "token_safe", "balanced", "performance".
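A small standard-library client for these request bodies might look like the sketch below. The helper names `build_execute_payload` and `post_json` are hypothetical, and omitting `system` when it is not set is an assumption about how the server treats the optional field.

```python
import json
import urllib.request
from typing import Optional

VALID_MODES = {"token_safe", "balanced", "performance"}


def build_execute_payload(prompt: str, mode: str = "balanced",
                          system: Optional[str] = None,
                          max_tokens: Optional[int] = None) -> dict:
    """Build a request body with the fields documented for POST /execute."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    payload = {"prompt": prompt, "mode": mode, "max_tokens": max_tokens}
    if system is not None:
        payload["system"] = system
    return payload


def post_json(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST a JSON body and decode the JSON response (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, a call such as `post_json("http://localhost:8000/execute", build_execute_payload("What is 2 + 2?", mode="token_safe"))` would send the same body as the curl examples.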
curl -s -X POST http://localhost:8000/route \
-H "Content-Type: application/json" \
-d '{"prompt": "Debug this Python function and explain the error", "mode": "balanced"}' \
| python -m json.tool

{
"model": {
"id": "claude-sonnet-4-6",
"tier": "balanced",
"cost_per_1k_input": 0.003,
...
},
"tier": "balanced",
"reason": "complexity=0.43 | mode=balanced | tier=balanced | keywords=['debug']",
"max_tokens": null,
"analysis": {
"complexity_score": 0.43,
"is_multi_step": false,
"tool_hints": [],
"reasoning_keywords": ["debug"]
}
}

curl -s -X POST http://localhost:8000/execute \
-H "Content-Type: application/json" \
-d '{"prompt": "What is 2 + 2?", "mode": "token_safe"}' \
| python -m json.tool

The /execute response always includes analysis and evaluation blocks alongside the model response, so you can see which model was used, its cost, and its confidence score.
The evaluation block reports confidence_score, flags (e.g. truncated_output, possible_refusal), and cost_usd.
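To make the evaluation fields concrete, here is a minimal sketch that derives flags, a confidence score, and a cost from a finished call. The refusal phrases, the 0.3-per-flag penalty, the `"length"` finish reason, and the `evaluate` signature are all assumptions for illustration; only the field and flag names come from the description above.

```python
from dataclasses import dataclass, field

# Hypothetical refusal phrases; a real evaluator would likely use a richer check.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm unable to")


@dataclass
class Evaluation:
    confidence_score: float
    flags: list = field(default_factory=list)
    cost_usd: float = 0.0


def evaluate(response: str, finish_reason: str, tokens_in: int, tokens_out: int,
             cost_per_1k_input: float, cost_per_1k_output: float) -> Evaluation:
    flags = []
    if finish_reason == "length":  # assumption: "length" signals a cut-off response
        flags.append("truncated_output")
    if any(marker in response.lower() for marker in REFUSAL_MARKERS):
        flags.append("possible_refusal")
    # Made-up penalty: each flag removes 0.3 from a perfect score.
    confidence = max(0.0, 1.0 - 0.3 * len(flags))
    # Cost is priced per 1k tokens, separately for input and output.
    cost = tokens_in / 1000 * cost_per_1k_input + tokens_out / 1000 * cost_per_1k_output
    return Evaluation(round(confidence, 2), flags, round(cost, 6))
```

The cost line is plain per-1k-token arithmetic: 1,000 input tokens at 0.003 plus 1,000 output tokens at 0.015 comes to 0.018 USD.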
Prompt: "Design a distributed caching system with eviction policies."
[analyzer] complexity=0.74 tokens≈12 multi_step=False tools=— keywords=['design']
[router] claude-opus-4-6 tier=performance provider=anthropic
complexity=0.74 | mode=balanced | tier=performance | keywords=['design']
... streamed response ...
[stats] in=512 out=1240 latency=3100ms finish=stop
[eval] confidence=1.00 cost=$0.021480
Motor can be used directly inside a LangGraph graph. Install the extra dependencies first:
uv sync --extra langgraph

The simplest integration runs Motor's full pipeline (analyze → route → execute) as a single graph node:
from langgraph.graph import StateGraph, MessagesState
from src.integrations.langgraph import MotorNode
graph = StateGraph(MessagesState)
graph.add_node("motor", MotorNode())
graph.set_entry_point("motor")
graph.set_finish_point("motor")
app = graph.compile()
result = await app.ainvoke({"messages": [{"role": "user", "content": "Explain RLHF"}]})

MotorNode is an async callable class. It accepts LangChain message objects (HumanMessage, AIMessage, etc.) directly from state.
from langchain_community.tools import DuckDuckGoSearchRun
from src.integrations.langgraph import MotorNode
graph.add_node("motor", MotorNode(tools=[DuckDuckGoSearchRun()]))

LangChain BaseTool objects are automatically adapted to Motor's tool-executor interface.
Use MotorRouter to decide which specialised node runs next without executing any LLM call:
from src.integrations.langgraph import MotorRouter
graph.add_conditional_edges("entry", MotorRouter(), {
"token_safe": "cheap_node",
"balanced": "standard_node",
"performance": "reasoning_node",
})

MotorRouter is a synchronous callable class that analyses the last user message and returns the tier name ("token_safe" / "balanced" / "performance").
For production use, create a single ModelRegistry and pass it to both so health tracking is shared across calls:
from src.registry.registry import ModelRegistry
from src.integrations.langgraph import MotorNode, MotorRouter
registry = ModelRegistry()
node = MotorNode(registry=registry)
router = MotorRouter(registry=registry)

If you manage message history yourself and don't need the full LangGraph wiring:
from src.core.executor import execute_messages, aexecute, Message
from src.core.analyzer import analyze
from src.core.router import Router
from src.modes.balanced import BalancedMode
from src.registry.registry import ModelRegistry
registry = ModelRegistry()
messages = [
Message(role="system", content="You are a helpful assistant."),
Message(role="user", content="Summarise this document…"),
]
analysis = analyze(messages[-1].content)
decision = Router(registry).route(analysis, BalancedMode())
# Sync
result = execute_messages(messages, decision.model, health_store=registry.health)
# Async
result = await aexecute(messages, decision.model, health_store=registry.health)
print(result.response)

| Setting | Default | Description |
|---|---|---|
| default_mode | "balanced" | Routing mode when none is specified |
| complexity_threshold_low | 0.25 | Score below this → token_safe tier |
| complexity_threshold_high | 0.70 | Score above this → performance tier |
| complexity_threshold_multistep | 0.40 | Multi-step prompts above this → performance tier |
| preferred_provider | "anthropic" | Tiebreak when cost is equal |
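The preferred_provider tiebreak can be illustrated with a sort key that compares cost first and provider second. The `Candidate` type, the `pick_cheapest` name, and the choice to compare input cost only are assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    id: str
    provider: str
    cost_per_1k_input: float


def pick_cheapest(candidates: list, preferred_provider: str = "anthropic") -> Candidate:
    """Pick the lowest-cost candidate; on equal cost, prefer the given provider."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    # False sorts before True, so preferred-provider models win ties.
    return min(candidates,
               key=lambda c: (c.cost_per_1k_input, c.provider != preferred_provider))


# Placeholder model IDs and made-up prices, used only to show the tiebreak.
pool = [
    Candidate("model-a", "openai", 0.003),
    Candidate("model-b", "anthropic", 0.003),
    Candidate("model-c", "openai", 0.005),
]
print(pick_cheapest(pool).id)
```

The provider preference only matters on exact ties; a strictly cheaper model from another provider still wins.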
src/registry/models.yaml defines the model catalog: API IDs, costs per 1k tokens, context limits, capabilities, tier assignments, and tool-reliability scores. Add new models there; no code changes are required.
- id: claude-sonnet-4-6
  name: Claude Sonnet 4.6
  provider: anthropic
  cost_per_1k_input: 0.003
  cost_per_1k_output: 0.015
  max_context: 200000
  tier: balanced
  capabilities: [function_calling, vision, long_context]
  tool_reliability:
    structured_output: 0.95
    code_execution: 0.92
    search: 0.88
    multi_step_chains: 0.90

motor/
├── main.py # Entry point: CLI, REPL, pipeline orchestration
├── requirements.txt
├── pyproject.toml
│
└── src/
├── api/
│ ├── app.py # FastAPI app and route handlers
│ └── schemas.py # Pydantic request/response models
│
├── config/
│ └── settings.py # Thresholds, API keys, defaults
│
├── core/
│ ├── analyzer.py # Complexity classification
│ ├── router.py # Analysis → routing decision
│ ├── executor.py # Model calls, streaming, tool loops
│ └── evaluator.py # Confidence scoring and flag detection
│
├── modes/
│ ├── base.py # BaseMode interface + shared tool-reliability constants
│ ├── token_safe.py # Always-cheapest routing
│ ├── balanced.py # Adaptive routing by complexity
│ └── performance.py # Always top-tier routing
│
├── integrations/
│ └── langgraph.py # LangGraph node/router factories and tool adapter
│
├── providers/
│ ├── anthropic.py # Anthropic SDK adapter
│ └── openai.py # OpenAI SDK adapter
│
├── registry/
│ ├── registry.py # ModelSpec, ModelRegistry, health-aware queries
│ ├── health.py # Live error rate and latency tracking
│ └── models.yaml # Model catalog
│
└── tests/
├── test_analyzer.py
├── test_router.py
└── fixtures/prompts.json
pytest

Tests cover the analyzer (complexity scoring, keyword detection, fixture-driven contracts) and the router (tier selection per mode, fixture-driven tier expectations).
- Create src/providers/yourprovider.py implementing BaseProvider.complete().
- Add models to src/registry/models.yaml with provider: yourprovider.
- Wire the provider in src/core/executor.py where providers are instantiated.
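The first step above might look roughly like the sketch below. The `BaseProvider` defined here is a stand-in written for this example, and the `complete()` signature (a list of message dicts plus a model ID, returning text) is an assumption; check the real base class in src/providers before copying it.

```python
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Stand-in for Motor's base class; the real signature may differ."""

    @abstractmethod
    def complete(self, messages: list, model_id: str) -> str:
        ...


class EchoProvider(BaseProvider):
    """Toy provider that echoes the last user message instead of calling an API."""

    def complete(self, messages: list, model_id: str) -> str:
        last_user = next(
            (m["content"] for m in reversed(messages) if m["role"] == "user"),
            "",
        )
        return f"[{model_id}] {last_user}"


provider = EchoProvider()
reply = provider.complete([{"role": "user", "content": "hi"}], "demo-model")
print(reply)
```

A real provider would replace the echo with a call to the vendor SDK and translate its response into whatever type the executor expects.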
To add a model, edit src/registry/models.yaml and set the correct tier, costs, tool_reliability scores, and capabilities. The router and health system pick it up automatically.
MIT