Motor — Model Selector Engine

Motor is an LLM routing engine that dynamically selects the best language model for each task. Instead of routing every request to the most expensive model, Motor analyzes incoming prompts, classifies their complexity and requirements, and dispatches them to the cheapest model capable of handling the job.

How It Works

Motor runs every prompt through a four-stage pipeline:

Prompt → Analyzer → Router → Executor → Evaluator
  1. Analyzer — Scores prompt complexity (0.0–1.0) using reasoning keywords, tool hints, multi-step markers, and token length.
  2. Router — Maps the analysis to a routing decision: model tier, specific model, and token limits.
  3. Executor — Calls the selected model with streaming and tool-call loop support.
  4. Evaluator — Scores the response confidence and flags issues (truncation, refusals, length mismatches).
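
To make the Analyzer stage concrete, here is a minimal sketch of a scorer that combines the four signals listed above into a value between 0.0 and 1.0. The keyword lists, the weights, and the name score_complexity are all assumptions made for illustration; Motor's actual analyzer in src/core/analyzer.py may weigh things quite differently.

```python
import re

# Hypothetical signal lists and weights, chosen only for illustration.
REASONING_KEYWORDS = ("design", "debug", "prove", "analyze", "optimize")
TOOL_HINTS = ("search", "run", "execute", "fetch")
MULTI_STEP_MARKERS = ("first", "then", "finally", "step")


def score_complexity(prompt: str) -> float:
    """Combine four signals into a score clamped to [0.0, 1.0]."""
    words = re.findall(r"[a-z']+", prompt.lower())
    keyword_hits = sum(w in REASONING_KEYWORDS for w in words)
    tool_hits = sum(w in TOOL_HINTS for w in words)
    multi_step = any(w in MULTI_STEP_MARKERS for w in words)

    score = 0.0
    score += min(keyword_hits, 2) * 0.25         # reasoning keywords
    score += min(tool_hits, 2) * 0.10            # tool hints
    score += 0.20 if multi_step else 0.0         # multi-step markers
    score += min(len(words) / 400, 1.0) * 0.10   # word count as a crude token proxy
    return round(min(score, 1.0), 3)
```

With these made-up weights a short factual question stays near zero, while a prompt that contains several reasoning keywords and sequencing words lands in the upper range.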

Model Tiers

Models are grouped into three tiers based on capability and cost:

Tier         Purpose                          Models
token_safe   Simple, factual, short tasks     claude-haiku-4-5, gpt-4o-mini
balanced     Most everyday tasks              claude-sonnet-4-6, gpt-4o, o3-mini
performance  Complex reasoning, architecture  claude-opus-4-6, o1

Tier selection is driven by complexity score thresholds (configurable in src/config/settings.py):

  • Score < 0.25 → token_safe
  • 0.25 ≤ Score < 0.70 → balanced
  • Score ≥ 0.70 → performance
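
The threshold rule above can be sketched as a small function. The name tier_for_score is hypothetical, and the boundary handling (0.25 falls in balanced, 0.70 falls in performance) follows the list above rather than Motor's source.

```python
LOW_THRESHOLD = 0.25   # below this: token_safe
HIGH_THRESHOLD = 0.70  # at or above this: performance


def tier_for_score(score: float,
                   low: float = LOW_THRESHOLD,
                   high: float = HIGH_THRESHOLD) -> str:
    """Map a complexity score in [0.0, 1.0] to a tier name."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0.0, 1.0], got {score}")
    if score < low:
        return "token_safe"
    if score >= high:
        return "performance"
    return "balanced"
```

For example, tier_for_score(0.43) returns "balanced" and tier_for_score(0.74) returns "performance".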

Routing Modes

Mode         Behavior
token_safe   Always routes to the cheapest capable model; hard cap of 1024 output tokens
balanced     Adaptive routing by complexity; default mode
performance  Always routes to the highest-tier model; no output limits

Pass --mode on the command line, or set default_mode in src/config/settings.py, to switch modes.
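
As an illustration of how a mode can constrain output length, the sketch below applies the 1024-token cap for token_safe and leaves the other modes uncapped. The function name effective_max_tokens is hypothetical, and the pass-through behavior for balanced is an assumption, since only the token_safe cap and the absence of limits in performance mode are stated above.

```python
from typing import Optional

TOKEN_SAFE_CAP = 1024  # hard output cap stated for token_safe mode
VALID_MODES = ("token_safe", "balanced", "performance")


def effective_max_tokens(mode: str, requested: Optional[int] = None) -> Optional[int]:
    """Return the output-token limit to apply, or None for no limit."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "token_safe":
        # Assumption: a caller-supplied limit is honoured but never exceeds the cap.
        return TOKEN_SAFE_CAP if requested is None else min(requested, TOKEN_SAFE_CAP)
    # Assumption: balanced and performance pass the caller's limit through unchanged.
    return requested
```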

Health-Aware Routing

Motor tracks per-model success rates and latency in memory. A model with an error rate ≥ 35% (after at least 3 calls) is automatically demoted — the router skips it and picks the next healthy candidate. Models with high average latency get a cost penalty that pushes the router toward faster alternatives. Recovery is automatic when error rates improve.
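
The demotion rule can be sketched with a small in-memory tracker. ModelHealth and first_healthy are hypothetical names, and the sketch uses lifetime counters for simplicity; Motor's own tracker in src/registry/health.py may use a sliding window or a different recovery rule.

```python
from dataclasses import dataclass

ERROR_RATE_LIMIT = 0.35  # demote at or above this error rate
MIN_CALLS = 3            # but only once this many calls have been recorded


@dataclass
class ModelHealth:
    calls: int = 0
    errors: int = 0
    total_latency_ms: float = 0.0

    def record(self, ok: bool, latency_ms: float) -> None:
        """Record the outcome and latency of one call."""
        self.calls += 1
        self.errors += 0 if ok else 1
        self.total_latency_ms += latency_ms

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0

    @property
    def avg_latency_ms(self) -> float:
        return self.total_latency_ms / self.calls if self.calls else 0.0

    @property
    def is_healthy(self) -> bool:
        # Too few calls to judge, or error rate below the limit.
        return self.calls < MIN_CALLS or self.error_rate < ERROR_RATE_LIMIT


def first_healthy(candidates: list, health: dict):
    """Return the first candidate id that is not demoted, or None."""
    for model_id in candidates:
        if health.get(model_id, ModelHealth()).is_healthy:
            return model_id
    return None
```

Because the error rate is recomputed on every call, a demoted model becomes eligible again once enough successful calls bring its rate back under 35%.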

Installation

Requirements: Python ≥ 3.11

git clone https://github.com/VladOS95-cyber/motor.git
cd motor

# Using uv (recommended)
uv sync

# With LangGraph integration
uv sync --extra langgraph

# Or using pip
pip install -r requirements.txt
pip install ".[langgraph]"       # with LangGraph integration, installed from this checkout (the name "motor" on PyPI is an unrelated package)

Set your API keys:

export ANTHROPIC_API_KEY=your_key_here
export OPENAI_API_KEY=your_key_here

Usage

Single prompt:

python main.py "Explain how transformers work"

With mode and verbose output:

python main.py --mode performance -v "Design a distributed caching system"

Disable streaming:

python main.py --no-stream "Write a regex to match email addresses"

Interactive REPL (blank line to submit, Ctrl-C to exit):

python main.py

CLI Options

positional:
  prompt              Prompt text (omit to enter REPL mode)

options:
  --mode MODE         Routing mode: token_safe | balanced | performance (default: balanced)
  --no-stream         Disable streaming output
  -v, --verbose       Show full routing and evaluation details

API

Motor exposes a FastAPI server for programmatic access. All routing logic and health-aware selection work identically to the CLI.

Start the server:

uvicorn src.api.app:app --reload

Interactive docs are available at http://localhost:8000/docs once the server is running.

Endpoints

Method  Path                 Description
POST    /analyze             Complexity analysis only: returns the signals used by the router, no LLM call
POST    /route               Analyze and select the best model: returns tier, reason, and cost info, no LLM call
POST    /execute             Full pipeline: analyze → route → execute → evaluate
GET     /models              All models in the registry, sorted by tier, then cost
GET     /models/tier/{tier}  Models filtered to a single tier
GET     /health              Live model health snapshot: error rates, latency, availability

Request body

POST /analyze and POST /route accept:

{
  "prompt": "Your prompt here",
  "mode": "balanced"
}

POST /execute additionally accepts:

{
  "prompt": "Your prompt here",
  "mode": "balanced",
  "system": "Optional system prompt",
  "max_tokens": null
}

mode defaults to "balanced". Valid values: "token_safe", "balanced", "performance".
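
For callers that prefer Python over curl, the request bodies above can be built with the standard library alone. This is a minimal sketch: build_request and post are hypothetical helper names, and the base URL assumes the default uvicorn address used elsewhere in this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default uvicorn address
VALID_MODES = ("token_safe", "balanced", "performance")


def build_request(path: str, prompt: str, mode: str = "balanced",
                  **extra) -> urllib.request.Request:
    """Build a JSON POST request for /analyze, /route, or /execute."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    payload = {"prompt": prompt, "mode": mode, **extra}
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(path: str, prompt: str, mode: str = "balanced", **extra) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(path, prompt, mode, **extra)) as resp:
        return json.load(resp)
```

With the server running, post("/route", "Debug this Python function") returns the routing decision as a dict, and post("/execute", "What is 2 + 2?", mode="token_safe", system="Be brief") runs the full pipeline.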

Example: inspect routing without executing

curl -s -X POST http://localhost:8000/route \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Debug this Python function and explain the error", "mode": "balanced"}' \
  | python -m json.tool

Example response (abridged):

{
  "model": {
    "id": "claude-sonnet-4-6",
    "tier": "balanced",
    "cost_per_1k_input": 0.003,
    ...
  },
  "tier": "balanced",
  "reason": "complexity=0.43 | mode=balanced | tier=balanced | keywords=['debug']",
  "max_tokens": null,
  "analysis": {
    "complexity_score": 0.43,
    "is_multi_step": false,
    "tool_hints": [],
    "reasoning_keywords": ["debug"]
  }
}

Example: full execution

curl -s -X POST http://localhost:8000/execute \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is 2 + 2?", "mode": "token_safe"}' \
  | python -m json.tool

The /execute response always includes analysis and evaluation blocks alongside the model response, so you can see which model was used, its cost, and confidence score.

The evaluation block reports confidence_score, flags (e.g. truncated_output, possible_refusal), and cost_usd.
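
To show how flags and a confidence score could relate, here is a minimal sketch of an evaluator. The refusal phrases, the finish reasons treated as truncation, and the penalty values are all assumptions for illustration; only the flag names truncated_output and possible_refusal come from this README.

```python
# Hypothetical heuristics; Motor's evaluator may use different signals.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "i am unable")
TRUNCATION_REASONS = {"length", "max_tokens"}
PENALTIES = {"truncated_output": 0.4, "possible_refusal": 0.5}


def evaluate(text: str, finish_reason: str) -> dict:
    """Return a confidence score in [0.0, 1.0] plus any raised flags."""
    flags = []
    if finish_reason in TRUNCATION_REASONS:
        flags.append("truncated_output")
    if text.strip().lower().startswith(REFUSAL_MARKERS):
        flags.append("possible_refusal")
    # Start from full confidence and subtract a penalty per flag.
    confidence = max(0.0, 1.0 - sum(PENALTIES[f] for f in flags))
    return {"confidence_score": round(confidence, 2), "flags": flags}
```

A response that finishes normally and raises no flags keeps a confidence of 1.0 in this sketch.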

Example Output

Prompt: "Design a distributed caching system with eviction policies."

[analyzer]  complexity=0.74  tokens≈12  multi_step=False  tools=—  keywords=['design']
[router]    claude-opus-4-6  tier=performance  provider=anthropic
            complexity=0.74 | mode=balanced | tier=performance | keywords=['design']

... streamed response ...

[stats]     in=512  out=1240  latency=3100ms  finish=stop
[eval]      confidence=1.00  cost=$0.021480

LangGraph Integration

Motor can be used directly inside a LangGraph graph. Install the extra dependencies first:

uv sync --extra langgraph

Motor as a node

The simplest integration — Motor runs its full pipeline (analyze → route → execute) as a single graph node:

import asyncio

from langgraph.graph import StateGraph, MessagesState
from src.integrations.langgraph import MotorNode

graph = StateGraph(MessagesState)
graph.add_node("motor", MotorNode())
graph.set_entry_point("motor")
graph.set_finish_point("motor")
app = graph.compile()


async def main():
    # ainvoke is a coroutine, so it has to be awaited inside an async function.
    return await app.ainvoke({"messages": [{"role": "user", "content": "Explain RLHF"}]})


result = asyncio.run(main())

MotorNode is an async callable class. It accepts LangChain message objects (HumanMessage, AIMessage, etc.) directly from state.

With LangChain tools

from langchain_community.tools import DuckDuckGoSearchRun
from src.integrations.langgraph import MotorNode

# graph is a StateGraph created as in the previous example
graph.add_node("motor", MotorNode(tools=[DuckDuckGoSearchRun()]))

LangChain BaseTool objects are automatically adapted to Motor's tool-executor interface.

Motor as a router (conditional edges)

Use MotorRouter to decide which specialised node runs next without executing any LLM call:

from src.integrations.langgraph import MotorRouter

graph.add_conditional_edges("entry", MotorRouter(), {
    "token_safe":  "cheap_node",
    "balanced":    "standard_node",
    "performance": "reasoning_node",
})

MotorRouter is a synchronous callable class that analyses the last user message and returns the tier name ("token_safe" / "balanced" / "performance").

Using a shared registry

For production use, create a single ModelRegistry and pass it to both so health tracking is shared across calls:

from src.registry.registry import ModelRegistry
from src.integrations.langgraph import MotorNode, MotorRouter

registry = ModelRegistry()
node   = MotorNode(registry=registry)
router = MotorRouter(registry=registry)

Calling execute_messages / aexecute directly

If you manage message history yourself and don't need the full LangGraph wiring:

from src.core.executor import execute_messages, aexecute, Message
from src.core.analyzer import analyze
from src.core.router import Router
from src.modes.balanced import BalancedMode
from src.registry.registry import ModelRegistry

registry = ModelRegistry()
messages = [
    Message(role="system", content="You are a helpful assistant."),
    Message(role="user",   content="Summarise this document…"),
]

analysis = analyze(messages[-1].content)
decision = Router(registry).route(analysis, BalancedMode())

# Sync
result = execute_messages(messages, decision.model, health_store=registry.health)

# Async (run this inside a coroutine, for example via asyncio.run)
result = await aexecute(messages, decision.model, health_store=registry.health)
print(result.response)

Configuration

src/config/settings.py

Setting                         Default      Description
default_mode                    "balanced"   Routing mode when none is specified
complexity_threshold_low        0.25         Score below this → token_safe tier
complexity_threshold_high       0.70         Score above this → performance tier
complexity_threshold_multistep  0.40         Multi-step prompts above this → performance tier
preferred_provider              "anthropic"  Tiebreak when cost is equal
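
The sketch below shows one way these settings could interact: the multi-step threshold promotes a mid-range prompt to the performance tier, and preferred_provider breaks ties between equally priced models. The Settings dataclass, choose_tier, and cheapest are hypothetical names, the model dicts are made up, and treating a score exactly at complexity_threshold_high as performance is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    default_mode: str = "balanced"
    complexity_threshold_low: float = 0.25
    complexity_threshold_high: float = 0.70
    complexity_threshold_multistep: float = 0.40
    preferred_provider: str = "anthropic"


def choose_tier(score: float, is_multi_step: bool, s: Settings = Settings()) -> str:
    """Pick a tier, promoting multi-step prompts above the multi-step threshold."""
    if score >= s.complexity_threshold_high:
        return "performance"
    if is_multi_step and score > s.complexity_threshold_multistep:
        return "performance"
    if score < s.complexity_threshold_low:
        return "token_safe"
    return "balanced"


def cheapest(models: list, s: Settings = Settings()) -> dict:
    """Pick the lowest-cost model; on equal cost the preferred provider wins."""
    return min(models, key=lambda m: (m["cost"], m["provider"] != s.preferred_provider))
```

With the defaults, choose_tier(0.5, is_multi_step=False) gives "balanced", while the same score with is_multi_step=True gives "performance".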

src/registry/models.yaml

Defines the model catalog: API IDs, costs per 1k tokens, context limits, capabilities, tier assignments, and tool-reliability scores. Add new models here — no code changes required.

- id: claude-sonnet-4-6
  name: Claude Sonnet 4.6
  provider: anthropic
  cost_per_1k_input: 0.003
  cost_per_1k_output: 0.015
  max_context: 200000
  tier: balanced
  capabilities: [function_calling, vision, long_context]
  tool_reliability:
    structured_output: 0.95
    code_execution: 0.92
    search: 0.88
    multi_step_chains: 0.90

Project Structure

motor/
├── main.py                    # Entry point: CLI, REPL, pipeline orchestration
├── requirements.txt
├── pyproject.toml
│
└── src/
    ├── api/
    │   ├── app.py             # FastAPI app and route handlers
    │   └── schemas.py         # Pydantic request/response models
    │
    ├── config/
    │   └── settings.py        # Thresholds, API keys, defaults
    │
    ├── core/
    │   ├── analyzer.py        # Complexity classification
    │   ├── router.py          # Analysis → routing decision
    │   ├── executor.py        # Model calls, streaming, tool loops
    │   └── evaluator.py       # Confidence scoring and flag detection
    │
    ├── modes/
    │   ├── base.py            # BaseMode interface + shared tool-reliability constants
    │   ├── token_safe.py      # Always-cheapest routing
    │   ├── balanced.py        # Adaptive routing by complexity
    │   └── performance.py     # Always top-tier routing
    │
    ├── integrations/
    │   └── langgraph.py       # LangGraph node/router factories and tool adapter
    │
    ├── providers/
    │   ├── anthropic.py       # Anthropic SDK adapter
    │   └── openai.py          # OpenAI SDK adapter
    │
    ├── registry/
    │   ├── registry.py        # ModelSpec, ModelRegistry, health-aware queries
    │   ├── health.py          # Live error rate and latency tracking
    │   └── models.yaml        # Model catalog
    │
    └── tests/
        ├── test_analyzer.py
        ├── test_router.py
        └── fixtures/prompts.json

Running Tests

pytest

Tests cover the analyzer (complexity scoring, keyword detection, fixture-driven contracts) and the router (tier selection per mode, fixture-driven tier expectations).

Adding a Provider

  1. Create src/providers/yourprovider.py implementing BaseProvider.complete().
  2. Add models to src/registry/models.yaml with provider: yourprovider.
  3. Wire the provider in src/core/executor.py where providers are instantiated.
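
As a rough picture of step 1, the sketch below defines a stand-in BaseProvider and a toy subclass. The complete() signature, the message format, and the PROVIDERS lookup table are all assumptions; check the existing adapters in src/providers/ for the interface Motor actually expects.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseProvider(ABC):
    """Stand-in for Motor's BaseProvider; the real signature may differ."""

    @abstractmethod
    def complete(self, messages: list, model_id: str,
                 max_tokens: Optional[int] = None) -> str:
        ...


class EchoProvider(BaseProvider):
    """Toy provider that echoes the last user message; replace with a real SDK call."""

    def complete(self, messages: list, model_id: str,
                 max_tokens: Optional[int] = None) -> str:
        last_user = next(m["content"] for m in reversed(messages) if m["role"] == "user")
        return f"[{model_id}] {last_user}"


# Hypothetical wiring table keyed by the provider name used in models.yaml.
PROVIDERS = {"yourprovider": EchoProvider()}
```

Keeping providers behind a single complete() method means the executor only needs the provider name from the registry to dispatch a call.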

Adding a Model

Edit src/registry/models.yaml. Set the correct tier, costs, tool_reliability scores, and capabilities. The router and health system pick it up automatically.

License

MIT
