Skip to content

stampby/claude-hybrid-proxy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

claude-hybrid-proxy

Local-first LLM proxy for Claude Code. Routes simple tasks through your local model (Lemonade), escalates complex reasoning to Anthropic API. Save 90% on tokens.

"Why spend a dollar when a penny will do?" — the architect

How it works

Claude Code ──→ proxy:8443 ──→ Local LLM (Lemonade/Qwen) ──→ response
                    │
                    └──→ Anthropic API (when local can't handle it)

The proxy intercepts Anthropic API calls and decides where to route them:

  • Local first: Simple prompts, single tool calls, short context → Lemonade
  • API escalation: Multi-step reasoning, long context, tool chains, local model uncertainty → Anthropic
  • Smart retry: If local response quality is low, automatically retry with Anthropic

Requirements

  • Lemonade SDK with a loaded model
  • Claude Code CLI
  • Python 3.12+

Usage

# Start the proxy
claude-hybrid-proxy --local http://localhost:13305 --api-key $ANTHROPIC_API_KEY

# Point Claude Code at it
ANTHROPIC_BASE_URL=http://localhost:8443 claude

Configuration

# ~/.config/claude-hybrid-proxy/config.toml

[local]
url = "http://localhost:13305"
model = "Qwen3.5-35B-A3B-GGUF"
max_tokens = 4096          # local model context budget per turn

[anthropic]
# Falls back to ANTHROPIC_API_KEY env var
model = "claude-sonnet-4-20250514"  # default escalation model

[routing]
# Thresholds for local vs API
max_local_prompt_tokens = 8000     # over this → API
max_local_tools = 3                # more simultaneous tools → API
always_api_patterns = ["plan", "architect", "refactor"]  # keywords that trigger API
escalate_on_uncertainty = true     # retry with API if local seems unsure

Stack

Built for the halo-ai ecosystem on AMD Strix Halo.

  • Lemonade SDK (local LLM backend)
  • FastAPI (proxy server)
  • Anthropic Python SDK (API translation)

License

MIT

Designed and built by the architect.

About

Local-first LLM proxy for Claude Code — routes simple tasks through Lemonade, escalates to Anthropic API. Save 90% on tokens.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages