Successfully simplified the router by using llama.cpp's native Anthropic Messages API instead of custom bidirectional conversion. This eliminates ~500 lines of conversion code and improves performance.
File: src/lib/router-server.ts
- Before: Converted Anthropic → OpenAI → llama.cpp → OpenAI → Anthropic (lines 242-598)
- After: Direct proxy to llama.cpp's `/v1/messages` endpoint (lines 242-378)
Key changes:
- Removed `fromMessagesRequest()` - No longer converting to OpenAI format
- Removed `toMessagesResponse()` - No longer converting from OpenAI format
- Removed `AnthropicStreamConverter` - No longer converting SSE events
- Removed `estimateInputTokens()` - Not needed for proxying
- Removed `generateMessageId()` - llama.cpp handles this
- Removed Qwen3 XML unescaping workaround - llama.cpp handles this correctly
New implementation:
```
handleAnthropicMessages() {
  // 1. Parse and validate request
  // 2. Find server by model name
  // 3. Proxy directly to llama.cpp's /v1/messages
  // 4. Stream response pass-through (no conversion)
}
```
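The proxy logic itself is short. The following is a minimal illustrative sketch, not the actual `router-server.ts` code: it assumes a Node 18+ runtime with global `fetch`, `findServerForModel()` is a hypothetical stand-in for the router's real model-to-server lookup, and the upstream URL is just a placeholder for a local llama-server.

```typescript
import type { ServerResponse } from "node:http";

// Hypothetical stand-in for the router's model -> llama.cpp server registry.
function findServerForModel(model: string): { baseUrl: string } | undefined {
  const servers: Record<string, string> = { "your-model-name": "http://localhost:8080" };
  const baseUrl = servers[model];
  return baseUrl ? { baseUrl } : undefined;
}

async function handleAnthropicMessages(body: string, res: ServerResponse): Promise<void> {
  // 1. Parse the incoming Anthropic Messages request (validation omitted here)
  const request = JSON.parse(body) as { model?: string };
  const server = request.model ? findServerForModel(request.model) : undefined;

  // 2. Reject unknown models without touching llama.cpp
  if (!server) {
    res.writeHead(404, { "content-type": "application/json" });
    res.end(JSON.stringify({ error: { message: `Unknown model: ${request.model}` } }));
    return;
  }

  // 3. Proxy the request verbatim to llama.cpp's native /v1/messages endpoint
  const upstream = await fetch(`${server.baseUrl}/v1/messages`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body,
  });

  // 4. Pass the response through unchanged (plain JSON or SSE stream)
  res.writeHead(upstream.status, {
    "content-type": upstream.headers.get("content-type") ?? "application/json",
  });
  if (upstream.body) {
    for await (const chunk of upstream.body as unknown as AsyncIterable<Uint8Array>) {
      res.write(chunk);
    }
  }
  res.end();
}
```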
Archived (renamed to `.backup`):
- `src/lib/anthropic-converter.ts` (~350 lines) - Bidirectional protocol conversion
- `src/lib/anthropic-stream-converter.ts` (~150 lines) - Streaming event conversion
Total lines removed: ~500 lines
README.md:
- Updated router description to mention native Anthropic API support
- Added note about direct pass-through to llama.cpp's implementation
- Clarified that no conversion overhead exists
CLAUDE.md:
- Updated architecture section with new request flow
- Documented key architectural change in v2.0
- Removed references to conversion code
- Updated file structure
CHANGELOG.md:
- Added comprehensive v2.0.0 entry with breaking changes notice
- Explained the benefits and included a migration guide
- Listed all architectural changes
package.json:
- Bumped version from 1.14.1 → 2.0.0 (major version)
Created: `test-router-refactor.sh`
Comprehensive test suite covering (a sketch of one streaming check follows the list):
- ✅ Non-streaming requests
- ✅ Streaming requests with SSE events
- ✅ Tool calling support
- ✅ Error handling
- ✅ Health checks
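As a rough illustration of what the streaming case exercises (the actual suite is a bash script; this TypeScript sketch is not taken from it), a check can send a streamed request through the router and assert that the pass-through SSE stream terminates with a `message_stop` event. The router URL and model name are the same placeholder values used in the testing instructions below.

```typescript
// Illustrative streaming check against the router's /v1/messages endpoint.
const response = await fetch("http://localhost:9100/v1/messages", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    model: "your-model-name",
    max_tokens: 64,
    stream: true,
    messages: [{ role: "user", content: "Say hello" }],
  }),
});

if (!response.headers.get("content-type")?.includes("text/event-stream")) {
  throw new Error("expected an SSE response");
}
const raw = await response.text();
if (!raw.includes("event: message_stop")) {
  throw new Error("stream did not end with a message_stop event");
}
console.log("streaming check passed");
```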
Before (v1.x):
```
Client (Anthropic)
→ Router converts to OpenAI
→ llama.cpp /v1/chat/completions
→ Router converts back to Anthropic
→ Client (Anthropic)
```

After (v2.0):
```
Client (Anthropic)
→ Router proxies directly
→ llama.cpp /v1/messages
→ Client (Anthropic)
```
- Simpler codebase - 500 fewer lines to maintain
- Better performance - No conversion overhead
- Fewer bugs - Leverage llama.cpp's native implementation
- Full feature support - All Anthropic features work natively (tool calling, vision, thinking, etc.; see the tool-calling sketch after this list)
- No workarounds - Qwen3 XML issues handled by llama.cpp
- Easier to debug - Direct pass-through means less complexity
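As an example of the tool-calling path, the request keeps the standard Anthropic Messages shape and the router forwards it as-is. The tool definition below is purely illustrative (the tool and model name are placeholders, not part of this project):

```typescript
// Hypothetical tool-calling request body in Anthropic Messages format.
// The router forwards it verbatim; llama.cpp produces the tool_use blocks.
const toolCallRequest = {
  model: "your-model-name",
  max_tokens: 512,
  tools: [
    {
      name: "get_weather",
      description: "Get the current weather for a city",
      input_schema: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  ],
  messages: [{ role: "user", content: "What's the weather in Berlin?" }],
};
// A successful response contains a tool_use content block whose input is plain
// JSON - no escaped-string workaround needed anymore.
```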
User has extensively tested the native llama.cpp Anthropic API:
- ✅ Built complete React app using Claude Code
- ✅ Tool calling works correctly
- ✅ Better performance than custom router conversion
- ✅ No Qwen3 XML escaping issues
Requirements:
- llama.cpp with native Anthropic API support (PR #17570+)
- Update: `brew upgrade llama.cpp`
API compatibility:
- No changes to the `/v1/messages` endpoint - All existing requests work the same (see the example after this list)
- Better compatibility with advanced features
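A minimal sketch of such an unchanged request (example values only; the base URL and model name are the same placeholders used in the testing instructions below):

```typescript
// The same Anthropic Messages request that worked against v1.x keeps working;
// the router now forwards it verbatim instead of converting it to /v1/chat/completions.
const reply = await fetch("http://localhost:9100/v1/messages", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    model: "your-model-name",
    max_tokens: 256,
    messages: [{ role: "user", content: "What does this router do?" }],
  }),
});
const message = await reply.json();
console.log(message.content); // Anthropic-format content blocks, produced by llama.cpp
```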
What to expect:
- Improved performance (no conversion overhead)
- Better tool calling reliability
- Native support for all Anthropic features
If you're contributing:
- Removed files are archived as `.backup` (can be deleted after testing)
- Router code significantly simplified
- Focus on proxy logic, not conversion logic
- Test with `./test-router-refactor.sh`
```bash
# 1. Build the project
npm run build
# 2. Restart router (if running)
npm run dev -- router restart
# 3. Run test suite
./test-router-refactor.sh
# 4. Test with Claude Code
export ANTHROPIC_BASE_URL="http://localhost:9100"
claude --model your-model-name
```
- All tests in `test-router-refactor.sh` should pass
- Claude Code should work seamlessly with local models
- Tool calling should work correctly (no escaped strings)
- Streaming should be smooth with proper SSE events
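For reference, a healthy stream should emit the standard Anthropic SSE event sequence, roughly as follows (optional `ping` events may be interleaved):

```
event: message_start
event: content_block_start
event: content_block_delta   (repeated)
event: content_block_stop
event: message_delta
event: message_stop
```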
If issues are found:
```bash
# Restore old converter files
mv src/lib/anthropic-converter.ts.backup src/lib/anthropic-converter.ts
mv src/lib/anthropic-stream-converter.ts.backup src/lib/anthropic-stream-converter.ts
# Revert router-server.ts
git checkout HEAD~1 src/lib/router-server.ts
# Rebuild
npm run build
```
- ✅ Code refactoring complete
- ✅ Documentation updated
- ✅ Test suite created
- ⏳ Run manual testing
- ⏳ Validate with real workloads
- ⏳ Delete .backup files after confirmation
- ⏳ Release v2.0.0
- llama.cpp PR #17570 - Native Anthropic Messages API support
- User testing - Confirmed working with Claude Code + full React app
- Performance improvement - No conversion overhead