
Router Refactor Summary (v2.0.0)

Overview

The router has been simplified to use llama.cpp's native Anthropic Messages API instead of custom bidirectional conversion, eliminating roughly 500 lines of conversion code and improving performance.

Changes Made

1. Core Router Simplification

File: src/lib/router-server.ts

  • Before: Converted Anthropic → OpenAI → llama.cpp → OpenAI → Anthropic (lines 242-598)
  • After: Direct proxy to llama.cpp's /v1/messages endpoint (lines 242-378)

Key changes:

  • Removed fromMessagesRequest() - No longer converting to OpenAI format
  • Removed toMessagesResponse() - No longer converting from OpenAI format
  • Removed AnthropicStreamConverter - No longer converting SSE events
  • Removed estimateInputTokens() - Not needed for proxying
  • Removed generateMessageId() - llama.cpp handles this
  • Removed Qwen3 XML unescaping workaround - llama.cpp handles this correctly

New implementation:

handleAnthropicMessages() {
  // 1. Parse and validate request
  // 2. Find server by model name
  // 3. Proxy directly to llama.cpp's /v1/messages
  // 4. Stream response pass-through (no conversion)
}
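
For illustration only, the pass-through can be sketched as below. The Map-based model lookup, port numbers, and web-standard Request/Response types are assumptions for the sketch, not the actual contents of src/lib/router-server.ts:

// Minimal sketch of the direct pass-through proxy (illustrative only).
// A plain Map stands in for the router's registry of running llama.cpp servers.
const serversByModel = new Map<string, string>([
  ["example-model", "http://localhost:8080"], // model name -> llama.cpp base URL
]);

export async function handleAnthropicMessages(req: Request): Promise<Response> {
  // 1. Parse and validate the request
  const body = await req.json();
  if (!body?.model || !Array.isArray(body?.messages)) {
    return Response.json({ error: "model and messages are required" }, { status: 400 });
  }

  // 2. Find the server by model name
  const baseUrl = serversByModel.get(body.model);
  if (!baseUrl) {
    return Response.json({ error: `unknown model: ${body.model}` }, { status: 404 });
  }

  // 3. Proxy directly to llama.cpp's /v1/messages
  const upstream = await fetch(`${baseUrl}/v1/messages`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });

  // 4. Stream the response back untouched: for streaming requests the SSE body
  //    is piped through with no event conversion.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: upstream.headers,
  });
}

Because the upstream body is returned as-is, streaming and non-streaming responses take the same code path and no SSE rewriting is needed.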

2. Files Removed

Archived (renamed to .backup):

  • src/lib/anthropic-converter.ts (~350 lines) - Bidirectional protocol conversion
  • src/lib/anthropic-stream-converter.ts (~150 lines) - Streaming event conversion

Total lines removed: ~500 lines

3. Documentation Updates

README.md:

  • Updated router description to mention native Anthropic API support
  • Added note about direct pass-through to llama.cpp's implementation
  • Clarified that no conversion overhead exists

CLAUDE.md:

  • Updated architecture section with new request flow
  • Documented key architectural change in v2.0
  • Removed references to conversion code
  • Updated file structure

CHANGELOG.md:

  • Added comprehensive v2.0.0 entry with breaking changes notice
  • Explained benefits and migration guide
  • Listed all architectural changes

package.json:

  • Bumped version from 1.14.1 → 2.0.0 (major version)

4. Test Suite

Created: test-router-refactor.sh

Comprehensive test suite covering (see the sketch after this list):

  • ✅ Non-streaming requests
  • ✅ Streaming requests with SSE events
  • ✅ Tool calling support
  • ✅ Error handling
  • ✅ Health checks
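
As a taste of one such check, a non-streaming request can be sent straight to the router and the response inspected for an Anthropic-shaped message. The snippet below is a hedged sketch rather than an excerpt from the script; the port 9100 and placeholder model name come from the testing instructions later in this document:

// Sketch of a single non-streaming check, similar in spirit to what
// test-router-refactor.sh exercises (not taken from the script itself).
const ROUTER_URL = "http://localhost:9100"; // router base URL (see Testing section)

async function checkNonStreaming(): Promise<void> {
  const res = await fetch(`${ROUTER_URL}/v1/messages`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      model: "your-model-name", // placeholder
      max_tokens: 64,
      messages: [{ role: "user", content: "Say hello in one sentence." }],
    }),
  });
  const data = await res.json();

  // llama.cpp already answers in Anthropic Messages format, so these fields
  // should be present without the router converting anything.
  if (data.type !== "message" || !Array.isArray(data.content)) {
    throw new Error(`unexpected response shape: ${JSON.stringify(data)}`);
  }
  console.log("non-streaming check passed");
}

checkNonStreaming().catch((err) => {
  console.error(err);
  process.exit(1);
});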

Technical Details

Request Flow Comparison

Before (v1.x):

Client (Anthropic)
  → Router converts to OpenAI
    → llama.cpp /v1/chat/completions
      → Router converts back to Anthropic
        → Client (Anthropic)

After (v2.0):

Client (Anthropic)
  → Router proxies directly
    → llama.cpp /v1/messages
      → Client (Anthropic)

Benefits

  1. Simpler codebase - 500 fewer lines to maintain
  2. Better performance - No conversion overhead
  3. Fewer bugs - Leverage llama.cpp's native implementation
  4. Full feature support - All Anthropic features work natively (tool calling, vision, thinking, etc.)
  5. No workarounds - Qwen3 XML issues handled by llama.cpp
  6. Easier to debug - Direct pass-through means less complexity

Validation

The native llama.cpp Anthropic API has been tested extensively:

  • ✅ Built a complete React app using Claude Code
  • ✅ Tool calling works correctly
  • ✅ Better performance than custom router conversion
  • ✅ No Qwen3 XML escaping issues

Migration Guide

For Users

Requirements:

  • llama.cpp with native Anthropic API support (PR #17570+)
  • Update: brew upgrade llama.cpp

API compatibility:

  • No changes to the /v1/messages endpoint
  • All existing requests work the same (see the example below)
  • Better compatibility with advanced features
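
Concretely, an existing client call keeps working as-is. A hedged example using the official @anthropic-ai/sdk package (the SDK is not part of this project; the base URL, model name, and dummy API key are placeholders):

// Point any Anthropic-compatible client at the router; the request shape
// does not change between v1.x and v2.0.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "http://localhost:9100", // the router, instead of api.anthropic.com
  apiKey: "not-needed-locally",     // dummy value, assumed unused by a local setup
});

async function main(): Promise<void> {
  const message = await client.messages.create({
    model: "your-model-name", // placeholder
    max_tokens: 256,
    messages: [{ role: "user", content: "Summarize this repo in one sentence." }],
  });
  console.log(message.content);
}

main().catch(console.error);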

What to expect:

  • Improved performance (no conversion overhead)
  • Better tool calling reliability
  • Native support for all Anthropic features

For Developers

If you're contributing:

  • Removed files are archived as .backup (can be deleted after testing)
  • Router code significantly simplified
  • Focus on proxy logic, not conversion logic
  • Test with ./test-router-refactor.sh

Testing

Manual Testing

# 1. Build the project
npm run build

# 2. Restart router (if running)
npm run dev -- router restart

# 3. Run test suite
./test-router-refactor.sh

# 4. Test with Claude Code
export ANTHROPIC_BASE_URL="http://localhost:9100"
claude --model your-model-name

Expected Results

  • All tests in test-router-refactor.sh should pass
  • Claude Code should work seamlessly with local models
  • Tool calling should work correctly (no escaped strings)
  • Streaming should be smooth with proper SSE events (a typical event sequence is sketched below)
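
For reference, a well-formed streaming response follows the standard Anthropic SSE event order, roughly the sequence below (this is the documented Messages API event flow, not output captured from this router):

event: message_start
event: content_block_start
event: content_block_delta   (repeated for each chunk of text or tool input)
event: content_block_stop
event: message_delta
event: message_stop

Occasional ping events may be interleaved; the router passes all of them through unchanged.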

Rollback Plan

If issues are found:

# Restore old converter files
mv src/lib/anthropic-converter.ts.backup src/lib/anthropic-converter.ts
mv src/lib/anthropic-stream-converter.ts.backup src/lib/anthropic-stream-converter.ts

# Revert router-server.ts
git checkout HEAD~1 src/lib/router-server.ts

# Rebuild
npm run build

Next Steps

  1. ✅ Code refactoring complete
  2. ✅ Documentation updated
  3. ✅ Test suite created
  4. ⏳ Run manual testing
  5. ⏳ Validate with real workloads
  6. ⏳ Delete .backup files after confirmation
  7. ⏳ Release v2.0.0

References

  • llama.cpp PR #17570 - Native Anthropic Messages API support
  • User testing - Confirmed working with Claude Code + full React app
  • Performance improvement - No conversion overhead