
Add /internal/chat endpoint for multi-model editor integration#25

Open
JMRussas wants to merge 8 commits into main from feature/internal-chat-endpoint

Conversation


@JMRussas JMRussas commented Mar 7, 2026

Summary

  • Adds standalone /api/internal/chat endpoint for editor integration (NoZ, VS Code extension)
  • Routes to CLI providers (Claude, Gemini, Codex) via async subprocess, Ollama via HTTP API
  • Supports conversation history (last 20 messages), per-provider model selection, and the native Ollama /api/chat messages array
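
The routing logic described above can be sketched as a small pure function. The field names (`prompt`, `provider`, `messages`) come from this summary; the function, constants, and the default provider are illustrative stand-ins, not the PR's actual implementation.

```python
# Hypothetical sketch of the /api/internal/chat routing described in the
# summary. Field names match the PR; everything else is illustrative.

CLI_PROVIDERS = {"claude", "gemini", "codex"}  # routed via async subprocess
HISTORY_LIMIT = 20  # only the last 20 messages are forwarded

def route_chat_request(payload: dict) -> dict:
    # Default provider is an assumption for this sketch.
    provider = payload.get("provider", "gemini")
    messages = (payload.get("messages") or [])[-HISTORY_LIMIT:]
    if provider == "ollama":
        # Ollama speaks the /api/chat messages array natively over HTTP.
        return {"transport": "http", "messages": messages}
    if provider in CLI_PROVIDERS:
        # CLI providers receive a single flattened prompt.
        prompt = "\n".join(m.get("content", "") for m in messages) or payload.get("prompt", "")
        return {"transport": "subprocess", "prompt": prompt}
    raise ValueError(f"unknown provider: {provider}")
```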

Test plan

  • curl -X POST http://localhost:5200/api/internal/chat -H 'Content-Type: application/json' -d '{"prompt": "hello", "provider": "gemini"}'
  • Test with conversation history: send messages array
  • Test Ollama routing: provider: "ollama" uses HTTP API directly
  • Test model selection: Gemini -m, Codex --model flags
  • Verify no auth required (no Bearer token needed)

Generated by Claude Code · Claude Opus 4.6

JMRussas and others added 8 commits March 7, 2026 13:07
Standalone chat proxy that routes to CLI providers (Claude, Gemini, Codex)
via subprocess and Ollama via HTTP API. Supports conversation history
(last 20 messages), model selection, and native Ollama /api/chat messages.

No auth — intended for trusted network access (editor integrations).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
shutil.which() resolves bare command names to full .cmd paths on Windows.
Avoids shell=True (command injection risk) while finding npm global CLIs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
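
The resolution step from the commit above can be sketched in a few lines; `resolve_cli` is a hypothetical helper name, not the PR's code.

```python
# Sketch of the commit's approach: shutil.which() searches PATH (and honors
# PATHEXT on Windows), so a bare name like "gemini" resolves to its full
# "gemini.cmd" path and the subprocess can launch without shell=True.
import shutil

def resolve_cli(command: str) -> str:
    path = shutil.which(command)
    if path is None:
        raise FileNotFoundError(f"{command!r} not found on PATH")
    return path
```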
New llm_router.py: call_llm() routes through CLI subprocess (Claude,
Gemini, Codex) or Ollama HTTP with automatic fallback chain. Zero cost
on subscription billing.

Planner no longer requires ANTHROPIC_API_KEY. Budget reservation removed
(cost is always $0 on subscription). Provider fallback: gemini → claude → codex.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
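
The fallback chain from this commit (gemini → claude → codex) can be sketched as below; `call_llm` here is a simplified stand-in for the PR's `llm_router.py`, with providers passed in as plain callables for illustration.

```python
# Illustrative sketch of the provider fallback chain described above.
# The real llm_router.py routes through CLI subprocesses / Ollama HTTP;
# here each provider is just a callable, so the chain logic is visible.

FALLBACK_CHAIN = ["gemini", "claude", "codex"]

def call_llm(prompt: str, providers: dict) -> str:
    """Try each provider in order; return the first successful reply."""
    errors = []
    for name in FALLBACK_CHAIN:
        try:
            return providers[name](prompt)
        except Exception as exc:  # a failed provider falls through to the next
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```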
Bypasses auth to allow MCP and internal callers to trigger planning
via CLI providers. Includes traceback in error response for debugging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Windows has a ~32K command line length limit. Large planning prompts
(system prompt + requirements) exceed this when passed as -p arguments.
Now pipes prompts via stdin for all CLI providers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
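
The stdin-piping fix described above can be sketched with asyncio's subprocess API; `run_cli` is a hypothetical helper, not the PR's code.

```python
# Sketch of the fix: Windows caps the command line at ~32K characters, so
# large prompts are written to the child's stdin instead of being passed
# as a -p argument. communicate() writes the whole prompt and closes stdin.
import asyncio

async def run_cli(argv: list[str], prompt: str) -> str:
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate(prompt.encode())
    if proc.returncode != 0:
        raise RuntimeError(stderr.decode(errors="replace"))
    return stdout.decode(errors="replace")
```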
Planner returns plan_id, not id. Use .get() with fallback for resilience.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Same Windows command line length fix as llm_router.py — pipe prompts
via stdin instead of passing as -p arguments to CLI providers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
