Add interactive chat harness for LLM service testing with auto-detection#90

Merged
SorraTheOrc merged 13 commits into main from
copilot/build-test-chat-interface
Dec 14, 2025

Conversation

Copilot AI commented Dec 13, 2025

Engineers need a lightweight way to test the /parse_intent and /narrate endpoints, validate prompt changes, observe token usage, and debug latency without writing test code or navigating the FastAPI docs.

Implementation

HTTP Client (src/gengine/echoes/llm/chat_client.py)

  • Async wrapper around httpx.AsyncClient with context manager support
  • Methods for /parse_intent, /narrate, and /healthz endpoints
  • Configurable base URL, timeout, and custom headers

CLI REPL (scripts/echoes_llm_chat.py)

  • Interactive chat loop with two modes:
    • parse: Natural language → intent JSON
    • narrate: Events JSON → narrative text
  • Auto-detection of service URL: when --service-url is omitted, the script discovers the LLM service by trying:
    1. Windows host IP (when running in WSL, parsed from /etc/resolv.conf)
    2. http://localhost:8001
    and displays a helpful error if neither responds
  • Multi-turn history management with configurable limits (default: 10 exchanges)
  • Slash commands: /clear, /save <path>, /quit, /exit
  • Context file loading via --context-file for initial game state
  • Displays latency, confidence, and token metrics per response

Usage

# Auto-detect service (no arguments needed)
uv run python scripts/echoes_llm_chat.py

# Or specify service URL explicitly
uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001

You: inspect the industrial district
📋 Intents:
[{"type": "inspect", "target": "district"}]
⏱  Latency: 45ms
🎯 Confidence: 0.95

# Narrate mode
uv run python scripts/echoes_llm_chat.py --mode narrate

Events (JSON or text): [{"type": "pollution_increase", "district": "industrial"}]
📖 Narrative:
The industrial district's pollution levels rose sharply...
⏱  Latency: 120ms
📊 Tokens: 45 in / 32 out
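The multi-turn history with its configurable limit (default: 10 exchanges) can be kept with a bounded deque. This is a sketch under that assumption, not the shipped data structure:

```python
# Sketch of client-side history with a configurable limit; the real
# script's data structures and field names may differ.
from collections import deque


class History:
    def __init__(self, limit: int = 10):
        # Each exchange is one user turn plus one service response;
        # maxlen silently drops the oldest exchange past the limit.
        self._exchanges = deque(maxlen=limit)

    def add(self, user_text: str, response: dict) -> None:
        self._exchanges.append({"user": user_text, "response": response})

    def clear(self) -> None:
        # Backs the /clear slash command.
        self._exchanges.clear()

    def as_payload(self) -> list[dict]:
        # Serialized into each request's context payload.
        return list(self._exchanges)
```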

Testing (tests/echoes/test_llm_chat_cli.py)

  • 13 tests covering request formation, history management, transcript export, and auto-detection
  • Uses httpx.MockTransport for HTTP layer mocking
  • Script imports via importlib.util.spec_from_file_location

Documentation

Added "LLM Chat Harness" section to README with:

  • Prerequisites and command-line options
  • Auto-detection behavior and WSL support
  • Sample sessions for both modes
  • Troubleshooting guide (connection failures, auto-detection, TLS errors, authentication)
  • Provider configuration examples (stub, OpenAI, Anthropic, Foundry)

Works with stub provider out of the box (no API keys required). Service URL auto-detection makes it easy to run without configuration—especially useful when working with WSL and Windows host services.

[!WARNING]

Original prompt

This section details the original issue you should resolve.

<issue_title>13.1.1 Build test chat interface</issue_title>
<issue_description>## Context

  • Phase 13 kicks off conversational tooling for the Echoes LLM service, but we currently do not have a developer-facing harness to exercise /parse_intent and /narrate outside of automated tests.
  • Engineers need a lightweight way to chat with the running echoes_llm_service (stub, OpenAI, Anthropic, or Foundry providers) to validate prompt changes, observe token usage, and debug latency before wiring any gameplay endpoints.
  • Providing a simple CLI chat loop will also let PMs and designers run scripted demos against remote environments without digging into FastAPI clients.

Goals

  • Provide a repeatable command (e.g., uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001) that opens an interactive prompt, accepts user text, and relays it to the configured echoes_llm_service.
  • Maintain basic multi-turn history on the client side so each request can optionally send the prior exchanges as context payload.
  • Surface useful debugging metadata (status, latency, provider/model, token counts) after each response and allow exporting transcripts.
  • Ship minimal documentation so teammates can run the tool locally or point it at a remote base URL.

Implementation Guidance

  1. Add a reusable HTTP client helper (e.g., src/gengine/echoes/llm/chat_client.py) that wraps httpx.AsyncClient and knows how to hit /parse_intent (default) and /narrate when a --mode narrate flag is set. Accept base URL, timeout, and optional API key headers.
  2. Create a CLI entry point under scripts/ (for example scripts/echoes_llm_chat.py) that:
    • uses argparse to capture --service-url, --context-file (JSON), --mode (parse|narrate), --history-limit, and --export transcript.json.
    • supports slash commands like /clear, /save <path>, and /quit for convenience.
    • keeps an in-memory List[Dict[str, str]] history that is serialized into the context payload for /parse_intent (e.g., { "history": [...], "metadata": {...} }).
    • prints structured output: intents (pretty JSON) for parse mode, generated narrative for narrate mode, plus latency/token metrics extracted from response metadata if available.
  3. Add unit tests in tests/echoes (e.g., test_llm_chat_cli.py) that mock the HTTP layer (httpx.MockTransport or respx) to verify:
    • requests are formed with history/context and mode-specific payloads
    • /clear resets the local buffer and /save writes JSON transcripts
    • error responses surface readable messages without crashing the REPL.
  4. Extend README "LLM Service" coverage (or add a short "LLM Chat Harness" subsection) documenting prerequisites, commands, and sample session transcripts. Include guidance for pointing at stub vs. OpenAI/Anthropic providers and how to supply API keys via ECHOES_LLM_* env vars.
  5. Provide a short troubleshooting section covering TLS errors, authentication failures, and how to run against docker compose (http://localhost:8001).

Acceptance Criteria

  • Running uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001 opens an interactive prompt that can exchange messages with the stub provider out of the box.
  • Users can switch between parse (intent JSON output) and narrate (story text) modes via CLI flag without restarting the service.
  • Conversation history is included in subsequent requests and can be cleared/exported via commands.
  • Errors from the service are handled gracefully with descriptive output and non-zero exit codes where appropriate.
  • Documentation (README or linked doc) explains setup, command options, and sample usage for local + remote endpoints.
  • Automated tests cover request formation, history management, and error handling.

Risks & Mitigations

  • Provider authentication differences: Document environment variables and default to stub provider, so running without API keys still works.
  • Long-running chats may reveal latency: Add per-request timing + token metrics to highlight slowness and provide guidance to switch providers.
  • Transcript storage: Limit history size (--history-limit) and redact API keys when exporting transcripts.

Tracker Reference

See .pm/tracker.md > Phase 13 > Task 13.1.1.
</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 3 commits December 13, 2025 21:43
Co-authored-by: SorraTheOrc <250240+SorraTheOrc@users.noreply.github.com>
Co-authored-by: SorraTheOrc <250240+SorraTheOrc@users.noreply.github.com>
Co-authored-by: SorraTheOrc <250240+SorraTheOrc@users.noreply.github.com>
Copilot AI changed the title [WIP] Add test chat interface for echoes LLM service Add interactive chat harness for LLM service testing Dec 13, 2025
Copilot AI requested a review from SorraTheOrc December 13, 2025 21:52
@SorraTheOrc
Member

@copilot rather than require the user to provide the service URL can we please attempt to auto detect it. Start by looking for it on the windows host, assuming the client is running in WSL. This would require us to find the IP number and use that instead of localhost. If that fails then try localhost and if that fails return an error.

Also allow /exit as an alias for /quit

Co-authored-by: SorraTheOrc <250240+SorraTheOrc@users.noreply.github.com>
Author

Copilot AI commented Dec 13, 2025

@copilot rather than require the user to provide the service URL can we please attempt to auto detect it. Start by looking for it on the windows host, assuming the client is running in WSL. This would...

Done in commit 5ca8ec1.

Auto-detection: The client now automatically detects the service URL when --service-url is not specified. It tries:

  1. Windows host IP (when running in WSL, parsed from /etc/resolv.conf)
  2. http://localhost:8001

If both fail, it displays a helpful error message.

Example with auto-detection:

Auto-detecting LLM service...
✓ Detected service at http://localhost:8001

/exit alias: Added /exit as an alias for /quit. Both commands now work identically to exit the chat interface.

The slash commands help text has been updated to show: /clear, /save <path>, /quit, /exit

Co-authored-by: SorraTheOrc <250240+SorraTheOrc@users.noreply.github.com>
Copilot AI changed the title Add interactive chat harness for LLM service testing Add interactive chat harness for LLM service testing with auto-detection Dec 13, 2025
@SorraTheOrc SorraTheOrc merged commit 0177b4b into main Dec 14, 2025
@SorraTheOrc SorraTheOrc deleted the copilot/build-test-chat-interface branch December 14, 2025 06:41

Development

Successfully merging this pull request may close these issues.

13.1.1 Build test chat interface
