
providers: increase local openai-compatible timeout for ollama #432

Open

Musawer1214 wants to merge 2 commits into sipeed:main from Musawer1214:main

Conversation


@Musawer1214 commented on Feb 18, 2026

📝 Description

This PR fixes runtime instability in local Ollama agent flows by addressing session handling and LLM option usage in the agent loop.

Fixes included

  • Honor explicit CLI session keys (-s) instead of collapsing into a default routed session.
  • Use configured model options from agent config:
    • max_tokens now uses agent.ContextWindow (instead of hardcoded 8192)
    • temperature now uses configured agent temperature (instead of hardcoded 0.7)
  • Add retry handling for transient network errors (EOF, connection reset/broken pipe cases).

Why

Observed behavior showed:

  • Session history unintentionally accumulating in one default session.
  • Config token limits being ignored, increasing latency and instability.
  • Occasional transient transport errors interrupting long local runs.

This PR improves reliability and makes runtime behavior match user configuration.

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no API changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

Closes #

📚 Technical Context (Skip for Docs)

  • Reference URL: N/A
  • Reasoning:
    Runtime path in pkg/agent/loop.go used hardcoded LLM options and did not reliably honor CLI session keys, which caused avoidable instability for local Ollama runs.

🧪 Test Environment

  • Hardware: Ryzen 7 8845H, 32GB RAM
  • OS: Windows
  • Model/Provider: Ollama (qwen3:8b) via http://127.0.0.1:11434/v1
  • Channels: CLI

📸 Evidence (Optional)

Click to view Logs/Screenshots

Validated locally:

  • go test ./pkg/agent
  • go test ./pkg/providers/openai_compat
  • CLI smoke tests with qwen3:8b:
    • Direct prompt success
    • Tool flow (write_file + read_file) success
    • Larger tool write flow success

☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.

@Musawer1214 (Author)

Validated locally with Ollama (qwen3:4b, qwen3:8b) on Windows; this removes fixed 120s timeout failures for local endpoints.

@nikolasdehor left a comment

Good work — each individual fix is valuable, and the test coverage is significantly better than the competing PR #278. However, I'd recommend splitting this into 3-4 separate PRs:

Why split?

This PR bundles four logically independent changes:

  1. Local endpoint timeout auto-detection — the core timeout fix
  2. Honor CLI session keys — session routing fix, unrelated to timeouts
  3. Use configured max_tokens/temperature — config honoring fix
  4. Transient network error retry — resilience improvement

Bundling them means a problem with any one change blocks all of them, and makes review harder. Each change is independently valuable and reviewable.

Per-change feedback:

Change 1 (auto-detect local timeout): The isLocalAPIBase() approach is clever — zero-config for local models. Note that PR #278 takes a complementary approach (explicit config field). I'd suggest merging #278 first, then adding your auto-detection as the default when no explicit timeout is set.
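For readers unfamiliar with the approach being discussed, a check like isLocalAPIBase() could look roughly like this. This is a sketch of the idea the review describes, not the PR's actual implementation, and the exact set of recognized hosts is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// isLocalAPIBase reports whether an OpenAI-compatible base URL points
// at a local endpoint (e.g. a local Ollama server on 127.0.0.1:11434),
// so a longer timeout can be applied without any explicit config.
func isLocalAPIBase(base string) bool {
	u, err := url.Parse(base)
	if err != nil {
		return false
	}
	// Hostname() strips the port, so "127.0.0.1:11434" matches.
	switch u.Hostname() {
	case "localhost", "127.0.0.1", "::1":
		return true
	}
	return false
}

func main() {
	fmt.Println(isLocalAPIBase("http://127.0.0.1:11434/v1")) // true
	fmt.Println(isLocalAPIBase("https://api.openai.com/v1")) // false
}
```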

Change 2 (CLI session keys): Looks correct but needs more edge case tests (e.g., empty SessionKey on CLI).

Change 3 (configured max_tokens/temperature): Good catch on the hardcoded 8192/0.7. This is a legitimate bug fix.

Change 4 (transient retry): The retry logic should use context-aware sleep instead of time.Sleep(backoff) to allow cancellation:

select {
case <-time.After(backoff):
case <-ctx.Done():
    return "", 0, ctx.Err()
}

Suggested path forward:

  1. Split into separate PRs
  2. Coordinate with PR #278 on the timeout approach
  3. Each PR will get faster review and merge cycles

@nikolasdehor commented

Related to #278 (configurable timeout feature request) and #605 (timeout control issue). This PR only bumps the Ollama-specific timeout, while #278 requests a general configurable timeout for all providers. Worth considering whether this should be part of a broader solution per #278.
