When you ask a question in Chronicle, you see:
An error occurred: HTTPConnectionPool(host='localhost', port=11434):
Read timed out. (read timeout=30)
The first request to Ollama takes 60-90 seconds because:
- Model needs to be loaded into memory
- This is a one-time cost per Ollama restart
- Our initial 30s timeout was too aggressive
Changed timeout from 30s → 90s to handle first request:
timeout=90 # Increased to 90s for first requestAdded /ask/warmup endpoint to load model proactively:
curl -X POST http://localhost:8000/ask/warmupThis takes 60-90s but prepares the model for fast subsequent requests.
Don't restart Ollama unnecessarily:
# Check if model is loaded:
curl http://localhost:11434/api/ps
# Keep Ollama running:
ollama serve- Model loading: 60-90 seconds
- Only happens once after Ollama restart
- With optimizations: 2-5 seconds ✅
- Connection pooling active
- Model already in memory
Just ask your first question and wait ~90 seconds. Subsequent questions will be fast.
When you start Chronicle, run:
curl -X POST http://localhost:8000/ask/warmupWait for it to complete, then all your questions will be fast!
If you use Chronicle regularly, keep Ollama running with the model loaded:
# Load model and keep it
ollama run llama3.2
# Press Ctrl+D to exit but keep model loaded
# Or in background:
ollama serveCheck if model is loaded:
curl http://localhost:11434/api/psIf you see your model listed, it's ready for fast inference!
We could add an automatic warmup on backend startup:
@app.on_event("startup")
async def warmup_ollama():
# Pre-warm the model when backend starts
pass✅ Fixed: Increased timeout to 90s ✅ Added: Warmup endpoint ✅ Expected: First request slow, subsequent fast 🎯 Result: 2-5 second AI responses after warmup!
The optimizations are working - it's just the initial model load that takes time!