Turns your question into a search plan, runs it across 9 sources, from DuckDuckGo to ArXiv,
and assembles a markdown report with references. Fully local. Fully yours.
- 9 integrated search sources in one pipeline
- Ollama-first setup with no API key required
- CLI, FastAPI, and WebSocket interfaces
- SQLite cache, rate limits, and background jobs
- Markdown reports with references and live progress
ChatGPT, Claude, and Gemini all offer deep research workflows, but they are tightly coupled to their own APIs, search stacks, and pricing models. If you want to swap the model, you end up reworking the pipeline. If you want to add a source like PubMed, you are back to implementation details instead of actual research.
LLMFlow Search keeps the same overall workflow, but the LLM provider and search sources are configured rather than hardcoded. You can run Ollama locally, switch to OpenAI in the cloud, or plug in your own SearXNG instance through the same entry point.
- Pipeline: builds a search plan, revises it on the fly, explores alternate query paths, runs parallel search across sources, parses results, and generates a report.
- 9 sources: DuckDuckGo, Wikipedia, SearXNG, ArXiv, PubMed, YouTube, Gutenberg, OpenStreetMap, and Wayback.
- Infrastructure: SQLite cache, per-source rate limiting, background job queue, WebSocket live progress, and metrics endpoint for system and LLM telemetry.
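The per-source rate limiting can be pictured as a small interval gate: each source gets its own limiter, so a slow or strict source never throttles the others. A minimal asyncio sketch (the `RateLimiter` class and the per-source intervals are illustrative, not the project's actual implementation):

```python
import asyncio
import time


class RateLimiter:
    """Enforce a minimum interval between calls to one search source."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        # Start "one interval in the past" so the first call never waits.
        self._last_call = time.monotonic() - min_interval
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            delay = self.min_interval - (now - self._last_call)
            if delay > 0:
                await asyncio.sleep(delay)
            self._last_call = time.monotonic()


# One limiter per source; interval values here are made up for the demo.
LIMITERS = {
    "duckduckgo": RateLimiter(1.0),
    "pubmed": RateLimiter(0.5),
}


async def demo() -> float:
    """Three back-to-back calls to one source; returns elapsed seconds."""
    start = time.monotonic()
    for _ in range(3):
        await LIMITERS["pubmed"].wait()
    return time.monotonic() - start
```

With a 0.5 s interval, the second and third calls each wait roughly half a second, so the demo takes about one second in total.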
Components:
- `main.py` drives the interactive CLI workflow.
- `web_server.py` exposes the FastAPI API, WebSocket stream, and static UI.
- `core/agent_factory.py` initializes shared cache and LLM resources.
- `core/planning_module.py` builds and revises the search plan.
- `core/tools_module.py` dispatches search tools with caching and rate limits.
- `core/memory_module.py` stores gathered results for later synthesis.
- `core/report_generator.py` produces the final markdown report.
Flow: Query -> planning -> tool execution -> parsing and memory -> report generation -> CLI output or streamed web result
```mermaid
graph TD
    A[User Query] --> B[Planning Module]
    B --> C[Tools Module]
    C --> D[Search Providers]
    C --> E[Cache and Rate Limiter]
    D --> F[Memory Module]
    F --> G[Report Generator]
    G --> H[CLI Report or Web UI Result]
```
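The stages above can be sketched as a plan–search–revise loop. This is a simplified orchestration skeleton, not the project's code; the real logic lives in the `core/` modules, and every function name here is illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Accumulates parsed search results across iterations."""
    results: list = field(default_factory=list)


def run_pipeline(query, plan_fn, search_fn, report_fn, max_iterations=3):
    """Plan, search, revise the plan, and finally synthesize a report."""
    memory = Memory()
    plan = plan_fn(query, memory)
    for _ in range(max_iterations):
        for sub_query in plan:
            memory.results.extend(search_fn(sub_query))
        # Revise the plan in light of what was found; an empty plan means done.
        plan = plan_fn(query, memory)
        if not plan:
            break
    return report_fn(query, memory)
```

The key design point mirrored here is that planning runs again after each search round, so the plan can branch into alternate query paths based on what the sources returned.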
- Python 3.11 runtime in Docker, Python 3.9+ expected locally
- FastAPI and Uvicorn for the web server
- aiohttp and httpx for async network access
- Pydantic for config validation
- Ollama, OpenAI, Anthropic, and Gemini provider hooks
- SQLite caching through aiosqlite
- Selenium and Chromium for browser-assisted retrieval
- pytest for tests
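The SQLite cache keeps repeat queries from hitting the same source twice. A minimal sketch of the idea, keyed by `(source, query)` with a TTL; the project wires this through aiosqlite, but the synchronous `sqlite3` module is used here so the example stays self-contained (the `SearchCache` class and its schema are assumptions, not the project's actual code):

```python
import json
import sqlite3
import time


class SearchCache:
    """Cache search results keyed by (source, query), expiring after a TTL."""

    def __init__(self, path=":memory:", ttl=3600.0):
        self.ttl = ttl
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "source TEXT, query TEXT, payload TEXT, created REAL, "
            "PRIMARY KEY (source, query))"
        )

    def get(self, source, query):
        row = self.db.execute(
            "SELECT payload, created FROM cache WHERE source=? AND query=?",
            (source, query),
        ).fetchone()
        if row is None or time.time() - row[1] > self.ttl:
            return None  # miss, or stale entry past its TTL
        return json.loads(row[0])

    def put(self, source, query, results):
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
            (source, query, json.dumps(results), time.time()),
        )
        self.db.commit()
```

Results are stored as JSON blobs so one table can serve all nine sources regardless of their individual result shapes.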
1. Clone the repository and install dependencies.

   ```bash
   git clone https://github.com/KazKozDev/llmflow-search.git
   cd llmflow-search
   pip install -r requirements.txt
   ```

2. Review `config.json` and keep the default `ollama` provider, or switch to another provider and set the matching API key.

3. If you use Ollama, start an Ollama server locally or expose it through `OLLAMA_HOST`.

4. Run the web app or the CLI.

   ```bash
   python web_server.py
   python main.py --output reports/report.md --max-iterations 10
   ```

Detailed setup notes are in docs/setup.md.
The runtime is configured through config.json, with provider secrets and host overrides coming from environment variables.
Minimal local setup with Ollama:
```json
{
  "llm": {
    "provider": "ollama",
    "model": "qwen3:8b",
    "temperature": 0.2,
    "max_tokens": 4096
  },
  "search": {
    "max_results": 5,
    "parse_top_results": 3,
    "use_selenium": true,
    "use_cache": true
  }
}
```

Environment variables:
| Variable | Required when | Purpose |
|---|---|---|
| `OLLAMA_HOST` | Using Ollama on a non-default host | Points the app to your Ollama server |
| `OPENAI_API_KEY` | `provider: openai` | Enables OpenAI-backed runs |
| `ANTHROPIC_API_KEY` | `provider: anthropic` | Enables Anthropic-backed runs |
| `GEMINI_API_KEY` or `GOOGLE_API_KEY` | `provider: gemini` | Enables Gemini-backed runs |
Provider switching is done by changing llm.provider and llm.model in config.json; the rest of the pipeline stays the same.
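For example, moving the same pipeline to OpenAI is a two-field edit plus an exported `OPENAI_API_KEY` (the model name below is illustrative; pick any model your account can access):

```json
{
  "llm": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "temperature": 0.2,
    "max_tokens": 4096
  }
}
```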
Interactive CLI run:
```bash
python main.py --output reports/report.md --verbose --max-iterations 12
```

Example prompt:
Compare small language models suitable for offline document search on a Mac.
Run the web interface locally:
```bash
python web_server.py
```

Start the containerized environment:

```bash
docker compose up --build
```

Trigger a standard web/API session:

```bash
curl -X POST http://127.0.0.1:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query":"Find recent papers on local RAG evaluation","max_iterations":10,"mode":"standard"}'
```

The API returns a `session_id`, and progress then streams over `WS /ws/search/{session_id}`.
The web server in web_server.py exposes a small HTTP and WebSocket surface for the UI.
- `GET /` serves the static interface from `web/static/index.html`
- `POST /api/search` starts a standard session or queues a deep-search job
- `GET /api/sessions` lists in-memory standard sessions
- `GET /api/tools` returns the available search tools
- `GET /api/metrics` returns system and LLM metrics
- `GET /api/jobs` lists background jobs
- `GET /api/jobs/{job_id}` returns one background job
- `POST /api/jobs/{job_id}/cancel` cancels a running background job
- `WS /ws/search/{session_id}` streams `status`, `progress`, `result`, `complete`, and `error` messages for a standard session
```
core/
  caching/              # Cache backends and factory
  tools/                # Search tool implementations and parsers
  agent_factory.py      # Shared resource lifecycle
  agent_core.py         # Agent execution loop
  report_generator.py
tests/
  test_agent_react_loop.py
  test_background_jobs.py
  test_tool_usage.py
  test_web_server.py
web/
  static/               # HTML, JS, CSS, and logo
main.py                 # Interactive CLI entry point
web_server.py           # FastAPI + WebSocket server
config.json             # Runtime configuration
docker-compose.yml      # Container orchestration
```
Stage: Experimental
Current state:
- Local-first Ollama workflow works out of the box.
- CLI, web UI, WebSocket streaming, and background deep-search jobs are available.
- Interfaces, configuration details, and tool coverage may still evolve as the project hardens.
```bash
python -m pytest tests -q
```

See CONTRIBUTING.md.
MIT - see LICENSE
If you like this project, please give it a star ⭐
For questions, feedback, or support, reach out to: