Turns your question into a search plan, runs it across 9 sources, from DuckDuckGo to ArXiv,
and assembles a markdown report with references. Fully local. Fully yours.
- 9 integrated search sources in one pipeline
- Ollama-first setup with no API key required
- CLI, FastAPI, and WebSocket interfaces
- SQLite cache, rate limits, and background jobs
- Markdown reports with references and live progress
ChatGPT, Claude, and Gemini all offer deep research workflows, but they are tightly coupled to their own APIs, search stacks, and pricing models. If you want to swap the model, you end up reworking the pipeline. If you want to add a source like PubMed, you are back to implementation details instead of actual research.
LLMFlow Search keeps the same overall workflow, but the LLM provider and search sources are configured rather than hardcoded. You can run Ollama locally, switch to OpenAI in the cloud, or plug in your own SearXNG instance through the same entry point.
- Pipeline: builds a search plan, revises it on the fly, explores alternate query paths, runs parallel search across sources, parses results, and generates a report.
- 9 sources: DuckDuckGo, Wikipedia, SearXNG, ArXiv, PubMed, YouTube, Gutenberg, OpenStreetMap, and Wayback.
- Infrastructure: SQLite cache, per-source rate limiting, background job queue, WebSocket live progress, and metrics endpoint for system and LLM telemetry.
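The per-source rate limiting can be pictured as a small interval gate: each source gets its own limiter, so a slow or strict source never throttles the others. A minimal asyncio sketch (the `RateLimiter` class and the per-source intervals are illustrative, not the project's actual implementation):

```python
import asyncio
import time


class RateLimiter:
    """Enforce a minimum interval between calls to one search source."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        # Start "one interval in the past" so the first call never waits.
        self._last_call = time.monotonic() - min_interval
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            delay = self.min_interval - (now - self._last_call)
            if delay > 0:
                await asyncio.sleep(delay)
            self._last_call = time.monotonic()


# One limiter per source; interval values here are made up for the demo.
LIMITERS = {
    "duckduckgo": RateLimiter(1.0),
    "pubmed": RateLimiter(0.5),
}


async def demo() -> float:
    """Three back-to-back calls to one source; returns elapsed seconds."""
    start = time.monotonic()
    for _ in range(3):
        await LIMITERS["pubmed"].wait()
    return time.monotonic() - start
```

With a 0.5 s interval, the second and third calls each wait roughly half a second, so the demo takes about one second in total.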
Components:
- `main.py` drives the interactive CLI workflow.
- `web_server.py` exposes the FastAPI API, WebSocket stream, and static UI.
- `core/agent_factory.py` initializes shared cache and LLM resources.
- `core/planning_module.py` builds and revises the search plan.
- `core/tools_module.py` dispatches search tools with caching and rate limits.
- `core/memory_module.py` stores gathered results for later synthesis.
- `core/report_generator.py` produces the final markdown report.
Flow: Query -> planning -> tool execution -> parsing and memory -> report generation -> CLI output or streamed web result
```mermaid
graph TD
    A[User Query] --> B[Planning Module]
    B --> C[Tools Module]
    C --> D[Search Providers]
    C --> E[Cache and Rate Limiter]
    D --> F[Memory Module]
    F --> G[Report Generator]
    G --> H[CLI Report or Web UI Result]
```
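The stages above can be sketched as a plan–search–revise loop. This is a simplified orchestration skeleton, not the project's code; the real logic lives in the `core/` modules, and every function name here is illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Accumulates parsed search results across iterations."""
    results: list = field(default_factory=list)


def run_pipeline(query, plan_fn, search_fn, report_fn, max_iterations=3):
    """Plan, search, revise the plan, and finally synthesize a report."""
    memory = Memory()
    plan = plan_fn(query, memory)
    for _ in range(max_iterations):
        for sub_query in plan:
            memory.results.extend(search_fn(sub_query))
        # Revise the plan in light of what was found; an empty plan means done.
        plan = plan_fn(query, memory)
        if not plan:
            break
    return report_fn(query, memory)
```

The key design point mirrored here is that planning runs again after each search round, so the plan can branch into alternate query paths based on what the sources returned.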
- Python 3.11 runtime in Docker, Python 3.9+ expected locally
- FastAPI and Uvicorn for the web server
- aiohttp and httpx for async network access
- Pydantic for config validation
- Ollama, OpenAI, Anthropic, and Gemini provider hooks
- SQLite caching through aiosqlite
- Selenium and Chromium for browser-assisted retrieval
- pytest for tests
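The SQLite cache keeps repeat queries from hitting the same source twice. A minimal sketch of the idea, keyed by `(source, query)` with a TTL; the project wires this through aiosqlite, but the synchronous `sqlite3` module is used here so the example stays self-contained (the `SearchCache` class and its schema are assumptions, not the project's actual code):

```python
import json
import sqlite3
import time


class SearchCache:
    """Cache search results keyed by (source, query), expiring after a TTL."""

    def __init__(self, path=":memory:", ttl=3600.0):
        self.ttl = ttl
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "source TEXT, query TEXT, payload TEXT, created REAL, "
            "PRIMARY KEY (source, query))"
        )

    def get(self, source, query):
        row = self.db.execute(
            "SELECT payload, created FROM cache WHERE source=? AND query=?",
            (source, query),
        ).fetchone()
        if row is None or time.time() - row[1] > self.ttl:
            return None  # miss, or stale entry past its TTL
        return json.loads(row[0])

    def put(self, source, query, results):
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
            (source, query, json.dumps(results), time.time()),
        )
        self.db.commit()
```

Results are stored as JSON blobs so one table can serve all nine sources regardless of their individual result shapes.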
1. Clone the repository and install dependencies.

   ```bash
   git clone https://github.com/KazKozDev/llmflow-search.git
   cd llmflow-search
   pip install -r requirements.txt
   ```

2. Review `config.json` and keep the default `ollama` provider, or switch to another provider and set the matching API key.

3. If you use Ollama, start an Ollama server locally or expose it through `OLLAMA_HOST`.

4. Run the web app or the CLI.

   ```bash
   python web_server.py
   python main.py --output reports/report.md --max-iterations 10
   ```

Detailed setup notes are in docs/setup.md.
The runtime is configured through config.json, with provider secrets and host overrides coming from environment variables.
Minimal local setup with Ollama:
```json
{
  "llm": {
    "provider": "ollama",
    "model": "qwen3:8b",
    "temperature": 0.2,
    "max_tokens": 4096
  },
  "search": {
    "max_results": 5,
    "parse_top_results": 3,
    "use_selenium": true,
    "use_cache": true
  }
}
```

Environment variables:
| Variable | Required when | Purpose |
|---|---|---|
| `OLLAMA_HOST` | Using Ollama on a non-default host | Points the app to your Ollama server |
| `OPENAI_API_KEY` | `provider: openai` | Enables OpenAI-backed runs |
| `ANTHROPIC_API_KEY` | `provider: anthropic` | Enables Anthropic-backed runs |
| `GEMINI_API_KEY` or `GOOGLE_API_KEY` | `provider: gemini` | Enables Gemini-backed runs |
Provider switching is done by changing llm.provider and llm.model in config.json; the rest of the pipeline stays the same.
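For example, moving the same pipeline to OpenAI is a two-field edit plus an exported `OPENAI_API_KEY` (the model name below is illustrative; pick any model your account can access):

```json
{
  "llm": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "temperature": 0.2,
    "max_tokens": 4096
  }
}
```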
Interactive CLI run:
```bash
python main.py --output reports/report.md --verbose --max-iterations 12
```

Example prompt:
Compare small language models suitable for offline document search on a Mac.
Run the web interface locally:
```bash
python web_server.py
```

Start the containerized environment:

```bash
docker compose up --build
```

Trigger a standard web/API session:

```bash
curl -X POST http://127.0.0.1:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query":"Find recent papers on local RAG evaluation","max_iterations":10,"mode":"standard"}'
```

The API returns a `session_id`, and progress then streams over `WS /ws/search/{session_id}`.
The web server in web_server.py exposes a small HTTP and WebSocket surface for the UI.
- `GET /` serves the static interface from `web/static/index.html`
- `POST /api/search` starts a standard session or queues a deep-search job
- `GET /api/sessions` lists in-memory standard sessions
- `GET /api/tools` returns the available search tools
- `GET /api/metrics` returns system and LLM metrics
- `GET /api/jobs` lists background jobs
- `GET /api/jobs/{job_id}` returns one background job
- `POST /api/jobs/{job_id}/cancel` cancels a running background job
- `WS /ws/search/{session_id}` streams `status`, `progress`, `result`, `complete`, and `error` messages for a standard session
```
core/
  caching/              # Cache backends and factory
  tools/                # Search tool implementations and parsers
  agent_factory.py      # Shared resource lifecycle
  agent_core.py         # Agent execution loop
  report_generator.py
tests/
  test_agent_react_loop.py
  test_background_jobs.py
  test_tool_usage.py
  test_web_server.py
web/
  static/               # HTML, JS, CSS, and logo
main.py                 # Interactive CLI entry point
web_server.py           # FastAPI + WebSocket server
config.json             # Runtime configuration
docker-compose.yml      # Container orchestration
```
Stage: Experimental
Current state:
- Local-first Ollama workflow works out of the box.
- CLI, web UI, WebSocket streaming, and background deep-search jobs are available.
- Interfaces, configuration details, and tool coverage may still evolve as the project hardens.
```bash
python -m pytest tests -q
```

See CONTRIBUTING.md.
MIT - see LICENSE
If you like this project, please give it a star ⭐
For questions, feedback, or support, reach out to: