“Your own private ChatGPT-in-a-box” – powered by Docker, Open-WebUI, Ollama, LiteLLM, SearXNG, vLLM, MinIO, Postgres & Redis.
Zero cloud deps. 100% local. GPU-ready. Prod-grade.
[Google Slides: DSBKK#16](https://docs.google.com/presentation/d/1iwP60EFyToYXV5l5sAhJN24P7NNAqaU7J7vljTP6pWo/edit?usp=sharing)
```bash
git clone … && cd openwebui-ollama-docker
cp .env.example .env && nano .env                      # choose models, keys, passwords
cp searxng/settings.yml.example searxng/settings.yml
docker compose up -d                                   # 3 files auto-merged
```

- Browse → http://localhost:8080 # start chatting
- API → http://localhost:8080/api # OpenAI-compatible
- LiteLLM proxy → http://localhost:4000 # unified LLM gateway
| Component | Role | Default alias | Image |
|---|---|---|---|
| open-webui | Chat UI, RAG, users, workspaces | ood-open-webui | ghcr.io/open-webui/open-webui:main |
| ollama | Local LLM inference server | ood-ollama | ollama/ollama:latest |
| vllm | High-throughput inference / reranker | ood-vllm | vllm/vllm-openai:latest |
| litellm | Universal gateway (OpenAI ⇆ Ollama ⇆ 100+ clouds) | ood-litellm | ghcr.io/berriai/litellm-database:main-latest |
| litellm_postgres | LiteLLM metering, keys, spend tracking | ood-litellm-postgres | pgvector:pg17 |
| litellm_redis | LiteLLM rate-limit & caching | ood-litellm-redis | redis:alpine |
| postgres | Open-WebUI vector DB & chat history | ood-postgres | pgvector:pg17 |
| redis | OWUI real-time websockets / job queue | ood-redis | redis:8-alpine |
| searxng | Private meta-search for RAG web search | ood-searxng | searxng/searxng:latest |
| mcpo | Model-Context-Protocol server (tools) | ood-mcpo | ghcr.io/open-webui/mcpo:latest |
| minio | S3-compatible object store (file uploads, images) | ood-minio | minio/minio |
| minio-createbucket | One-shot bucket creation | ood-minio-createbucket | minio/mc |
All services share one bridge network (`default`) and communicate via internal aliases – no port clashes, and no outward exposure unless you uncomment `ports:`.
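As a sketch of that pattern (the real definitions live in the `docker-compose.*.yml` files; names mirror the table above):

```yaml
# Sketch only – see the actual compose files for the full service definitions
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ood-ollama        # alias from the table above
    # ports:
    #   - "11434:11434"               # uncomment only to expose Ollama to the host
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434   # the service name resolves on the shared bridge network
```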
```
.
├── docker-compose.llm.yml       # ollama, open-webui, vllm, searxng, mcpo
├── docker-compose.db.yml        # postgres, redis, minio
├── docker-compose.litellm.yml   # litellm, litellm_postgres, litellm_redis
├── .env.example                 # 90+ variables documented inline
├── litellm/config.yaml          # router, fallback, budget, guardrails
├── litellm/.credentials/        # cloud keys (vertex_ai.json, aws.json …)
├── mcpo/config.json.example     # MCP tools (brave-search, filesystem …)
├── searxng/settings.yml         # privacy search engines, output format
└── volumes_minio/
    └── minio-entrypoint.sh      # auto-creates bucket on first run
```

The three compose files are merged automatically when `COMPOSE_FILE` is set in your environment and you run `docker compose up -d`:
```bash
# see .env.example
COMPOSE_FILE=docker-compose.db.yml:docker-compose.llm.yml:docker-compose.litellm.yml
```

Alternatively, the compose files can be specified explicitly at run time:

```bash
docker compose -f docker-compose.db.yml -f docker-compose.llm.yml -f docker-compose.litellm.yml up -d
```
| Key | Purpose | Typical value |
|---|---|---|
| `OLLAMA_BASE_URL` | Where OWUI finds Ollama | `http://ollama:11434` |
| `RAG_RERANKING_MODEL` | Rerank snippets before context | `BAAI/bge-reranker-v2-m3` |
| `SEARXNG_QUERY_URL` | Web-search endpoint | `http://searxng:8080/search?q=<query>` |
| `WEBUI_SECRET_KEY` | JWT & cookie secret | `openssl rand -hex 32` |
| `LITELLM_MASTER_KEY` | Admin key for LiteLLM proxy | `sk-...` |
| `MINIO_ROOT_USER`/`PW` | S3 creds (also used by OWUI) | `minioadmin` / `minioadmin` |
| `USE_CUDA_DOCKER` | Enable GPU flags | `true` |
| `VOLUME_DOCKER` | Host path for all volumes | `/home/pi/docker_volumes` |
The full list is inlined in `.env.example` – every variable is optional, and the defaults are sane for an 8 GB ARM board.
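Putting the table together, a minimal `.env` might look like this (illustrative placeholders – generate your own secrets; `MINIO_ROOT_PASSWORD` is MinIO's full variable name for the `PW` shorthand above):

```bash
# Illustrative excerpt – see .env.example for the full, documented list
OLLAMA_BASE_URL=http://ollama:11434
SEARXNG_QUERY_URL=http://searxng:8080/search?q=<query>
RAG_RERANKING_MODEL=BAAI/bge-reranker-v2-m3
WEBUI_SECRET_KEY=paste-output-of-openssl-rand-hex-32
LITELLM_MASTER_KEY=sk-replace-me
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin
USE_CUDA_DOCKER=true
VOLUME_DOCKER=/home/pi/docker_volumes
COMPOSE_FILE=docker-compose.db.yml:docker-compose.llm.yml:docker-compose.litellm.yml
```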
1. **Pull**
   Inside open-webui → Settings → Models → Ollama → Pull, or:

   ```bash
   docker exec -it ollama ollama pull llama3.2:3b
   ```

2. **Register in LiteLLM (optional)**
   Edit `litellm/config.yaml` → add a model block with `model_name: ollama/llama3.2:3b` (see the sketch after this list).
   Restart the litellm container → the model is immediately callable via the OpenAI SDK pointed at `http://localhost:4000`.

3. **Consume**
   - Chat UI: http://localhost:8080
   - OpenAI-compatible API:

     ```bash
     curl http://localhost:8080/api/chat \
       -H "Authorization: Bearer $WEBUI_SECRET_KEY" \
       -d '{"model":"llama3.2:3b","messages":[{"role":"user","content":"hello"}]}'
     ```

   - LiteLLM gateway:

     ```bash
     curl http://localhost:4000/v1/chat/completions \
       -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
       -d '{"model":"ollama/llama3.2:3b","messages":[{"role":"user","content":"hello"}]}'
     ```
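For step 2, a minimal model block in `litellm/config.yaml` could look like this (a sketch using LiteLLM's `model_list` schema; the `api_base` assumes the internal `ollama` alias from the compose network):

```yaml
model_list:
  - model_name: ollama/llama3.2:3b      # name clients request
    litellm_params:
      model: ollama/llama3.2:3b         # provider/model LiteLLM routes to
      api_base: http://ollama:11434     # internal alias, not localhost
```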
| Feature | Service(s) involved | Toggle |
|---|---|---|
| Hybrid vector + keyword search | Postgres pgvector + built-in FTS | `ENABLE_RAG_HYBRID_SEARCH=true` |
| Web-search augmentation | SearXNG | `ENABLE_RAG_WEB_SEARCH=true` |
| Document re-ranking | vLLM (reranker model) | `RAG_RERANKING_MODEL=BAAI/bge-reranker-v2-m3` |
| File upload (pdf, txt, md …) | MinIO + OWUI | `STORAGE_PROVIDER=s3` |
| Persistent chat threads | Postgres | already enabled |
| Function-calling / tools | MCPO | edit `mcpo/config.json` |
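To switch the full RAG pipeline on at once, the toggles from the table combine in `.env` like so (variable names as listed above):

```bash
ENABLE_RAG_HYBRID_SEARCH=true
ENABLE_RAG_WEB_SEARCH=true
RAG_RERANKING_MODEL=BAAI/bge-reranker-v2-m3
STORAGE_PROVIDER=s3
```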
- No ports bound to the host by default – reverse-proxy (Traefik, Nginx, Cloudflare Tunnel) ready; see the sketch below.
- SearXNG strips trackers and keeps no logs.
- LiteLLM supports virtual keys, budget caps, and user roles.
- All volumes are stored locally (`/home/pi/docker_volumes/*`) – wipe the folder = zero residue.
- Optional: set `ENV=dev` to enable the OpenAI-compatible API from Open-WebUI, and `GLOBAL_LOG_LEVEL=DEBUG` for verbose logs.
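As one example of the reverse-proxy setup, a hypothetical Traefik override could attach routing labels to `open-webui` (the hostname and router name here are placeholders; Traefik itself must join the same network):

```yaml
# docker-compose.override.yml – hypothetical Traefik routing sketch
services:
  open-webui:
    labels:
      - traefik.enable=true
      - traefik.http.routers.owui.rule=Host(`chat.example.com`)
      - traefik.http.services.owui.loadbalancer.server.port=8080   # OWUI's internal port
```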
| Symptom | Fix |
|---|---|
| `ollama pull` hangs | give the container at least 4 GB RAM; on ARM set `OLLAMA_FLASH_ATTENTION=0` |
| open-webui shows “Connection refused” to ollama | check that `OLLAMA_BASE_URL` uses the internal alias `http://ollama:11434` |
| LiteLLM 401 | supply `LITELLM_MASTER_KEY` and use the same key in the client |
| SearXNG CAPTCHA | choose more engines in `settings.yml` or enable `limiter.toml` |
| GPU not visible | uncomment `deploy.resources.reservations.devices` in the ollama & vllm services; install nvidia-container-toolkit |
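For the GPU fix, the stanza to uncomment follows the standard compose spec and looks roughly like this (`count` and the service it lives under depend on your setup):

```yaml
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all              # or an explicit number of GPUs
              capabilities: [gpu]
```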
| Tier | RAM | Disk | GPU | Notes |
|---|---|---|---|---|
| ARM board | 8 GB | 32 GB | — | 3 B model, CPU 4 t/s |
| Laptop | 16 GB | 100 GB | 8 GB VRAM | 8 B model + embeddings |
| Workstation | 32 GB | 1 TB | 24 GB VRAM | 70 B GGUF + vLLM 4-bit |

All tiers share the same compose files – only the model sizes and `deploy.limits` differ.
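A per-tier cap might be expressed like this (illustrative number for the laptop tier; adjust to your hardware):

```yaml
services:
  ollama:
    deploy:
      resources:
        limits:
          memory: 12G      # leave headroom for the other services
```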
```bash
docker compose down -v    # wipe everything
docker compose down       # keep volumes (chat history, models)
docker system prune -a    # reclaim space
```

MIT – see the LICENSE file.
Upstream projects keep their respective licenses.
PRs welcome!
Please run `pre-commit run --all-files` before pushing (yamlfmt, shellcheck, markdownlint).
Enjoy your completely offline, API-compatible, GPU-accelerated ChatGPT clone!
If this repo saved you a week, leave a ⭐ and share your `docker stats` screenshots.