Releases: mara-werils/llmstack
v1.0.0 — The Open-Source LLM Platform
The open-source LLM platform that actually saves you money. Four major features ship in this release:
Universal LLM Gateway
- Route requests across 6 cloud providers (OpenAI, Anthropic, Google, Groq, Together, Mistral) + local Ollama/vLLM through a single OpenAI-compatible endpoint
- Cost-aware routing — automatically picks the cheapest model for each query tier
- Fallback chains — automatic failover between providers on errors
- Per-provider cost tracking with `X-Cost-USD` response headers
- Format translation: Anthropic Messages API, Google Gemini → OpenAI format
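The cost-aware routing and fallback behaviour listed above can be sketched in a few lines. This is a minimal illustration, not llmstack's actual implementation: the `Route` type, the `ProviderError` exception, the tier names, the model names, and every price are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """A provider call failed (timeout, 5xx, rate limit, ...)."""


@dataclass(frozen=True)
class Route:
    provider: str              # provider label, e.g. "groq"
    model: str                 # hypothetical model name
    tier: str                  # query tier this route can serve
    usd_per_1k_tokens: float   # invented price, for illustration only


def fallback_chain(routes: list[Route], tier: str) -> list[Route]:
    """Return the routes that serve `tier`, cheapest first."""
    eligible = (r for r in routes if r.tier == tier)
    return sorted(eligible, key=lambda r: r.usd_per_1k_tokens)


def complete(routes: list[Route], tier: str, prompt: str,
             call: Callable[[Route, str], str]) -> tuple[Route, str]:
    """Try each route in cost order and fail over on ProviderError."""
    errors = []
    for route in fallback_chain(routes, tier):
        try:
            return route, call(route, prompt)
        except ProviderError as exc:
            errors.append(f"{route.provider}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Illustrative table only: model names and prices are invented.
ROUTES = [
    Route("openai", "model-large", "complex", 0.0100),
    Route("groq", "model-small", "simple", 0.0002),
    Route("ollama", "model-local", "simple", 0.0),
]


def demo_call(route: Route, prompt: str) -> str:
    # Pretend the local backend is down so the failover path is exercised.
    if route.provider == "ollama":
        raise ProviderError("connection refused")
    return f"[{route.model}] reply to: {prompt}"
```

With this table, `complete(ROUTES, "simple", "hi", demo_call)` tries the free local route first, hits the simulated error, and falls over to the next cheapest route for the same tier.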
AI Agents & MCP Server
- ReAct agent loop with 6 built-in tools: `read_file`, `write_file`, `list_directory`, `grep`, `shell`, `http_get`
- `llmstack agent "task"` — complete tasks autonomously with tool use
- MCP server (`llmstack mcp`) — expose tools and LLM inference to Claude Code, Cursor, VS Code
- 8 MCP tools including `llmstack_chat` and `llmstack_ask` (file RAG)
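For readers unfamiliar with the ReAct pattern mentioned above, the loop alternates between asking the model for an action, running the chosen tool, and feeding the observation back. The sketch below is a generic illustration under stated assumptions, not llmstack's agent code: the JSON reply protocol, `run_agent`, `scripted_llm`, and the stub tool are all hypothetical, and only the tool name `list_directory` comes from the release notes.

```python
import json
from typing import Callable

Tool = Callable[[str], str]


def run_agent(task: str, llm: Callable[[str], str],
              tools: dict[str, Tool], max_steps: int = 5) -> str:
    """Minimal ReAct-style loop: act, observe, repeat until a final answer."""
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        # Assumed protocol: the model replies with JSON, either
        # {"tool": name, "input": arg} or {"final": answer}.
        reply = json.loads(llm("\n".join(transcript)))
        if "final" in reply:
            return reply["final"]
        name, arg = reply["tool"], reply["input"]
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool: {name}"
        transcript.append(f"Action: {name}({arg!r})")
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("agent did not finish within max_steps")


def scripted_llm(prompt: str) -> str:
    # Stand-in for a real model: act once, then answer after an observation.
    if "Observation:" not in prompt:
        return json.dumps({"tool": "list_directory", "input": "."})
    return json.dumps({"final": "found 2 entries"})


# Stub tool so the example runs without touching the filesystem.
TOOLS: dict[str, Tool] = {"list_directory": lambda path: "a.txt, b.txt"}
```

The `max_steps` cap is a common safeguard in loops like this, since a model that never emits a final answer would otherwise run indefinitely.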
One-Command Fine-tuning
- `llmstack finetune data.jsonl --export-ollama my-model`
- Auto-detect format (CSV, JSON, JSONL, TXT, Parquet) and columns
- Auto hyperparameters based on dataset size and model
- Dual backend: unsloth (2x faster) or HuggingFace PEFT/TRL
- GGUF export + Ollama model creation
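Format and column auto-detection of the kind listed above could work roughly as follows. This is a hedged sketch of one plausible approach, not the logic shipped in `llmstack finetune`: the function names and the candidate column names are assumptions chosen for illustration.

```python
import json
from pathlib import Path

# File formats named in the release notes, keyed by extension.
KNOWN_FORMATS = {
    ".csv": "csv",
    ".json": "json",
    ".jsonl": "jsonl",
    ".txt": "txt",
    ".parquet": "parquet",
}

# Hypothetical column-name candidates, checked in priority order.
PROMPT_NAMES = ("prompt", "instruction", "input", "question")
RESPONSE_NAMES = ("response", "output", "completion", "answer")


def detect_format(path: str) -> str:
    """Guess the dataset format from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in KNOWN_FORMATS:
        raise ValueError(f"unsupported dataset format: {suffix or path}")
    return KNOWN_FORMATS[suffix]


def first_jsonl_record(text: str) -> dict:
    """Parse the first non-empty line of JSONL text."""
    for line in text.splitlines():
        if line.strip():
            return json.loads(line)
    raise ValueError("dataset is empty")


def detect_columns(record: dict) -> tuple[str, str]:
    """Pick the prompt and response columns by case-insensitive name match."""
    lowered = {key.lower(): key for key in record}
    prompt = next((lowered[n] for n in PROMPT_NAMES if n in lowered), None)
    response = next((lowered[n] for n in RESPONSE_NAMES if n in lowered), None)
    if prompt is None or response is None:
        raise ValueError(f"could not infer columns from {sorted(record)}")
    return prompt, response
```

Failing loudly when no column matches, rather than guessing, keeps a mislabeled dataset from silently training on the wrong field.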
AI-Native Observability
- Quality scoring on every response (coherence, relevance, refusal, toxicity, repetition)
- Drift detection — automatic alerts when quality degrades
- A/B testing — compare models with statistical confidence
- Request tracing — full lifecycle traces with quality scores
- `llmstack eval` CLI for dataset evaluation and live gateway monitoring
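Drift detection on a per-response quality score can be as simple as comparing a rolling mean against a baseline. The sketch below shows that idea under stated assumptions; it is not llmstack's detector, and the `DriftDetector` class, the window size, and the tolerance are all hypothetical values picked for the example.

```python
from collections import deque
from statistics import mean


class DriftDetector:
    """Flag drift when the rolling mean drops too far below a baseline."""

    def __init__(self, baseline: float, window: int = 20,
                 tolerance: float = 0.1) -> None:
        self.baseline = baseline
        self.tolerance = tolerance
        self.scores: deque[float] = deque(maxlen=window)

    def observe(self, score: float) -> bool:
        """Record one quality score; return True if drift is detected."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data for a full window yet
        return self.baseline - mean(self.scores) > self.tolerance
```

Waiting for a full window before alerting avoids false alarms from one or two unusually poor responses right after startup.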
Stats
- +7,860 lines of new code
- 448 tests, 0 failures
- 14 CLI commands
- 12 API endpoints
Install
```bash
pip install llmstack-cli
```
Full changelog
New commands: `agent`, `mcp`, `finetune`, `eval`
New modules: `gateway/providers/` (7 adapters), `agent/` (tools + loop), `mcp/` (JSON-RPC server), `finetune/` (data + training + eval + export), `observe/` (scoring + traces + tracker + A/B)
Config: Added `providers`, `agents`, `mcp`, `finetune` sections to `llmstack.yaml`. Extended `observe` with quality tracking settings.
Dependencies: Added optional `[finetune]` extra for PyTorch/PEFT/TRL.
v0.2.0 — Interactive Chat + Docker Compose Export
What's New
Interactive Terminal Chat
llmstack chat
Stream responses from your local LLM directly in the terminal. Supports conversation history, /clear to reset, and Ctrl+C to quit.
Docker Compose Export
llmstack export
# Exported 7 services to docker-compose.yml
# Run with: docker compose up -d
Generate a standalone docker-compose.yml from your llmstack.yaml. Share it with your team; no llmstack dependency is required.
Bug Fixes
- Gateway Docker image now builds locally (no longer requires ghcr.io)
- Prometheus and Grafana configs are written to disk before container start
- Generated API keys persist to `llmstack.yaml` across restarts
- Clear error messages for port conflicts
Stats
- 50 tests, 0 lint errors
- 8 CLI commands: `init`, `up`, `down`, `status`, `chat`, `export`, `logs`, `doctor`
Install
pip install llmstack-cli
Full docs: https://github.com/mara-werils/llmstack
v0.1.0 — Initial Release
One command. Full LLM stack. Zero config.
What's included
- CLI with `init`, `up`, `down`, `status`, `logs`, `doctor` commands
- Auto hardware detection: NVIDIA, Apple Silicon, CPU
- Smart backend resolver — auto-picks Ollama or vLLM based on your GPU
- Services: Ollama, vLLM, Qdrant, Redis, TEI (Text Embeddings Inference)
- API Gateway: OpenAI-compatible proxy with auth, rate limiting, SSE streaming
- Observability: Prometheus + Grafana with pre-provisioned dashboard
- Plugin system via Python entry_points
- Presets: `chat`, `rag`, `agent`
- Pydantic v2 config schema (`llmstack.yaml`)
- Docker SDK orchestration (no docker-compose dependency)
- CI/CD: GitHub Actions for lint/test and PyPI release
Install
pip install llmstack-cli
Quick start
llmstack init
llmstack up