Agent Offline is an OpenAI-compatible API middleware that sits between users and LM Studio (or any other local OpenAI-compatible server). It provides a fully offline REST API with RAG and MCP support.
| | Traditional | Agent Offline |
|---|---|---|
| API | Direct LM Studio calls | Call through Agent Offline |
| RAG | Manual integration | Built-in via use_rag |
| MCP | Not supported | Full support |
| Config | Edit .env file | Change directly in UI |
| Standard | Proprietary API | OpenAI-compatible |
- OpenAI API Standard: Compatible with /v1/chat/completions, /v1/embeddings
- Built-in RAG: Upload documents, automatic context retrieval
- MCP Client: Connect MCP servers to extend capabilities
- In-UI Config: Change settings without restarting
- Multi-language: Vietnamese / English
User → Agent Offline → LM Studio
↓
RAG + MCP
- The user calls the Agent Offline API (port 3000)
- Agent Offline runs RAG retrieval (finds the relevant document chunks)
- Agent Offline calls LM Studio with the full context
- Agent Offline returns the result in OpenAI format
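
The retrieval step in the flow above can be sketched as follows. This is a minimal illustration, not the project's actual `rag-service.ts`: the `ChatMessage` and `ScoredChunk` types, the `augmentWithContext` function, the separator, and the prompt wording are all assumptions made for the example. It picks the highest-scoring chunks, caps the context length, and prepends the context as a system message before the request would be forwarded to LM Studio.

```typescript
// Hypothetical types for illustration; the project's real types may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ScoredChunk {
  text: string;
  score: number; // higher means more relevant to the query
}

// Select the topK best chunks, keep the joined context within maxChars,
// and prepend it to the conversation as a system message.
function augmentWithContext(
  messages: ChatMessage[],
  chunks: ScoredChunk[],
  topK: number,
  maxChars: number,
): ChatMessage[] {
  // Copy before sorting so the caller's array is not mutated.
  const selected = [...chunks].sort((a, b) => b.score - a.score).slice(0, topK);

  let context = "";
  for (const chunk of selected) {
    const separator = context.length > 0 ? "\n---\n" : "";
    // Stop before the context would exceed the character budget.
    if (context.length + separator.length + chunk.text.length > maxChars) break;
    context += separator + chunk.text;
  }

  // Nothing usable was retrieved: pass the messages through unchanged.
  if (context.length === 0) return messages;

  const contextMessage: ChatMessage = {
    role: "system",
    content: `Use the following context to answer:\n${context}`,
  };
  return [contextMessage, ...messages];
}

const augmented = augmentWithContext(
  [{ role: "user", content: "What is the refund policy?" }],
  [
    { text: "Shipping takes 3 days.", score: 0.2 },
    { text: "Refunds are accepted within 30 days.", score: 0.9 },
  ],
  1,
  4000,
);
console.log(augmented);
```

Here `topK` and `maxChars` play the roles of the RAG Top K and RAG Context Max Chars settings; how the real service scores chunks is not described in this README.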
git clone <repo-url>
cd AgentOffline
npm install

- Download LM Studio
- Download a model (Llama 3, Phi-3, Mistral...)
- Settings → Server → Enable Local Server
npm run dev

Access:
- Frontend: http://localhost:5173
- Backend: http://localhost:3000
# Configure in UI (🔧 Settings tab)
# - LM Studio URL (default: http://localhost:1234/v1)
# - Model name
# Regular chat
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
# Chat with RAG
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Question about documents"}], "use_rag": true}'
# Upload document for RAG
curl -X POST http://localhost:3000/api/documents/upload \
  -F "file=@document.pdf"

Base URL: http://localhost:3000
| Endpoint | Description |
|---|---|
| POST /v1/chat/completions | Chat (OpenAI compatible) |
| POST /v1/embeddings | Create embedding |
| POST /api/documents/upload | Upload RAG document |
| GET /api/documents | List documents |
| DELETE /api/documents/:id | Delete document |
| GET /api/health | Health check |
| GET /api/config | Get configuration |
| POST /api/config | Update configuration |
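
For callers that prefer code over curl, the chat endpoint can be wrapped in a small client. The sketch below is an assumption-laden illustration: `buildChatRequest`, `chat`, and the type names are hypothetical helpers, not part of the project. It relies on the global `fetch` available in Node.js 18 and later, and reads the reply from `choices[0].message.content`, the path used in the OpenAI chat completion format.

```typescript
const BASE_URL = "http://localhost:3000"; // backend address from this README

// Hypothetical types for illustration only.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

interface ChatCompletion {
  choices: { message: { role: string; content: string } }[];
}

// Build the URL and fetch() options for POST /v1/chat/completions.
// Kept separate from the network call so it can be inspected or tested offline.
function buildChatRequest(
  messages: ChatMessage[],
  useRag = false,
): { url: string; init: ChatRequestInit } {
  return {
    url: `${BASE_URL}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages, use_rag: useRag }),
    },
  };
}

// Send the request and return the assistant's reply text.
async function chat(messages: ChatMessage[], useRag = false): Promise<string> {
  const { url, init } = buildChatRequest(messages, useRag);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Agent Offline returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as ChatCompletion;
  const first = data.choices[0];
  if (!first) throw new Error("Response contained no choices");
  return first.message.content;
}
```

With the backend running, `await chat([{ role: "user", content: "Hello" }], true)` would send a RAG-enabled request; nothing is sent until `chat` is called.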
Request body for POST /v1/chat/completions:
{
"messages": [
{"role": "system", "content": "You are helpful"},
{"role": "user", "content": "Question"}
],
"model": "optional-model-name",
"temperature": 0.7,
"max_tokens": 1000,
"use_rag": true,
"top_k": 5
}

Response:
{
"id": "chatcmpl-xxx",
"object": "chat.completion",
"created": 1234567890,
"model": "local-model",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Response..."
},
"finish_reason": "stop"
}]
}

AgentOffline/
├── backend/
│   └── src/
│       ├── lib/
│       │   ├── config-store.ts    # JSON config
│       │   ├── lm-client.ts       # LM Studio client
│       │   ├── mcp-client.ts      # MCP client
│       │   └── rag-service.ts     # RAG processing
│       ├── routes/
│       │   ├── openai.ts          # OpenAI API
│       │   ├── documents.ts       # Document API
│       │   └── system.ts          # System API
│       └── index.ts
├── frontend/
│   └── src/
│       ├── components/
│       │   ├── Settings.tsx       # In-UI configuration
│       │   └── ...
│       └── App.tsx
├── data/
│   ├── config.json                # Config file
│   └── uploads/                   # Uploaded files
└── docker/

🔧 Settings tab allows direct changes:
- LM Studio URL: LM Studio address
- Model: Model name in use
- Embedding URL: Embedding service
- Embedding Model: Embedding model
- RAG Top K: Number of context chunks
- RAG Context Max Chars: Context limit
- MCP Enabled: Toggle MCP

Changes are saved automatically, and the server reloads them without a manual restart.
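
One way to picture how a settings change could be applied is a validated merge over the current configuration. The sketch below is hypothetical: the `AppConfig` key names, the `applyConfigUpdate` function, and every default except the LM Studio URL are assumptions for illustration, and the real `config-store.ts` and `data/config.json` schema may look different.

```typescript
// Hypothetical config shape mirroring the Settings tab fields.
interface AppConfig {
  lmStudioUrl: string;
  model: string;
  embeddingUrl: string;
  embeddingModel: string;
  ragTopK: number;
  ragContextMaxChars: number;
  mcpEnabled: boolean;
}

const DEFAULT_CONFIG: AppConfig = {
  lmStudioUrl: "http://localhost:1234/v1", // default stated in this README
  model: "",
  embeddingUrl: "http://localhost:1234/v1", // assumption
  embeddingModel: "",
  ragTopK: 5, // illustrative value
  ragContextMaxChars: 4000, // illustrative value
  mcpEnabled: false, // illustrative value
};

// Merge a partial update into the current config and validate the result.
// Returns a new object, so the previous config stays intact if validation fails.
function applyConfigUpdate(current: AppConfig, patch: Partial<AppConfig>): AppConfig {
  const next: AppConfig = { ...current, ...patch };

  if (!Number.isInteger(next.ragTopK) || next.ragTopK < 1) {
    throw new Error("ragTopK must be a positive integer");
  }
  if (!Number.isInteger(next.ragContextMaxChars) || next.ragContextMaxChars < 1) {
    throw new Error("ragContextMaxChars must be a positive integer");
  }
  // The URL constructor throws a TypeError on malformed input.
  new URL(next.lmStudioUrl);

  return next;
}

const updated = applyConfigUpdate(DEFAULT_CONFIG, { ragTopK: 3, mcpEnabled: true });
console.log(updated.ragTopK, updated.mcpEnabled);
```

Returning a fresh object rather than mutating in place means a rejected update never leaves the running server with a half-applied configuration; the validated result could then be written to the JSON file.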
cd docker
docker-compose up --build

- Node.js 20+
- LM Studio (or other OpenAI-compatible server)
- Windows 10+ / macOS / Linux
MIT