Assistly is a demo project that shows how to build a production-grade AI chat assistant for a café or similar hospitality business. It is not tied to a real business — the café details embedded in the knowledge base are sample data. The architecture is designed so any real business's information (menu, hours, location, policies, contact details) can be dropped in to replace the sample data, and the agent's full capability set works immediately without code changes.
The core idea: a small business can describe itself in a structured markdown knowledge base, define a lead-capture workflow, and get a capable conversational AI assistant that answers questions accurately, handles bookings, and persists qualified leads — without building any model infrastructure.
assistly/
├── server/ # Express + LangGraph agent backend
│   ├── agent/ # Prompt templates and markdown knowledge base
│   │   └── assistant/
│   │       ├── compact/ # Memory compaction prompt + extraction guide
│   │       └── main/ # System prompt templates, static sections, workflow components
│   │           ├── main/
│   │           │   ├── dynamic/ # active-memory.md (per-turn context shell)
│   │           │   └── static/ # assistant.md, business-knowledge.md, core-directives.md, …
│   │           └── workflow/
│   │               └── reservation/ # Slot guide, examples, per-section component files
│   ├── database/
│   │   └── leads.json # Persisted reservation leads (auto-cleared every 30 min in demo mode)
│   └── src/
│       ├── agent/
│       │   ├── graph/ # LangGraph compilation: graph.ts, edges.ts
│       │   └── nodes/assistant/
│       │       ├── compact/ # Compaction node + compact_memory tool
│       │       ├── constants/ # Scenario and on-demand section identifiers
│       │       ├── tools/ # Tool schemas, executors, and scenario-scoped tool sets
│       │       ├── types/ # Shared TypeScript types (AgentState, ToolCallResult, …)
│       │       ├── utils/ # Section reader, prompt builder, path anchors, run logger
│       │       ├── workflow/ # Reservation section builders and slot constants
│       │       ├── agent.ts # Public entry point: runAgent()
│       │       ├── assistant.node.ts
│       │       ├── state.ts
│       │       └── tool.node.ts
│       ├── config/env.config.ts
│       ├── routes/ # chat.routes.ts, leads.routes.ts
│       └── app.ts
└── web/ # React + Vite frontend (chat UI + dashboard)
To replace the sample data with a real business, edit one file:
server/agent/assistant/main/main/static/business-knowledge.md
This file is the sole source of truth the LLM draws from when answering visitor questions. Replace the sample café details with the real business's:
- Name, address, phone, email, and social media handles
- Business hours
- Full menu or service catalogue with prices
- Available services and delivery options
- Payment methods accepted
- FAQ answers (WiFi, parking, policies, etc.)
No code changes are required. The agent's routing logic, tool system, reservation workflow, and memory model all operate against whatever content lives in this file.
The assistant identity (assistant.md) and system configuration (system-configuration.md) reference the business name and location — update those two files to match as well.
The assistant operates in one of two scenarios at any given turn:
| Scenario | Active When | Approximate Context Size |
|---|---|---|
| IDLE | Greeting, menu questions, hours, farewells | ~1,200 tokens |
| RESERVATION | Event booking slot-collection in progress | ~2,000 tokens |
The system prompt is assembled from modular markdown sections at request time. Static sections (business knowledge, routing checklists, assistant identity) are loaded once at startup. Scenario-specific sections — slot-filling guide, confirmation gate rules, tool declarations, and few-shot examples — are injected only when RESERVATION is active. On IDLE turns these sections are structurally absent from the prompt, not replaced with empty strings, keeping context lean on the majority of turns.
Each scenario-configurable section (core directives, required behaviors, response format, and tool declarations) pre-compiles both IDLE and RESERVATION variants into a Record<Scenario, string> object at module initialisation. Resolving the system prompt for any turn is an O(1) in-memory lookup with no file I/O on the hot path.
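
The pre-compilation idea can be sketched roughly as follows. The section texts and the helper names `resolveSection` and `buildPrompt` are hypothetical stand-ins, not taken from the project source; the point is only that both variants exist before the first request, and that absent sections are dropped rather than rendered as empty strings.

```typescript
type Scenario = "IDLE" | "RESERVATION";

// Hypothetical section bodies; the project would read these from markdown files at startup.
const baseDirectives = "Answer only from business knowledge.";
const reservationDirectives = "Collect one reservation slot per message.";

// Both variants are compiled once, at module initialisation.
const coreDirectives: Record<Scenario, string> = {
  IDLE: baseDirectives,
  RESERVATION: [baseDirectives, reservationDirectives].join("\n"),
};

// Per-turn resolution is a plain property lookup with no file I/O.
function resolveSection(variants: Record<Scenario, string>, scenario: Scenario): string {
  return variants[scenario];
}

// Sections that do not apply to the current scenario are passed as undefined
// and are left out of the prompt entirely.
function buildPrompt(sections: Array<string | undefined>): string {
  return sections
    .filter((s): s is string => s !== undefined && s.length > 0)
    .join("\n\n");
}
```

Keeping the variants in a `Record<Scenario, string>` also lets the compiler flag any scenario added to the union without a matching variant.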
The agent runs as a three-node LangGraph state graph:
START → assistantNode → (has tool calls?) → toolNode → assistantNode (loop)
↘ (no tool calls) → END
- assistantNode builds the scenario-aware system prompt, binds the correct tool set, invokes the LLM, and fires the run logger on final turns.
- toolNode executes all pending tool calls concurrently. A single tool failure becomes a ToolMessage the LLM can reason about on the next turn; it does not abort the batch.
- shouldContinue is the conditional edge: it routes back to toolNode when pending tool calls exist, and to END when the response is final text.
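
The routing and failure-isolation rules above can be illustrated without the LangGraph API. The message shapes, the `executors` map, and the error text below are simplified assumptions for this sketch, not the project's actual types.

```typescript
interface ToolCall { id: string; name: string; args: Record<string, unknown>; }
interface AssistantMessage { role: "assistant"; content: string; toolCalls: ToolCall[]; }
interface ToolResultMessage { role: "tool"; toolCallId: string; content: string; }

type Executor = (args: Record<string, unknown>) => Promise<string>;

// Conditional edge: go to the tool node while tool calls are pending, otherwise finish.
function shouldContinue(last: AssistantMessage): "toolNode" | "END" {
  return last.toolCalls.length > 0 ? "toolNode" : "END";
}

// Runs every pending call concurrently. A failing call produces an error message
// for the model to read instead of rejecting the whole batch.
async function runToolBatch(
  calls: ToolCall[],
  executors: Record<string, Executor>,
): Promise<ToolResultMessage[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolResultMessage> => {
      try {
        const executor = executors[call.name];
        if (!executor) throw new Error(`Unknown tool: ${call.name}`);
        return { role: "tool", toolCallId: call.id, content: await executor(call.args) };
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        return { role: "tool", toolCallId: call.id, content: `Tool error: ${reason}` };
      }
    }),
  );
}
```

Because each call catches its own error, `Promise.all` never sees a rejection, so one bad tool cannot discard the results of the others.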
The graph is compiled once at startup with a MemorySaver checkpointer. All sessions share one compiled graph instance; thread isolation is provided by thread_id keyed to the visitor's sessionId.
When conversation history reaches 20 messages, the agent runs a compaction cycle before appending the new user message. Compaction uses a dedicated LLM call with the compact_memory tool and a three-pass extraction protocol:
Pass 1 — Full Scan: Every visitor message is read. Tier 1 and Tier 2 facts are flagged by category. No values are written yet.
| Tier | What is extracted |
|---|---|
| Tier 1 — Always | User name and email, all 6 reservation slots, corrections to any prior fact |
| Tier 2 — If future-relevant | Expressed product preferences, dietary restrictions, off-menu requests |
| Tier 3 — Never | Greetings, hours lookups, FAQ answers, assistant-side explanations, inferences |
Pass 2 — Detail Pass: A focused re-scan targets the four values most prone to extraction error: email address (copied verbatim into both description and dialog[]), full name (exact capitalisation), guest count (final corrected value), and preferred date (exact phrase as stated — never converted to a calendar date).
Pass 3 — Merge Pass: Each candidate fact is reconciled against existing compact memory — update changed fields only, discard superseded values, never duplicate entries, never re-store facts already in static business knowledge.
After compaction, the 10 most recent messages are retained to give the model a visible conversational window. Compaction failures are non-fatal — history is preserved and the threshold re-triggers on the next turn.
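
A rough sketch of the trigger and retention rules, with the dedicated LLM call replaced by a hypothetical `compact` callback that may throw. The constant and function names are illustrative only.

```typescript
interface ChatMessage { role: "user" | "assistant" | "tool"; content: string; }

const COMPACTION_THRESHOLD = 20; // history length that triggers a cycle
const RETAINED_MESSAGES = 10; // recent messages kept after a successful cycle

// `compact` stands in for the compaction LLM call with the compact_memory tool.
function compactIfNeeded(
  messages: ChatMessage[],
  compact: (messages: ChatMessage[]) => void,
): ChatMessage[] {
  if (messages.length < COMPACTION_THRESHOLD) return messages;
  try {
    compact(messages);
    // Success: keep only the most recent window.
    return messages.slice(-RETAINED_MESSAGES);
  } catch (err) {
    // Non-fatal: the full history is kept, so the threshold triggers again next turn.
    console.error("compaction failed:", err);
    return messages;
  }
}
```

The caller would run this before appending the new user message, matching the ordering described above.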
Tools are scoped to scenarios. Advertising a tool in the wrong scenario creates phantom tool-call risk.
| Tool | Scenario | Trigger |
|---|---|---|
| get_current_time | Always | Visitor asks about open status, closing time, or current day |
| set_scenario | Always | Scenario transition: IDLE ↔ RESERVATION |
| save_lead | RESERVATION only | All 6 slots collected, confirmation summary shown in a prior turn, explicit visitor affirmative received in its own separate turn |
| compact_memory | Compaction call only | Never exposed during normal turns |
Adding a new tool requires one new tool file, one case in the central dispatcher (tools/index.ts), and one entry in the appropriate tool set array.
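
As an illustration of scenario scoping and the central dispatcher, the sketch below uses the tool names from the table. The `TOOL_SETS` and `dispatchTool` names and the placeholder return values are assumptions, not the contents of tools/index.ts.

```typescript
type Scenario = "IDLE" | "RESERVATION";
type ToolName = "get_current_time" | "set_scenario" | "save_lead";

// Scenario-scoped tool sets: save_lead is advertised only during RESERVATION.
// compact_memory is deliberately absent because it belongs to the compaction call.
const TOOL_SETS: Record<Scenario, readonly ToolName[]> = {
  IDLE: ["get_current_time", "set_scenario"],
  RESERVATION: ["get_current_time", "set_scenario", "save_lead"],
};

// Central dispatcher: one case per tool. A new tool adds one case here
// and one entry in the relevant tool set above.
function dispatchTool(
  name: ToolName,
  args: Record<string, unknown>,
  now: Date = new Date(),
): string {
  switch (name) {
    case "get_current_time":
      return now.toISOString();
    case "set_scenario":
      return `scenario set to ${String(args.scenario)}`; // placeholder result
    case "save_lead":
      return "lead saved"; // placeholder result
    default: {
      // Compile-time check that every ToolName has a case.
      const unreachable: never = name;
      throw new Error(`Unhandled tool: ${String(unreachable)}`);
    }
  }
}
```

The `never` check in the default branch makes a forgotten dispatcher case a compile error rather than a runtime surprise.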
The reservation workflow collects 6 slots conversationally, one per message:
| Slot | What is collected |
|---|---|
| event_type | Type of event (birthday party, corporate meeting, anniversary, etc.) |
| guest_count | Number of guests (maximum 30) |
| preferred_date | Preferred date; must be at least 3 days in advance |
| menu_preference | Drink or food preferences; "No preference" is accepted |
| name | Visitor's full name |
| email | Visitor's email address (collected last) |
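
The slot table can be restated as types. In the project the LLM tracks slots from conversation history, so the following is only a sketch of the rules, assuming the table order is the collection order; `nextMissingSlot` and `isValidGuestCount` are hypothetical helpers.

```typescript
// Slot names in the assumed collection order (email last).
const SLOT_ORDER = [
  "event_type",
  "guest_count",
  "preferred_date",
  "menu_preference",
  "name",
  "email",
] as const;

type SlotName = (typeof SLOT_ORDER)[number];
type CollectedSlots = Partial<Record<SlotName, string>>;

const MAX_GUESTS = 30;

// Returns the next slot to ask for, or undefined once all six are filled.
function nextMissingSlot(slots: CollectedSlots): SlotName | undefined {
  return SLOT_ORDER.find((slot) => !slots[slot]);
}

// Checks the guest-count limit from the table.
function isValidGuestCount(value: string): boolean {
  const count = Number(value);
  return Number.isInteger(count) && count >= 1 && count <= MAX_GUESTS;
}
```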
The workflow enforces a three-turn confirmation gate:
- Display: After all 6 slots are collected, the assistant shows a markdown confirmation summary and stops. No tool call is made.
- Affirmative: The visitor replies with an explicit confirmation in a separate turn.
- Save: The assistant calls save_lead silently, then set_scenario('IDLE'), then delivers the closing message.
Displaying the summary and calling save_lead in the same turn — before the visitor replies — is a hard constraint violation enforced at the prompt, behavioral rules, and tool trigger levels simultaneously.
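
The gate is enforced through prompt rules in this project, but the same condition can be written as a predicate. This is a hypothetical server-side guard for illustration; `GateState` and `maySaveLead` do not appear in the source.

```typescript
interface GateState {
  allSlotsCollected: boolean;
  // Turn index at which the confirmation summary was shown, if any.
  summaryShownAtTurn?: number;
  // Turn index at which the visitor explicitly confirmed, if any.
  affirmativeAtTurn?: number;
}

// save_lead is allowed only when the summary was shown in an earlier turn
// and the visitor confirmed in a later, separate turn.
function maySaveLead(state: GateState, currentTurn: number): boolean {
  const { allSlotsCollected, summaryShownAtTurn, affirmativeAtTurn } = state;
  if (!allSlotsCollected) return false;
  if (summaryShownAtTurn === undefined || affirmativeAtTurn === undefined) return false;
  return summaryShownAtTurn < affirmativeAtTurn && affirmativeAtTurn <= currentTurn;
}
```

A guard like this could back up the prompt-level rule by rejecting a save_lead call that arrives in the same turn as the summary.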
After save_lead succeeds, any compact memory entry tracking booking_status is patched immediately to completed, preventing stale in-progress state from surviving to the next compaction cycle.
Every turn receives an <active_memory> block at the end of the system prompt:
- Scenario: the current scenario identifier, which is the LLM's ground truth for workflow position.
- Compact Memory: all entries from prior compaction cycles, formatted as titled markdown sections with key-value facts and verbatim dialog excerpts.
- Abandonment reminder: injected only during RESERVATION turns; omitted on IDLE.
The LLM tracks which reservation slots have been collected using conversation history directly. Server-side slot state is not maintained — compact memory provides cross-session persistence; in-context messages provide within-session continuity.
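
One way the <active_memory> block could be rendered is sketched below. The `CompactEntry` shape, the markdown layout, and the reminder wording are assumptions made for the example; the real template lives in active-memory.md.

```typescript
type Scenario = "IDLE" | "RESERVATION";

// Assumed entry shape: a title, key-value facts, and verbatim dialog excerpts.
interface CompactEntry { title: string; facts: Record<string, string>; dialog: string[]; }

// Hypothetical reminder text.
const ABANDONMENT_REMINDER = "If the visitor abandons the booking, return to IDLE.";

function renderActiveMemory(scenario: Scenario, entries: CompactEntry[]): string {
  const memory = entries.map((entry) => {
    const facts = Object.keys(entry.facts).map((key) => `- ${key}: ${entry.facts[key]}`);
    const dialog = entry.dialog.map((line) => `> ${line}`);
    return [`### ${entry.title}`, ...facts, ...dialog].join("\n");
  });
  const parts = [`Scenario: ${scenario}`, ...memory];
  // The reminder is appended only on RESERVATION turns.
  if (scenario === "RESERVATION") parts.push(ABANDONMENT_REMINDER);
  return `<active_memory>\n${parts.join("\n\n")}\n</active_memory>`;
}
```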
When CONVERSATION_LOGS=true, three log files are written after every final-turn LLM response to server/logs/{n}/:
| File | Contents |
|---|---|
| context-injection.md | Placeholder injection audit: which sections were injected or skipped, with component-level detail; DeepSeek prefix cache hit/miss token counts |
| system-prompt.md | Full rendered system prompt as the LLM received it |
| conversation.md | Full message history including tool calls and arguments; final plain-text response |
Logging is fire-and-forget — failures are caught and logged to stderr without affecting the response path. Disabled by default.
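
The fire-and-forget pattern might look like this. The `write` callback is a hypothetical stand-in for whatever writes the three log files, and `enabled` would be derived from CONVERSATION_LOGS.

```typescript
type LogWriter = () => Promise<void>;

function reportLogFailure(err: unknown): void {
  console.error("run log failed:", err);
}

// Starts the write and returns immediately. Any failure, synchronous or
// asynchronous, is reported on stderr and never reaches the caller.
function fireAndForget(write: LogWriter, enabled: boolean): void {
  if (!enabled) return;
  try {
    void write().catch(reportLogFailure);
  } catch (err) {
    reportLogFailure(err);
  }
}
```

The response path never awaits the returned promise, so a slow or failing disk write cannot delay or break the chat reply.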
The assistant calls get_current_time() to answer the open-status question, cross-references the returned day and time against business hours in <business_knowledge>, and answers two unrelated follow-up questions (parking and outside cake policy) in a single response by pulling both facts directly from <business_knowledge> — no tool call required.
The assistant transitions to RESERVATION on the first booking intent, collects all 6 slots one per message, accepts a mid-flow guest-count correction (8 → 10) and discards the prior value, displays the full confirmation summary only after every slot is filled, and calls save_lead only after receiving an explicit affirmative in a separate turn — enforcing the three-turn confirmation gate throughout.
The assistant deflects a merchandise question that falls outside <business_knowledge> by pointing the visitor to Facebook and the café phone number, then enters the RESERVATION workflow on request. When the visitor picks a date less than 3 days away, which the advance-booking policy does not allow, the assistant enforces the policy and suggests an alternative. When the visitor decides to walk in instead, the assistant closes gracefully without calling save_lead or leaving the workflow open.
| Variable | Required | Default | Description |
|---|---|---|---|
| DEEPSEEK_API_KEY | Yes | none | API key for DeepSeek's OpenAI-compatible endpoint |
| PORT | No | 3000 | HTTP port the Express server listens on |
| NODE_ENV | No | development | Runtime environment |
| CONVERSATION_LOGS | No | false | Set to true to enable per-turn run logs under server/logs/ |
Copy server/.env.example to server/.env and fill in DEEPSEEK_API_KEY before starting.
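
A minimal sketch of how the variables in the table could be parsed with their defaults. The `EnvConfig` shape and `loadEnv` function are assumptions for illustration and may not match config/env.config.ts.

```typescript
interface EnvConfig {
  deepseekApiKey: string;
  port: number;
  nodeEnv: string;
  conversationLogs: boolean;
}

// Takes the environment as a plain record so it can be exercised without process.env.
function loadEnv(env: Record<string, string | undefined>): EnvConfig {
  const deepseekApiKey = env.DEEPSEEK_API_KEY;
  if (!deepseekApiKey) throw new Error("DEEPSEEK_API_KEY is required");

  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port <= 0) throw new Error(`Invalid PORT: ${env.PORT}`);

  return {
    deepseekApiKey,
    port,
    nodeEnv: env.NODE_ENV ?? "development",
    // Only the literal string "true" enables run logs.
    conversationLogs: env.CONVERSATION_LOGS === "true",
  };
}
```

At startup this would be called once with `process.env`, so a missing key fails fast instead of surfacing on the first chat request.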
cd server
npm install
npm run dev

The server starts on http://localhost:3000. Chat endpoint: POST /api/chat. Leads endpoint: GET /api/leads.
{
"message": "I'd like to book a private event",
"sessionId": "550e8400-e29b-41d4-a716-446655440000"
}

sessionId is a stable UUID generated once per browser session. If omitted, the server generates an ephemeral UUID for that request. Messages exceeding 300 characters cause the handler to throw, which Express 5 forwards to its default error handler as HTTP 500.
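
The request rules above could be expressed as a small parsing step. `parseChatRequest` is a hypothetical helper, and rejecting a non-string message is an assumption of this sketch; the 300-character limit and the UUID fallback come from the description.

```typescript
import { randomUUID } from "node:crypto";

const MAX_MESSAGE_LENGTH = 300;

interface ChatRequestBody { message?: unknown; sessionId?: unknown; }
interface ParsedChatRequest { message: string; sessionId: string; }

function parseChatRequest(body: ChatRequestBody): ParsedChatRequest {
  const { message, sessionId } = body;
  if (typeof message !== "string") {
    throw new Error("message must be a string"); // assumed behaviour
  }
  if (message.length > MAX_MESSAGE_LENGTH) {
    // Thrown errors reach the framework's error handler.
    throw new Error(`message exceeds ${MAX_MESSAGE_LENGTH} characters`);
  }
  // Fall back to an ephemeral UUID when the client sends no sessionId.
  const resolvedSessionId =
    typeof sessionId === "string" && sessionId.length > 0 ? sessionId : randomUUID();
  return { message, sessionId: resolvedSessionId };
}
```

Note that an ephemeral sessionId means the next request starts a fresh thread, so clients that want continuity should always send their own.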
Response:
{ "reply": "I'd be happy to help you book a private event! ..." }

GET /api/leads returns the full array of saved reservation leads from database/leads.json. It is used by the dashboard page. In demo mode, leads are cleared every 30 minutes.
leads.json is auto-cleared every 30 minutes to prevent accumulation of test data during live demonstrations. Remove the setInterval block in src/app.ts to disable this.


