Assistly — AI Café Assistant (Demo)

Assistly is a demo project that shows how to build a production-grade AI chat assistant for a café or similar hospitality business. It is not tied to a real business — the café details embedded in the knowledge base are sample data. The architecture is designed so any real business's information (menu, hours, location, policies, contact details) can be dropped in to replace the sample data, and the agent's full capability set works immediately without code changes.

The core idea: a small business can describe itself in a structured markdown knowledge base, define a lead-capture workflow, and get a capable conversational AI assistant that answers questions accurately, handles bookings, and persists qualified leads — without building any model infrastructure.

Project Structure

assistly/
├── server/                          # Express + LangGraph agent backend
│   ├── agent/                       # Prompt templates and markdown knowledge base
│   │   └── assistant/
│   │       ├── compact/             # Memory compaction prompt + extraction guide
│   │       └── main/                # System prompt templates, static sections, workflow components
│   │           ├── main/
│   │           │   ├── dynamic/     # active-memory.md (per-turn context shell)
│   │           │   └── static/      # assistant.md, business-knowledge.md, core-directives.md, …
│   │           └── workflow/
│   │               └── reservation/ # Slot guide, examples, per-section component files
│   ├── database/
│   │   └── leads.json               # Persisted reservation leads (auto-cleared every 30 min in demo mode)
│   └── src/
│       ├── agent/
│       │   ├── graph/               # LangGraph compilation: graph.ts, edges.ts
│       │   └── nodes/assistant/
│       │       ├── compact/         # Compaction node + compact_memory tool
│       │       ├── constants/       # Scenario and on-demand section identifiers
│       │       ├── tools/           # Tool schemas, executors, and scenario-scoped tool sets
│       │       ├── types/           # Shared TypeScript types (AgentState, ToolCallResult, …)
│       │       ├── utils/           # Section reader, prompt builder, path anchors, run logger
│       │       ├── workflow/        # Reservation section builders and slot constants
│       │       ├── agent.ts         # Public entry point: runAgent()
│       │       ├── assistant.node.ts
│       │       ├── state.ts
│       │       └── tool.node.ts
│       ├── config/env.config.ts
│       ├── routes/                  # chat.routes.ts, leads.routes.ts
│       └── app.ts
└── web/                             # React + Vite frontend (chat UI + dashboard)

Adapting to a Real Business

To replace the sample data with a real business's details, edit the knowledge base file:

server/agent/assistant/main/main/static/business-knowledge.md

This file is the sole source of truth the LLM draws from when answering visitor questions. Replace the sample café details with the real business's:

Name, address, phone, email, and social media handles
Business hours
Full menu or service catalogue with prices
Available services and delivery options
Payment methods accepted
FAQ answers (WiFi, parking, policies, etc.)

No code changes are required. The agent's routing logic, tool system, reservation workflow, and memory model all operate against whatever content lives in this file.

The assistant identity (assistant.md) and system configuration (system-configuration.md) reference the business name and location — update those two files to match as well.

Agent Capabilities

1. Scenario-Aware System Prompt

The assistant operates in one of two scenarios at any given turn:

| Scenario | Active When | Approximate Context Size |
| --- | --- | --- |
| `IDLE` | Greeting, menu questions, hours, farewells | ~1,200 tokens |
| `RESERVATION` | Event booking slot-collection in progress | ~2,000 tokens |

The system prompt is assembled from modular markdown sections at request time. Static sections (business knowledge, routing checklists, assistant identity) are loaded once at startup. Scenario-specific sections — slot-filling guide, confirmation gate rules, tool declarations, and few-shot examples — are injected only when RESERVATION is active. On IDLE turns these sections are structurally absent from the prompt, not replaced with empty strings, keeping context lean on the majority of turns.

Each scenario-configurable section (core directives, required behaviors, response format, and tool declarations) pre-compiles both IDLE and RESERVATION variants into a Record<Scenario, string> object at module initialisation. Resolving the system prompt for any turn is an O(1) in-memory lookup with no file I/O on the hot path.
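The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the section text, the `slotGuide` record, and the `buildSystemPrompt` helper are hypothetical, and inline strings stand in for the markdown files that would be read at module initialisation.

```typescript
// Hypothetical sketch: names and section text are illustrative only.
type Scenario = "IDLE" | "RESERVATION";

// Built once at module load. The real project would read markdown from disk here;
// inline literals keep this example self-contained.
const toolDeclarations: Record<Scenario, string> = {
  IDLE: "Tools: get_current_time, set_scenario",
  RESERVATION: "Tools: get_current_time, set_scenario, save_lead",
};

// A section that exists only in RESERVATION. null means "structurally absent".
const slotGuide: Record<Scenario, string | null> = {
  IDLE: null,
  RESERVATION: "Collect one slot per message.",
};

function buildSystemPrompt(scenario: Scenario): string {
  const sections = [toolDeclarations[scenario], slotGuide[scenario]];
  // Absent sections are dropped entirely rather than emitted as empty strings.
  return sections.filter((s): s is string => s !== null).join("\n\n");
}
```

Because both variants already exist in memory, resolving a turn's prompt is a property lookup plus a join, with no file reads per request.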

2. ReAct Loop

The agent runs as a LangGraph state graph with two nodes and one conditional edge:

START → assistantNode → (has tool calls?) → toolNode → assistantNode (loop)
                ↘ (no tool calls) → END

assistantNode builds the scenario-aware system prompt, binds the correct tool set, invokes the LLM, and fires the run logger on final turns.
toolNode executes all pending tool calls concurrently. A single tool failure becomes a ToolMessage the LLM can reason about on the next turn — it does not abort the batch.
shouldContinue is the conditional edge: it routes to toolNode when the LLM response contains pending tool calls, and to END when the response is final text.

The graph is compiled once at startup with a MemorySaver checkpointer. All sessions share one compiled graph instance; thread isolation is provided by thread_id keyed to the visitor's sessionId.
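The routing and failure-isolation behaviour can be sketched without LangGraph. The types and the `runTools` helper below are simplified assumptions made for illustration; they are not LangGraph's types or the project's actual node implementations.

```typescript
// Hypothetical sketch: simplified stand-ins for the real message and tool types.
interface ToolCall { id: string; name: string; args: Record<string, unknown>; }
interface ToolMessage { toolCallId: string; content: string; }
type ToolExecutor = (args: Record<string, unknown>) => Promise<string>;

// Conditional edge: go to the tool node only when tool calls are pending.
function shouldContinue(pending: ToolCall[]): "toolNode" | "END" {
  return pending.length > 0 ? "toolNode" : "END";
}

// Run every pending call concurrently. A failure becomes a message the LLM
// can read on the next turn instead of rejecting the whole batch.
async function runTools(
  calls: ToolCall[],
  executors: Record<string, ToolExecutor>,
): Promise<ToolMessage[]> {
  const settled = await Promise.allSettled(
    calls.map(async (call) => {
      const executor = executors[call.name];
      if (!executor) throw new Error(`Unknown tool: ${call.name}`);
      return executor(call.args);
    }),
  );
  return settled.map((result, i) => ({
    toolCallId: calls[i].id,
    content:
      result.status === "fulfilled"
        ? result.value
        : `Tool error: ${String(result.reason)}`,
  }));
}
```

`Promise.allSettled` is used here because, unlike `Promise.all`, it waits for every call and reports each outcome separately.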

3. Memory Compaction

When conversation history reaches 20 messages, the agent runs a compaction cycle before appending the new user message. Compaction uses a dedicated LLM call with the compact_memory tool and a three-pass extraction protocol:

Pass 1 — Full Scan: Every visitor message is read. Tier 1 and Tier 2 facts are flagged by category. No values are written yet.

| Tier | What is extracted |
| --- | --- |
| Tier 1 (always) | User name and email, all 6 reservation slots, corrections to any prior fact |
| Tier 2 (if future-relevant) | Expressed product preferences, dietary restrictions, off-menu requests |
| Tier 3 (never) | Greetings, hours lookups, FAQ answers, assistant-side explanations, inferences |

Pass 2 — Detail Pass: A focused re-scan targets the four values most prone to extraction error: email address (copied verbatim into both description and dialog[]), full name (exact capitalisation), guest count (final corrected value), and preferred date (exact phrase as stated — never converted to a calendar date).

Pass 3 — Merge Pass: Each candidate fact is reconciled against existing compact memory — update changed fields only, discard superseded values, never duplicate entries, never re-store facts already in static business knowledge.
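In the project the merge is performed by the LLM following the extraction guide. The rules themselves can be modelled in a few lines; the flat key-value `Facts` shape and the `mergeFacts` name are assumptions made for this sketch only.

```typescript
// Hypothetical sketch: the real compact memory format is richer than a flat record.
type Facts = Record<string, string>;

function mergeFacts(existing: Facts, candidates: Facts, staticKeys: Set<string>): Facts {
  const merged: Facts = { ...existing };
  for (const [key, value] of Object.entries(candidates)) {
    if (staticKeys.has(key)) continue;   // already covered by static business knowledge
    if (merged[key] === value) continue; // unchanged, so nothing is rewritten
    merged[key] = value;                 // new or corrected value supersedes the old one
  }
  return merged;
}
```

Keying facts by name is what prevents duplicates: a correction overwrites the earlier value instead of adding a second entry.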

After compaction, the 10 most recent messages are retained to give the model a visible conversational window. Compaction failures are non-fatal — history is preserved and the threshold re-triggers on the next turn.
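The threshold, retention window, and non-fatal failure handling might look like the sketch below. The constants mirror the numbers in the text; `maybeCompact` and the `Message` shape are hypothetical, and the real compaction step is an asynchronous LLM call that is made synchronous here for brevity.

```typescript
// Hypothetical sketch: only the two numbers come from the text above.
const COMPACTION_THRESHOLD = 20;
const RETAINED_MESSAGES = 10;

interface Message { role: "user" | "assistant" | "tool"; content: string; }

// `compact` stands in for the dedicated LLM call and may throw.
// Intended to run before the new user message is appended.
function maybeCompact(
  history: Message[],
  compact: (messages: Message[]) => void,
): Message[] {
  if (history.length < COMPACTION_THRESHOLD) return history;
  try {
    compact(history);
    // After a successful compaction, keep only the most recent window.
    return history.slice(-RETAINED_MESSAGES);
  } catch {
    // Non-fatal: the full history is kept, so the threshold triggers again next turn.
    return history;
  }
}
```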

4. Tool System

Tools are scoped to scenarios. Advertising a tool in a scenario where it has no valid use invites phantom tool calls, where the LLM invokes the tool simply because it is available.

| Tool | Scenario | Trigger |
| --- | --- | --- |
| `get_current_time` | Always | Visitor asks about open status, closing time, or current day |
| `set_scenario` | Always | Scenario transition: `IDLE` ↔ `RESERVATION` |
| `save_lead` | RESERVATION only | All 6 slots collected, confirmation summary shown in a prior turn, explicit visitor affirmative received in its own separate turn |
| `compact_memory` | Compaction call only | Never exposed during normal turns |

Adding a new tool requires one new tool file, one case in the central dispatcher (tools/index.ts), and one entry in the appropriate tool set array.
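A rough sketch of scenario-scoped tool sets and a central dispatcher follows. The tool names come from the table above; `TOOL_SETS`, `dispatch`, and the placeholder return values are assumptions for illustration and do not reflect the contents of tools/index.ts.

```typescript
// Hypothetical sketch: executors are reduced to placeholder strings.
type Scenario = "IDLE" | "RESERVATION";
type ToolName = "get_current_time" | "set_scenario" | "save_lead";

// One array per scenario: only these tools are advertised to the LLM.
const TOOL_SETS: Record<Scenario, readonly ToolName[]> = {
  IDLE: ["get_current_time", "set_scenario"],
  RESERVATION: ["get_current_time", "set_scenario", "save_lead"],
};

function dispatch(scenario: Scenario, name: string, args: Record<string, unknown>): string {
  // Refuse tools that were not advertised for the current scenario.
  if (!(TOOL_SETS[scenario] as readonly string[]).includes(name)) {
    return `Tool "${name}" is not available in ${scenario}`;
  }
  switch (name as ToolName) {
    case "get_current_time":
      return new Date().toISOString();
    case "set_scenario":
      return `Scenario set to ${String(args.scenario)}`;
    case "save_lead":
      return "Lead saved";
    default:
      return `Unknown tool: ${name}`;
  }
}
```

Under this layout, adding a tool means one new case in the switch and one new entry in the relevant array, which matches the three-step recipe above.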

5. Reservation Workflow

The reservation workflow collects 6 slots conversationally, one per message:

| Slot | What is collected |
| --- | --- |
| `event_type` | Type of event (birthday party, corporate meeting, anniversary, etc.) |
| `guest_count` | Number of guests (maximum 30) |
| `preferred_date` | Preferred date (must be at least 3 days in advance) |
| `menu_preference` | Drink or food preferences; "No preference" is accepted |
| `name` | Visitor's full name |
| `email` | Visitor's email address (collected last) |
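The slot rules are enforced conversationally by the LLM. If a defensive server-side check were wanted inside the save_lead executor, it could look like this sketch; `validateLead`, `SLOTS`, and `MAX_GUESTS` are hypothetical names, and whether the project validates at this layer is an assumption. The date rule is left out because the preferred date is kept as the visitor's exact phrase.

```typescript
// Hypothetical sketch: slot names and the guest limit come from the table above.
const SLOTS = [
  "event_type",
  "guest_count",
  "preferred_date",
  "menu_preference",
  "name",
  "email",
] as const;
type Slot = (typeof SLOTS)[number];
type Lead = Record<Slot, string>;

const MAX_GUESTS = 30;

// Returns a list of problems; an empty list means the lead is complete.
function validateLead(input: Partial<Lead>): string[] {
  const problems: string[] = [];
  for (const slot of SLOTS) {
    if (!input[slot]?.trim()) problems.push(`missing ${slot}`);
  }
  const guests = Number(input.guest_count);
  if (input.guest_count && (!Number.isInteger(guests) || guests < 1 || guests > MAX_GUESTS)) {
    problems.push(`guest_count must be between 1 and ${MAX_GUESTS}`);
  }
  return problems;
}
```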

The workflow enforces a three-turn confirmation gate:

Display — After all 6 slots are collected, the assistant shows a markdown confirmation summary and stops. No tool call is made.
Affirmative — The visitor replies with an explicit confirmation in a separate turn.
Save — The assistant calls save_lead silently, then set_scenario('IDLE'), then delivers the closing message.

Displaying the summary and calling save_lead in the same turn, before the visitor replies, violates a hard constraint. The constraint is enforced at three levels simultaneously: the prompt, the behavioral rules, and the tool trigger.
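The gate can be modelled as a small state machine. The project enforces it through prompting rather than code, so the state names, event names, and `nextGateState` function below are purely illustrative assumptions.

```typescript
// Hypothetical sketch: a model of the rule, not the project's enforcement mechanism.
type GateState = "COLLECTING" | "SUMMARY_SHOWN" | "SAVED";
type GateEvent = "ALL_SLOTS_FILLED" | "VISITOR_AFFIRMED";

function nextGateState(state: GateState, event: GateEvent): GateState {
  // Turn 1: the summary is displayed and nothing is saved.
  if (state === "COLLECTING" && event === "ALL_SLOTS_FILLED") return "SUMMARY_SHOWN";
  // Turns 2 and 3: saving is reachable only after a separate affirmative reply.
  if (state === "SUMMARY_SHOWN" && event === "VISITOR_AFFIRMED") return "SAVED";
  // Any other combination leaves the state unchanged.
  return state;
}
```

There is no transition from COLLECTING straight to SAVED, which is the property the three-turn gate exists to guarantee.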

After save_lead succeeds, any compact memory entry tracking booking_status is patched immediately to completed, preventing stale in-progress state from surviving to the next compaction cycle.

6. Active Memory Injection

Every turn receives an <active_memory> block at the end of the system prompt:

Scenario — the current scenario identifier, which is the LLM's ground truth for workflow position.
Compact Memory — all entries from prior compaction cycles, formatted as titled markdown sections with key-value facts and verbatim dialog excerpts.
Abandonment reminder — injected only during RESERVATION turns; omitted on IDLE.

The LLM tracks which reservation slots have been collected using conversation history directly. Server-side slot state is not maintained — compact memory provides cross-session persistence; in-context messages provide within-session continuity.
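As an illustration of how such a block might be assembled, consider the sketch below. The `CompactEntry` shape, the `renderActiveMemory` name, the exact markdown layout, and the reminder wording are all assumptions; only the three components and the `<active_memory>` tag come from the description above.

```typescript
// Hypothetical sketch: layout and wording are illustrative only.
type Scenario = "IDLE" | "RESERVATION";

interface CompactEntry {
  title: string;
  facts: Record<string, string>;
  dialog: string[]; // verbatim excerpts
}

function renderActiveMemory(scenario: Scenario, entries: CompactEntry[]): string {
  const parts: string[] = [`Scenario: ${scenario}`];
  for (const entry of entries) {
    const facts = Object.entries(entry.facts).map(([key, value]) => `- ${key}: ${value}`);
    const dialog = entry.dialog.map((line) => `> ${line}`);
    parts.push([`## ${entry.title}`, ...facts, ...dialog].join("\n"));
  }
  // The abandonment reminder is added only while a reservation is in progress.
  if (scenario === "RESERVATION") {
    parts.push("Reminder: if the visitor abandons the booking, return to IDLE.");
  }
  return `<active_memory>\n${parts.join("\n\n")}\n</active_memory>`;
}
```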

7. Run Logging

When CONVERSATION_LOGS=true, three log files are written after every final-turn LLM response to server/logs/{n}/:

| File | Contents |
| --- | --- |
| `context-injection.md` | Placeholder injection audit: which sections were injected or skipped, with component-level detail; DeepSeek prefix cache hit/miss token counts |
| `system-prompt.md` | Full rendered system prompt as the LLM received it |
| `conversation.md` | Full message history including tool calls and arguments; final plain-text response |

Logging is fire-and-forget — failures are caught and logged to stderr without affecting the response path. Disabled by default.
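A fire-and-forget call of this kind could be written as in the sketch below. The `logRun` helper and its parameters are hypothetical; the point is only that the write is never awaited and that both synchronous throws and rejections end up on stderr.

```typescript
// Hypothetical sketch: `write` stands in for whatever produces the three log files.
function logRun(enabled: boolean, write: () => Promise<void>): void {
  if (!enabled) return; // disabled by default
  // Deliberately not awaited, so the response path never waits on disk I/O.
  // Starting from a resolved promise also captures synchronous throws from `write`.
  void Promise.resolve()
    .then(write)
    .catch((err: unknown) => {
      console.error("run log failed:", err);
    });
}
```

A caller would pass something like `logRun(conversationLogsEnabled, () => writeLogs(run))`, where both names are placeholders.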

Example Conversations

Example 1: Dynamic Hours, FAQs, and Policies

The assistant calls get_current_time() to answer the open-status question, cross-references the returned day and time against business hours in <business_knowledge>, and answers two unrelated follow-up questions (parking and outside cake policy) in a single response by pulling both facts directly from <business_knowledge> — no tool call required.

Example 2: Reservation with a Mid-Stream Correction

The assistant transitions to RESERVATION on the first booking intent, collects all 6 slots one per message, accepts a mid-flow guest-count correction (8 → 10) and discards the prior value, displays the full confirmation summary only after every slot is filled, and calls save_lead only after receiving an explicit affirmative in a separate turn — enforcing the three-turn confirmation gate throughout.

Example 3: Off-Domain Redirect & Workflow Abandonment

The assistant deflects a merchandise question outside <business_knowledge> to Facebook and the café phone number, then enters the RESERVATION workflow on request. When the visitor picks a date fewer than 3 days away, the assistant enforces the advance-booking policy and suggests an alternative. When the visitor decides to walk in instead, the assistant closes gracefully without calling save_lead or leaving the workflow open.

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `DEEPSEEK_API_KEY` | Yes | (none) | API key for DeepSeek's OpenAI-compatible endpoint |
| `PORT` | No | `3000` | HTTP port the Express server listens on |
| `NODE_ENV` | No | `development` | Runtime environment |
| `CONVERSATION_LOGS` | No | `false` | Set to `true` to enable per-turn run logs under `server/logs/` |

Copy server/.env.example to server/.env and fill in DEEPSEEK_API_KEY before starting.

Running the Server

cd server
npm install
npm run dev

By default the server starts on http://localhost:3000; set PORT to use a different port. Chat endpoint: POST /api/chat. Leads endpoint: GET /api/leads.

API Reference

`POST /api/chat`

{
  "message": "I'd like to book a private event",
  "sessionId": "550e8400-e29b-41d4-a716-446655440000"
}

sessionId is a stable UUID generated once per browser session. If it is omitted, the server generates an ephemeral UUID for that request, so the turn is not linked to any earlier conversation thread. Messages exceeding 300 characters cause the handler to throw, which Express 5 forwards to its default error handler as HTTP 500.

Response:

{ "reply": "I'd be happy to help you book a private event! ..." }
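A client might call the endpoint as in the sketch below. The helper names are hypothetical, the JSON content type is an assumption based on the request example, and the client-side length check is an optional guard suggested here to avoid the HTTP 500 described above, not something the project is known to do.

```typescript
// Hypothetical sketch of a client for POST /api/chat.
const MAX_MESSAGE_LENGTH = 300; // mirrors the server-side limit described above

interface ChatRequest { message: string; sessionId?: string; }

function buildChatRequest(message: string, sessionId?: string): ChatRequest {
  if (message.length > MAX_MESSAGE_LENGTH) {
    throw new Error(`Message exceeds ${MAX_MESSAGE_LENGTH} characters`);
  }
  // Omit sessionId entirely when the caller has none.
  return sessionId ? { message, sessionId } : { message };
}

async function sendChat(baseUrl: string, body: ChatRequest): Promise<string> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Chat request failed with HTTP ${res.status}`);
  const data = (await res.json()) as { reply: string };
  return data.reply;
}
```

Reusing the same sessionId across calls is what keeps consecutive messages on one conversation thread.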

`GET /api/leads`

Returns the full array of saved reservation leads from database/leads.json. Used by the dashboard page. In demo mode, leads are cleared every 30 minutes.

Demo Mode

leads.json is auto-cleared every 30 minutes to prevent accumulation of test data during live demonstrations. Remove the setInterval block in src/app.ts to disable this.
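For orientation, a periodic reset of this kind generally has the shape sketched below. This is not the code in src/app.ts: `startDemoReset` and `clearLeads` are hypothetical names, and the callback stands in for whatever empties leads.json.

```typescript
// Hypothetical sketch of a periodic demo reset.
const THIRTY_MINUTES_MS = 30 * 60 * 1000;

// Starts the timer and returns a function that stops it again.
function startDemoReset(
  clearLeads: () => void,
  intervalMs: number = THIRTY_MINUTES_MS,
): () => void {
  const timer = setInterval(clearLeads, intervalMs);
  return () => clearInterval(timer);
}
```

Returning a stop function makes the reset easy to turn off in tests or outside demo deployments.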
