johnlester-0369/assistly
Assistly — AI Café Assistant (Demo)

Assistly is a demo project that shows how to build a production-grade AI chat assistant for a café or similar hospitality business. It is not tied to a real business — the café details embedded in the knowledge base are sample data. The architecture is designed so any real business's information (menu, hours, location, policies, contact details) can be dropped in to replace the sample data, and the agent's full capability set works immediately without code changes.

The core idea: a small business can describe itself in a structured markdown knowledge base, define a lead-capture workflow, and get a capable conversational AI assistant that answers questions accurately, handles bookings, and persists qualified leads — without building any model infrastructure.


Project Structure

assistly/
├── server/                          # Express + LangGraph agent backend
│   ├── agent/                       # Prompt templates and markdown knowledge base
│   │   └── assistant/
│   │       ├── compact/             # Memory compaction prompt + extraction guide
│   │       └── main/                # System prompt templates, static sections, workflow components
│   │           ├── main/
│   │           │   ├── dynamic/     # active-memory.md (per-turn context shell)
│   │           │   └── static/      # assistant.md, business-knowledge.md, core-directives.md, …
│   │           └── workflow/
│   │               └── reservation/ # Slot guide, examples, per-section component files
│   ├── database/
│   │   └── leads.json               # Persisted reservation leads (auto-cleared every 30 min in demo mode)
│   └── src/
│       ├── agent/
│       │   ├── graph/               # LangGraph compilation: graph.ts, edges.ts
│       │   └── nodes/assistant/
│       │       ├── compact/         # Compaction node + compact_memory tool
│       │       ├── constants/       # Scenario and on-demand section identifiers
│       │       ├── tools/           # Tool schemas, executors, and scenario-scoped tool sets
│       │       ├── types/           # Shared TypeScript types (AgentState, ToolCallResult, …)
│       │       ├── utils/           # Section reader, prompt builder, path anchors, run logger
│       │       ├── workflow/        # Reservation section builders and slot constants
│       │       ├── agent.ts         # Public entry point: runAgent()
│       │       ├── assistant.node.ts
│       │       ├── state.ts
│       │       └── tool.node.ts
│       ├── config/env.config.ts
│       ├── routes/                  # chat.routes.ts, leads.routes.ts
│       └── app.ts
└── web/                             # React + Vite frontend (chat UI + dashboard)

Adapting to a Real Business

To replace the sample data with a real business, most of the work happens in one file:

server/agent/assistant/main/main/static/business-knowledge.md

This file is the sole source of truth the LLM draws from when answering visitor questions. Replace the sample café details with the real business's:

  • Name, address, phone, email, and social media handles
  • Business hours
  • Full menu or service catalogue with prices
  • Available services and delivery options
  • Payment methods accepted
  • FAQ answers (WiFi, parking, policies, etc.)

No code changes are required. The agent's routing logic, tool system, reservation workflow, and memory model all operate against whatever content lives in this file.

The assistant identity (assistant.md) and system configuration (system-configuration.md) reference the business name and location — update those two files to match as well.


Agent Capabilities

1. Scenario-Aware System Prompt

The assistant operates in one of two scenarios at any given turn:

| Scenario | Active when | Approximate context size |
|---|---|---|
| IDLE | Greeting, menu questions, hours, farewells | ~1,200 tokens |
| RESERVATION | Event booking slot collection in progress | ~2,000 tokens |

The system prompt is assembled from modular markdown sections at request time. Static sections (business knowledge, routing checklists, assistant identity) are loaded once at startup. Scenario-specific sections — slot-filling guide, confirmation gate rules, tool declarations, and few-shot examples — are injected only when RESERVATION is active. On IDLE turns these sections are structurally absent from the prompt, not replaced with empty strings, keeping context lean on the majority of turns.

Each scenario-configurable section (core directives, required behaviors, response format, and tool declarations) pre-compiles both IDLE and RESERVATION variants into a Record<Scenario, string> object at module initialisation. Resolving the system prompt for any turn is an O(1) in-memory lookup with no file I/O on the hot path.
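A minimal sketch of this pattern, assuming placeholder section tags and inline strings in place of the markdown files under server/agent/. For brevity it pre-compiles whole prompt variants rather than one Record<Scenario, string> per section, and every name in it (assemble, resolveSystemPrompt, the tag names) is hypothetical. What it illustrates is that RESERVATION-only sections are never added on IDLE turns, rather than being added as empty strings.

```typescript
// Hypothetical sketch: section tags and contents are placeholders, not the
// project's actual markdown files.
type Scenario = "IDLE" | "RESERVATION";

// Static sections, loaded once at startup (inline strings stand in for files).
const STATIC_SECTIONS: readonly string[] = [
  "<assistant>Identity and tone</assistant>",
  "<business_knowledge>Menu, hours, policies</business_knowledge>",
];

// Sections that exist only while a reservation is in progress.
const RESERVATION_ONLY_SECTIONS: readonly string[] = [
  "<slot_guide>Collect one slot per message</slot_guide>",
  "<confirmation_gate>Show the summary, then wait for a separate affirmative</confirmation_gate>",
];

// A scenario-configurable section with one variant per scenario.
function coreDirectives(scenario: Scenario): string {
  return scenario === "RESERVATION"
    ? "<core_directives>Reservation rules apply</core_directives>"
    : "<core_directives>Answer from business knowledge only</core_directives>";
}

function assemble(scenario: Scenario): string {
  const parts: string[] = [...STATIC_SECTIONS, coreDirectives(scenario)];
  // Structural omission: on IDLE the reservation sections are never added,
  // so no empty placeholders or stray tags reach the prompt.
  if (scenario === "RESERVATION") parts.push(...RESERVATION_ONLY_SECTIONS);
  return parts.join("\n\n");
}

// Every variant is compiled once at module initialisation.
const SYSTEM_PROMPTS: Record<Scenario, string> = {
  IDLE: assemble("IDLE"),
  RESERVATION: assemble("RESERVATION"),
};

// Hot path: a constant-time lookup with no file I/O.
function resolveSystemPrompt(scenario: Scenario): string {
  return SYSTEM_PROMPTS[scenario];
}
```

Appending the per-turn <active_memory> block after the looked-up string keeps the pre-compiled portion identical across turns within a scenario, which is the kind of stable prefix that provider-side prefix caching can reuse.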


2. ReAct Loop

The agent runs as a LangGraph state graph with two nodes and one conditional edge:

START → assistantNode → (has tool calls?) → toolNode → assistantNode (loop)
                ↘ (no tool calls) → END
  • assistantNode builds the scenario-aware system prompt, binds the correct tool set, invokes the LLM, and fires the run logger on final turns.
  • toolNode executes all pending tool calls concurrently. A single tool failure becomes a ToolMessage the LLM can reason about on the next turn — it does not abort the batch.
  • shouldContinue is the conditional edge out of assistantNode: it routes to toolNode when the latest response has pending tool calls, and to END when the response is final text.
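The routing and tool-execution rules in these bullets can be sketched as plain functions without LangGraph. The message shapes, the stub executor, and its return value are hypothetical; in the project these roles are filled by LangGraph nodes and the conditional edge in edges.ts.

```typescript
// Hypothetical message shapes, loosely modeled on chat-model tool calling.
interface ToolCall { id: string; name: string; args: Record<string, unknown>; }
interface AIMessage { role: "ai"; content: string; toolCalls: ToolCall[]; }
interface ToolMessage { role: "tool"; toolCallId: string; content: string; }

type Route = "toolNode" | "END";

// Conditional edge: keep looping through the tool node while the model asks for tools.
function shouldContinue(last: AIMessage): Route {
  return last.toolCalls.length > 0 ? "toolNode" : "END";
}

// Stub executor (hypothetical): unknown tools throw, to show failure handling.
async function executeTool(call: ToolCall): Promise<string> {
  if (call.name === "get_current_time") return "Saturday 14:05";
  throw new Error(`Unknown tool: ${call.name}`);
}

// Tool node: run every pending call concurrently. A failure is converted into a
// ToolMessage the model can reason about, so one bad call never aborts the batch.
async function toolNode(last: AIMessage): Promise<ToolMessage[]> {
  return Promise.all(
    last.toolCalls.map((call) =>
      executeTool(call).then(
        (content): ToolMessage => ({ role: "tool", toolCallId: call.id, content }),
        (err: unknown): ToolMessage => ({
          role: "tool",
          toolCallId: call.id,
          content: `Error: ${err instanceof Error ? err.message : String(err)}`,
        }),
      ),
    ),
  );
}
```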

The graph is compiled once at startup with a MemorySaver checkpointer. All sessions share one compiled graph instance; thread isolation is provided by thread_id keyed to the visitor's sessionId.


3. Memory Compaction

When conversation history reaches 20 messages, the agent runs a compaction cycle before appending the new user message. Compaction uses a dedicated LLM call with the compact_memory tool and a three-pass extraction protocol:

Pass 1 — Full Scan: Every visitor message is read. Tier 1 and Tier 2 facts are flagged by category. No values are written yet.

| Tier | What is extracted |
|---|---|
| Tier 1 (always) | User name and email, all 6 reservation slots, corrections to any prior fact |
| Tier 2 (if future-relevant) | Expressed product preferences, dietary restrictions, off-menu requests |
| Tier 3 (never) | Greetings, hours lookups, FAQ answers, assistant-side explanations, inferences |

Pass 2 — Detail Pass: A focused re-scan targets the four values most prone to extraction error: email address (copied verbatim into both description and dialog[]), full name (exact capitalisation), guest count (final corrected value), and preferred date (exact phrase as stated — never converted to a calendar date).

Pass 3 — Merge Pass: Each candidate fact is reconciled against existing compact memory — update changed fields only, discard superseded values, never duplicate entries, never re-store facts already in static business knowledge.
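The merge pass is carried out by the compaction LLM following the extraction guide, not by deterministic code. Purely as an illustration, the same three rules can be expressed over a hypothetical key-value fact store; the key names and the list of static-knowledge keys are assumptions.

```typescript
// Hypothetical fact store: one value per key, so duplicates cannot accumulate.
type Facts = Record<string, string>;

// Assumed keys that already live in static business knowledge.
const STATIC_KNOWLEDGE_KEYS = new Set(["hours", "wifi", "parking"]);

function mergeFacts(existing: Facts, candidates: Facts): Facts {
  const merged: Facts = { ...existing };
  for (const [key, value] of Object.entries(candidates)) {
    if (STATIC_KNOWLEDGE_KEYS.has(key)) continue; // never re-store static knowledge
    if (merged[key] === value) continue;          // unchanged: nothing to update
    merged[key] = value;                          // new or corrected: supersedes the old value
  }
  return merged;
}
```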

After compaction, the 10 most recent messages are retained to give the model a visible conversational window. Compaction failures are non-fatal — history is preserved and the threshold re-triggers on the next turn.
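A sketch of the trigger and retention logic around a single turn, assuming a hypothetical `compact` callback in place of the dedicated LLM call with compact_memory. The two constants mirror the numbers above; the state shape and function name are assumptions.

```typescript
const COMPACTION_THRESHOLD = 20; // history length that triggers compaction
const RETAINED_MESSAGES = 10;    // most recent messages kept after compaction

interface Message { role: "user" | "ai" | "tool"; content: string; }
interface TurnState { history: Message[]; compactMemory: string[]; }

// Stand-in for the compaction LLM call (hypothetical signature).
type Compactor = (history: Message[], memory: string[]) => Promise<string[]>;

async function prepareTurn(state: TurnState, userText: string, compact: Compactor): Promise<TurnState> {
  let { history, compactMemory } = state;
  if (history.length >= COMPACTION_THRESHOLD) {
    try {
      compactMemory = await compact(history, compactMemory);
      history = history.slice(-RETAINED_MESSAGES);
    } catch {
      // Non-fatal: keep the full history; the threshold re-triggers next turn.
    }
  }
  // The new user message is appended only after compaction has run.
  return { history: [...history, { role: "user", content: userText }], compactMemory };
}
```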


4. Tool System

Tools are scoped to scenarios. Advertising a tool outside its scenario invites phantom tool calls, where the model invokes a tool the current turn has no use for.

| Tool | Scenario | Trigger |
|---|---|---|
| get_current_time | Always | Visitor asks about open status, closing time, or current day |
| set_scenario | Always | Scenario transition in either direction (IDLE ↔ RESERVATION) |
| save_lead | RESERVATION only | All 6 slots collected, confirmation summary shown in a prior turn, and an explicit visitor affirmative received in its own separate turn |
| compact_memory | Compaction call only | Never exposed during normal turns |

Adding a new tool requires one new tool file, one case in the central dispatcher (tools/index.ts), and one entry in the appropriate tool set array.
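A sketch of scenario-scoped tool sets and a central dispatcher, using the tool names from the table. The executors are stubs and the structure is an assumption, not a copy of tools/index.ts.

```typescript
type Scenario = "IDLE" | "RESERVATION";
type ToolName = "get_current_time" | "set_scenario" | "save_lead" | "compact_memory";

// Scenario-scoped tool sets: save_lead is advertised only during RESERVATION,
// and compact_memory never appears in a normal turn.
const TOOL_SETS: Record<Scenario, ToolName[]> = {
  IDLE: ["get_current_time", "set_scenario"],
  RESERVATION: ["get_current_time", "set_scenario", "save_lead"],
};
const COMPACTION_TOOLS: ToolName[] = ["compact_memory"];

// Central dispatcher with one case per tool (stub results).
async function dispatch(name: ToolName, args: Record<string, unknown>): Promise<string> {
  switch (name) {
    case "get_current_time":
      return new Date().toISOString();
    case "set_scenario":
      return `Scenario set to ${String(args["scenario"])}`;
    case "save_lead":
      return "Lead saved";
    case "compact_memory":
      return "Memory compacted";
    default: {
      // Adding a name to ToolName without a case here fails to compile.
      const unreachable: never = name;
      throw new Error(`Unhandled tool: ${String(unreachable)}`);
    }
  }
}
```

The `never` check in the default branch turns a forgotten dispatcher case into a compile-time error, which backs up the three-step checklist above.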


5. Reservation Workflow

The reservation workflow collects 6 slots conversationally, one per message:

| Slot | What is collected |
|---|---|
| event_type | Type of event (birthday party, corporate meeting, anniversary, etc.) |
| guest_count | Number of guests, maximum 30 |
| preferred_date | Preferred date, at least 3 days in advance |
| menu_preference | Drink or food preferences; "No preference" is accepted |
| name | Visitor's full name |
| email | Visitor's email address, collected last |

The workflow enforces a three-turn confirmation gate:

  1. Display — After all 6 slots are collected, the assistant shows a markdown confirmation summary and stops. No tool call is made.
  2. Affirmative — The visitor replies with an explicit confirmation in a separate turn.
  3. Save — The assistant calls save_lead silently, then set_scenario('IDLE'), then delivers the closing message.

Displaying the summary and calling save_lead in the same turn, before the visitor has replied, is a hard constraint violation. The rule is enforced at three levels simultaneously: the prompt, the behavioral rules, and the save_lead tool trigger.
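In Assistly this gate lives in prompt text, behavioral rules, and the save_lead trigger description, and the server keeps no slot state. Purely as an illustration of the rule, here is a hypothetical guard over a simplified turn history; the Turn shape and its summaryShown and affirmative flags are assumptions that real code would have to derive from the messages.

```typescript
// Hypothetical simplified turn record.
interface Turn {
  role: "assistant" | "visitor";
  summaryShown?: boolean; // assistant turn that displayed the confirmation summary
  affirmative?: boolean;  // visitor turn containing an explicit confirmation
}

// save_lead is allowed only when the latest visitor turn is an explicit
// affirmative and the assistant turn before it displayed the summary.
// `history` holds completed turns, excluding the assistant turn being generated.
function mayCallSaveLead(history: readonly Turn[]): boolean {
  if (history.length < 2) return false;
  const reply = history[history.length - 1];
  const summary = history[history.length - 2];
  return (
    reply.role === "visitor" &&
    reply.affirmative === true &&
    summary.role === "assistant" &&
    summary.summaryShown === true
  );
}
```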

After save_lead succeeds, any compact memory entry tracking booking_status is patched immediately to completed, preventing stale in-progress state from surviving to the next compaction cycle.
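A minimal sketch of that patch, assuming a hypothetical entry shape of a title plus key-value facts; the function name is likewise hypothetical.

```typescript
// Hypothetical compact-memory entry shape.
interface MemoryEntry { title: string; facts: Record<string, string>; }

// After a successful save_lead, mark any tracked booking as completed so an
// in-progress status cannot survive into the next compaction cycle.
function markBookingCompleted(entries: readonly MemoryEntry[]): MemoryEntry[] {
  return entries.map((entry) =>
    "booking_status" in entry.facts
      ? { ...entry, facts: { ...entry.facts, booking_status: "completed" } }
      : entry,
  );
}
```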


6. Active Memory Injection

Every turn receives an <active_memory> block at the end of the system prompt:

  • Scenario — the current scenario identifier, which is the LLM's ground truth for workflow position.
  • Compact Memory — all entries from prior compaction cycles, formatted as titled markdown sections with key-value facts and verbatim dialog excerpts.
  • Abandonment reminder — injected only during RESERVATION turns; omitted on IDLE.

The LLM tracks which reservation slots have been collected using conversation history directly. Server-side slot state is not maintained — compact memory provides cross-session persistence; in-context messages provide within-session continuity.
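A sketch of how the block could be rendered before being appended to the end of the system prompt, assuming a hypothetical entry shape (title, key-value facts, verbatim dialog lines). The heading names and the abandonment reminder text are placeholders; the real wording lives in active-memory.md and the other prompt templates.

```typescript
type Scenario = "IDLE" | "RESERVATION";

// Hypothetical compact-memory entry: a titled section with facts and dialog.
interface MemoryEntry {
  title: string;
  facts: Record<string, string>;
  dialog: string[]; // verbatim visitor excerpts
}

function renderEntry(entry: MemoryEntry): string {
  const facts = Object.entries(entry.facts).map(([key, value]) => `- ${key}: ${value}`);
  const dialog = entry.dialog.map((line) => `> ${line}`);
  return [`### ${entry.title}`, ...facts, ...dialog].join("\n");
}

function renderActiveMemory(scenario: Scenario, memory: readonly MemoryEntry[]): string {
  const parts: string[] = [`## Scenario\n${scenario}`];
  if (memory.length > 0) {
    parts.push(`## Compact Memory\n${memory.map(renderEntry).join("\n\n")}`);
  }
  // Placeholder reminder text; included on RESERVATION turns only.
  if (scenario === "RESERVATION") {
    parts.push("## Reminder\nIf the visitor abandons the booking, return to IDLE.");
  }
  return `<active_memory>\n${parts.join("\n\n")}\n</active_memory>`;
}
```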


7. Run Logging

When CONVERSATION_LOGS=true, three log files are written after every final-turn LLM response to server/logs/{n}/:

| File | Contents |
|---|---|
| context-injection.md | Placeholder injection audit: which sections were injected or skipped, with component-level detail; DeepSeek prefix cache hit/miss token counts |
| system-prompt.md | Full rendered system prompt as the LLM received it |
| conversation.md | Full message history including tool calls and arguments; final plain-text response |

Logging is fire-and-forget — failures are caught and logged to stderr without affecting the response path. Disabled by default.
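A sketch of fire-and-forget logging under stated assumptions: the writer is injected (in a Node server it could wrap `writeFile` from `node:fs/promises`), the run directory is supplied by the caller, and `enabled` would come from `CONVERSATION_LOGS === "true"`. The function and type names are hypothetical.

```typescript
// Hypothetical writer signature.
type WriteFile = (path: string, content: string) => Promise<void>;

interface RunLog { contextInjection: string; systemPrompt: string; conversation: string; }

// Fire-and-forget: returns immediately; every failure is caught and reported
// to stderr so the chat response path is never affected.
function logRun(enabled: boolean, dir: string, log: RunLog, writeFile: WriteFile): void {
  if (!enabled) return;
  Promise.resolve()
    .then(() =>
      Promise.all([
        writeFile(`${dir}/context-injection.md`, log.contextInjection),
        writeFile(`${dir}/system-prompt.md`, log.systemPrompt),
        writeFile(`${dir}/conversation.md`, log.conversation),
      ]),
    )
    .catch((err: unknown) => {
      console.error("Run logging failed:", err);
    });
}
```

Starting the writes inside `Promise.resolve().then(...)` also routes a writer that throws synchronously into the same `catch`, so the caller never needs a try block.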


Example Conversations

Example 1: Dynamic Hours, FAQs, and Policies

The assistant calls get_current_time() to answer the open-status question, cross-references the returned day and time against business hours in <business_knowledge>, and answers two unrelated follow-up questions (parking and outside cake policy) in a single response by pulling both facts directly from <business_knowledge> — no tool call required.


Example 2: Reservation with a Mid-Stream Correction

The assistant transitions to RESERVATION on the first booking intent, collects all 6 slots one per message, accepts a mid-flow guest-count correction (8 → 10) and discards the prior value, displays the full confirmation summary only after every slot is filled, and calls save_lead only after receiving an explicit affirmative in a separate turn — enforcing the three-turn confirmation gate throughout.


Example 3: Off-Domain Redirect & Workflow Abandonment

The assistant redirects a merchandise question, which falls outside <business_knowledge>, to the café's Facebook and phone number, then enters the RESERVATION workflow on request. When the visitor picks a date inside the 3-day advance booking window, the assistant enforces the policy and suggests an alternative. When the visitor decides to walk in instead, the assistant closes gracefully without calling save_lead or leaving the workflow open.


Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| DEEPSEEK_API_KEY | Yes | (none) | API key for DeepSeek's OpenAI-compatible endpoint |
| PORT | No | 3000 | HTTP port the Express server listens on |
| NODE_ENV | No | development | Runtime environment |
| CONVERSATION_LOGS | No | false | Set to true to enable per-turn run logs under server/logs/ |

Copy server/.env.example to server/.env and fill in DEEPSEEK_API_KEY before starting.
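A hypothetical sketch of typed config loading that applies the defaults from the table; the real env.config.ts may be structured differently. It takes the environment as a plain record, so it could be called as `loadEnv(process.env)`.

```typescript
interface EnvConfig {
  deepseekApiKey: string;
  port: number;
  nodeEnv: string;
  conversationLogs: boolean;
}

function loadEnv(env: Record<string, string | undefined>): EnvConfig {
  const deepseekApiKey = env.DEEPSEEK_API_KEY;
  // The only required variable: fail fast at startup if it is missing.
  if (!deepseekApiKey) throw new Error("DEEPSEEK_API_KEY is required");
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port <= 0) throw new Error(`Invalid PORT: ${env.PORT}`);
  return {
    deepseekApiKey,
    port,
    nodeEnv: env.NODE_ENV ?? "development",
    conversationLogs: env.CONVERSATION_LOGS === "true", // disabled unless exactly "true"
  };
}
```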


Running the Server

cd server
npm install
npm run dev

The server starts on http://localhost:3000. Chat endpoint: POST /api/chat. Leads endpoint: GET /api/leads.


API Reference

POST /api/chat

{
  "message": "I'd like to book a private event",
  "sessionId": "550e8400-e29b-41d4-a716-446655440000"
}

sessionId is a stable UUID generated once per browser session. If omitted, the server generates an ephemeral UUID for that request. Messages exceeding 300 characters cause the handler to throw, which Express 5 forwards to its default error handler as HTTP 500.

Response:

{ "reply": "I'd be happy to help you book a private event! ..." }
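The request rules for POST /api/chat can be sketched as a plain parsing function. The function name and the string type check are assumptions; the 300-character limit and the ephemeral sessionId fallback come from the description above. The ID generator is injected; in Node it could be `randomUUID` from `node:crypto`.

```typescript
interface ChatRequest { message: string; sessionId: string; }

const MAX_MESSAGE_LENGTH = 300;

function parseChatBody(body: { message?: unknown; sessionId?: unknown }, newId: () => string): ChatRequest {
  const message = body.message;
  // Assumed type check; not described in the API reference.
  if (typeof message !== "string") throw new Error("message must be a string");
  if (message.length > MAX_MESSAGE_LENGTH) {
    // A thrown error reaches Express 5's default error handler (HTTP 500).
    throw new Error(`message exceeds ${MAX_MESSAGE_LENGTH} characters`);
  }
  // Reuse the browser's stable sessionId, or fall back to an ephemeral one.
  const sessionId = typeof body.sessionId === "string" && body.sessionId ? body.sessionId : newId();
  return { message, sessionId };
}
```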

GET /api/leads

Returns the full array of saved reservation leads from database/leads.json. Used by the dashboard page. In demo mode, leads are cleared every 30 minutes.


Demo Mode

leads.json is auto-cleared every 30 minutes to prevent accumulation of test data during live demonstrations. Remove the setInterval block in src/app.ts to disable this.

About

Production-grade AI café assistant: an Express + LangGraph ReAct agent with scenario-aware prompt assembly, memory compaction, and a structured lead-capture reservation workflow. Powered by DeepSeek, TypeScript, and React.