Planned features and improvements are tracked here. Items are roughly prioritized by impact versus effort.
- Per-agent camera auth gate: only recognized faces can wake the agent
- Confidence threshold configurable per profile (currently hardcoded to 50%)
- Multi-user household: register Mom, Dad, Child — each gets personalized greeting
- Age-appropriate agent access: child-safe agent only wakes for registered children
- Speaker verification / voice fingerprinting
- Voice-print enrollment flow (record 3–5 sentences, build profile)
- Can gate wake activation to recognized voices only
- Pairs with camera auth for dual-factor "biometric wake"
- Per-agent user management panel in Admin Dashboard
- Role-based access: owner / family / guest
- Registered users (under a Clerk account) can each have:
- Face photo(s)
- Voice print
- Preferred agent(s)
- Personalized greeting
- Custom wake word
- Admin can set which agents each user can activate
- Require re-auth for sensitive tool calls (banking, home control, etc.)
- Session expiry + re-auth prompt
- "Lock" the agent mid-conversation
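The re-auth, session-expiry, and lock items above could be combined into one check that runs before each tool call. This is only a minimal sketch: `Session`, `needs_reauth`, the tool names in `SENSITIVE_TOOLS`, and both timeout values are hypothetical and are not part of the current codebase.

```python
from dataclasses import dataclass

# Hypothetical tool names; the real tool registry may differ.
SENSITIVE_TOOLS = {"banking", "home_control"}


@dataclass
class Session:
    user: str
    last_auth_at: float   # timestamp in seconds (e.g. from time.monotonic())
    locked: bool = False  # set when someone "locks" the agent mid-conversation


def needs_reauth(session: Session, tool: str, now: float,
                 session_ttl: float = 1800.0,
                 sensitive_ttl: float = 60.0) -> bool:
    """Return True if the user must re-authenticate before `tool` runs."""
    if session.locked:
        return True
    age = now - session.last_auth_at
    if age > session_ttl:
        # The whole session has expired.
        return True
    # Sensitive tools require a much fresher authentication.
    return tool in SENSITIVE_TOOLS and age > sensitive_ttl
```

Passing `now` in explicitly keeps the check easy to test; a caller would typically supply `time.monotonic()`.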
- Greeting by recognized name: "Hey Dad, what's up?"
- Per-person conversation history / preferences stored server-side
- Agent can remember each person's preferences across sessions
- Profile switching based on who's recognized (Mom prefers a different agent than Child)
- Passive camera monitoring: detect when someone approaches
- Auto-wake when registered person detected (no voice required)
- "Away mode" when no one recognized for N minutes
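One way the auto-wake and "Away mode" items above might fit together is a small tracker that records when a registered person was last recognized. The class name `PresenceTracker`, its methods, and the 300-second default are assumptions made for illustration only.

```python
from typing import Optional


class PresenceTracker:
    """Tracks when a registered person was last recognized by the camera."""

    def __init__(self, away_after_s: float = 300.0):
        self.away_after_s = away_after_s           # the "N minutes", in seconds
        self.last_seen_at: Optional[float] = None  # None = nobody seen yet

    def is_away(self, now: float) -> bool:
        """True if no one has been recognized within the away window."""
        if self.last_seen_at is None:
            return True
        return now - self.last_seen_at >= self.away_after_s

    def on_recognized(self, now: float) -> bool:
        """Record a sighting; return True if the agent should auto-wake."""
        was_away = self.is_away(now)
        self.last_seen_at = now
        return was_away
```

Returning `True` only on the transition out of away mode avoids re-waking the agent on every frame in which the same person stays in view.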
- Replace current LLM-based face matching with a proper biometric library (e.g. `deepface`, `insightface`, or `face_recognition` + dlib)
- Faster (local, no API call) and more accurate
- Enrollment: capture multiple angles, generate face embedding vector
- Recognition: cosine similarity against embedding database
- Reduces recognition from ~3s (LLM API) to ~100ms (local)
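The recognition step above (cosine similarity against an embedding database) can be sketched without committing to a library. The sketch assumes embeddings have already been produced by whichever library is eventually selected; `best_match`, the `database` layout, and the 0.5 threshold are hypothetical placeholders.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_match(probe: List[float],
               database: Dict[str, List[List[float]]],
               threshold: float = 0.5) -> Optional[str]:
    """Return the enrolled user whose embedding is most similar to `probe`,
    or None if no embedding reaches `threshold`."""
    best_name, best_score = None, threshold
    for name, embeddings in database.items():
        # Each user may have several embeddings (multiple enrollment angles).
        for emb in embeddings:
            score = cosine_similarity(probe, emb)
            if score >= best_score:
                best_name, best_score = name, score
    return best_name
```

The `threshold` argument is where a per-profile confidence threshold could be plugged in. Suitable values depend on the embedding model and would need tuning.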
- Object detection and tracking (YOLO)
- Emotion detection from camera feed → affect agent mood/tone
- Gesture recognition (wave to wake, thumbs up to confirm, etc.)
- Document/whiteboard reading
- QR code / barcode scanning via camera
- Dedicated admin panel tab: "Users & Faces"
- Add/remove household users
- Capture or upload multiple face photos per user
- Test recognition live in admin
- Set per-user permissions and preferred agents
- Per-agent: allowed users, blocked users
- Time-of-day restrictions ("kid agent" only 7am–9pm)
- Conversation log per user
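The per-agent rules above (allowed users, blocked users, time-of-day restrictions) could be evaluated by a single policy check. This is a sketch under assumptions: `AgentPolicy`, `can_activate`, and the convention that an empty allow-list means "everyone" are illustrative choices, not existing code.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Optional, Set


@dataclass
class AgentPolicy:
    allowed_users: Set[str] = field(default_factory=set)  # empty = everyone
    blocked_users: Set[str] = field(default_factory=set)
    active_from: Optional[time] = None   # e.g. time(7, 0)
    active_until: Optional[time] = None  # e.g. time(21, 0)


def can_activate(policy: AgentPolicy, user: str, now: time) -> bool:
    """Return True if `user` may wake this agent at local time `now`."""
    if user in policy.blocked_users:
        return False  # the block-list always wins
    if policy.allowed_users and user not in policy.allowed_users:
        return False
    start, end = policy.active_from, policy.active_until
    if start is not None and end is not None:
        if start <= end:
            return start <= now < end
        # Window wraps past midnight (e.g. 22:00 to 06:00).
        return now >= start or now < end
    return True
```

For the "kid agent" example, `AgentPolicy(allowed_users={"Child"}, active_from=time(7, 0), active_until=time(21, 0))` would allow `Child` at 08:30 and refuse at 22:00.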
- Silence timeout slider
- Continuous vs PTT toggle
- Wake word testing (live test button)
- Language/accent selection
- Detect spoken language automatically
- Switch TTS voice language to match
- Per-user preferred language
- Long-term memory across sessions (summaries, preferences, facts)
- "Remember that I like..." → stored in user profile
- Briefing on session start: "Last time you asked about..."
- Smarter interruption detection (voice activity vs noise)
- "Hold on" / pause command
- Resume from where it left off after interruption
- OAuth login per user
- Play from personal library
- Playlist control
- "Play my morning playlist" → knows which user asked
- Learn per-user taste
- "Play something like what I usually like"
- Genre/mood matching from conversation
- Home Assistant / MQTT bridge
- Control lights, locks, thermostats by voice
- Presence-triggered automations (arrive home → turn on lights)
- Per-user automations (Dad arrives → different scene than Mom)
- Ollama + LLaVA for fully offline vision processing
- No API key required, no cost
- ~1–2s latency on modern hardware
- Run separate voice UI instances per room
- Central admin manages all instances
- Shared user/face database across instances
- Items marked with no priority number are longer-term / post-v1
- Camera auth / visual auth is the highest-priority future auth feature
- The current face recognition (LLM-based) is intentionally temporary; the upgrade path is to swap `routes/vision.py`'s `_call_vision()` for local biometric comparison once a library is selected
- Clerk handles INTERFACE auth; the user/face system above handles CONVERSATION-LEVEL and AGENT-LEVEL auth (different layers)