Skip to content

Latest commit

 

History

History
161 lines (122 loc) · 5 KB

File metadata and controls

161 lines (122 loc) · 5 KB

OpenVoiceUI — Future Developments

Planned features and improvements tracked here. Items are roughly prioritized by impact vs effort.


🔐 Authentication & Access Control

Visual Authentication (Camera-Based)

  • Per-agent camera auth gate: only recognized faces can wake the agent
  • Confidence threshold configurable per profile (currently hardcoded 50%)
  • Multi-user household: register Mom, Dad, Child — each gets personalized greeting
  • Age-appropriate agent access: child-safe agent only wakes for registered children

Voice Authentication

  • Speaker verification / voice fingerprinting
  • Voice-print enrollment flow (record 3–5 sentences, build profile)
  • Can gate wake activation to recognized voices only
  • Pairs with camera auth for dual-factor "biometric wake"

Full User Credential System

  • Per-agent user management panel in Admin Dashboard
  • Role-based access: owner / family / guest
  • Registered users (under a Clerk account) can each have:
    • Face photo(s)
    • Voice print
    • Preferred agent(s)
    • Personalized greeting
    • Custom wake word
  • Admin can set which agents each user can activate

Conversation-Level Auth

  • Require re-auth for sensitive tool calls (banking, home control, etc.)
  • Session expiry + re-auth prompt
  • "Lock" the agent mid-conversation

👥 Multi-User Household Features

Person-Specific Personalization

  • Greeting by recognized name: "Hey Dad, what's up?"
  • Per-person conversation history / preferences stored server-side
  • Agent can remember each person's preferences across sessions
  • Profile switching based on who's recognized (Mom prefers different agent than Child)

Presence Detection

  • Passive camera monitoring: detect when someone approaches
  • Auto-wake when registered person detected (no voice required)
  • "Away mode" when no one recognized for N minutes

🎥 Vision System

Biometric Face Recognition Library

  • Replace current LLM-based face matching with proper biometric library (e.g. deepface, insightface, or face_recognition + dlib)
  • Faster (local, no API call) and more accurate
  • Enrollment: capture multiple angles, generate face embedding vector
  • Recognition: cosine similarity against embedding database
  • Reduces recognition from ~3s (LLM API) to ~100ms (local)

Advanced Vision Capabilities

  • Object detection and tracking (YOLO)
  • Emotion detection from camera feed → affect agent mood/tone
  • Gesture recognition (wave to wake, thumbs up to confirm, etc.)
  • Document/whiteboard reading
  • QR code / barcode scanning via camera

🔧 Admin Dashboard

Face User Management Panel

  • Dedicated admin panel tab: "Users & Faces"
  • Add/remove household users
  • Capture or upload multiple face photos per user
  • Test recognition live in admin
  • Set per-user permissions and preferred agents

Agent Access Control Panel

  • Per-agent: allowed users, blocked users
  • Time-of-day restrictions ("kid agent" only 7am–9pm)
  • Conversation log per user

Full STT Settings Panel

  • Silence timeout slider
  • Continuous vs PTT toggle
  • Wake word testing (live test button)
  • Language/accent selection

🗣️ Voice & Conversation

Multi-Language Support

  • Detect spoken language automatically
  • Switch TTS voice language to match
  • Per-user preferred language

Conversation Memory

  • Long-term memory across sessions (summaries, preferences, facts)
  • "Remember that I like..." → stored in user profile
  • Briefing on session start: "Last time you asked about..."

Interruption & Barge-In Polish

  • Smarter interruption detection (voice activity vs noise)
  • "Hold on" / pause command
  • Resume from where it left off after interruption

🎵 Music & Media

Spotify Integration (Full)

  • OAuth login per user
  • Play from personal library
  • Playlist control
  • "Play my morning playlist" → knows which user asked

Music Recommendations

  • Learn per-user taste
  • "Play something like what I usually like"
  • Genre/mood matching from conversation

🏠 Smart Home & IoT

Home Automation Integration

  • Home Assistant / MQTT bridge
  • Control lights, locks, thermostats by voice
  • Presence-triggered automations (arrive home → turn on lights)
  • Per-user automations (Dad arrives → different scene than Mom)

🛠️ Infrastructure

Local Vision Model Option

  • Ollama + LLaVA for fully offline vision processing
  • No API key required, no cost
  • ~1–2s latency on modern hardware

Multi-Instance / Multi-Room

  • Run separate voice UI instances per room
  • Central admin manages all instances
  • Shared user/face database across instances

Notes

  • Items marked with no priority number are longer-term / post-v1
  • Camera auth / visual auth is the highest-priority future auth feature
  • The current face recognition (LLM-based) is intentionally temporary — upgrade path is to swap routes/vision.py's _call_vision() for local biometric comparison once a library is selected
  • Clerk handles INTERFACE auth; the user/face system above handles CONVERSATION-LEVEL and AGENT-LEVEL auth (different layers)