feat(voice): Phase 3 fast command router (7/8 of #3307) #3346
Run enigo keyboard/mouse on the app main thread via a native-registry executor; enigo's macOS TSMGetInputSourceProperty traps off-thread and crashes the CEF host. Adds mouse/keyboard tools, the main_thread bridge, and downscaled screenshots so the model can see them. Slice 1/7 of tinyhumansai#3307 (was the 'computer control' area).
… loop Adds the Rust-internal automate engine (poll-until-stable settle, playback verification), the AXEnabled diagnostics field + settle primitives on ax_interact, the Music fast-path, and the Windows UIA superset. Exposes launch_platform as pub(crate) so the automate loop can launch apps mid-flow. Slice 2/7 of tinyhumansai#3307 (accessibility/automate engine).
…trator Registers the AutomateTool (multi-step UI flows in one call) and the ax_interact denylist/opt-in plumbing; adds the catalog toggle, tool definition, and orchestrator prompt guidance (automate + screenshot/mouse/keyboard fallback for Electron apps with empty AX trees). Slice 3/7 of tinyhumansai#3307 (tool wiring + prompts).
Continuous cpal mic → VAD segmenter → STT → agent with no hotkey, opt-in via voice_server.always_on_enabled, 'Hey Tiny' wake word (English-forced STT + fuzzy match), and screen-lock privacy pause. Adds the config schema, live-apply on the settings RPC, start_if_enabled wiring, and a JSON-RPC roundtrip E2E. Slice 4/7 of tinyhumansai#3307 (always-on core).
Surfaces the always-on listening toggle in the reachable Voice panel, adds the VoiceDebugPanel, the voice tauri-command wrapper, and the RPC client method. Adds all voice.debug.* and notch.* i18n keys across the 14 locales (notch keys land here as inert strings; the notch UI that consumes them ships in slice 6). Slice 5/7 of tinyhumansai#3307 (always-on frontend).
Transparent NSPanel + WKWebView anchored at the top-centre of the primary screen showing live Ready/Listening/Processing state; automate streams step progress to it via the overlay:attention socket bridge. macOS only; no-op elsewhere. Slice 6/7 of tinyhumansai#3307 (notch status pill).
Routes always-on utterances through a fast intent classifier before the chat model, wired into always-on delivery; ties the notch indicator visibility to always-on listening. Adds the window tauri-command wrapper and the core-process permission entry. Slice 7/7 of tinyhumansai#3307 (Phase 3 fast routing).
📝 Walkthrough
Implements a local voice intent router, integrates it into always-on wake-word delivery with local execution fallbacks, adds macOS-safe Tauri notch controls, and syncs notch visibility from frontend boot and settings toggles.
Changes
Always-On Voice Command Listening & Fast Routing
Sequence Diagram(s)
sequenceDiagram
participant User
participant Audio as Audio Capture
participant Router as Command Router
participant Intent as Intent Executor
participant Agent as Agent/LLM
participant Action as System Action
participant Notch as Notch Indicator
User->>Audio: speaks command
Audio->>Router: transcribed text
Router->>Router: classify intent
alt High-Confidence Intent
Router->>Intent: execute action
Intent->>Action: run command
Intent->>Notch: update status
Action-->>User: action complete
else Unknown Intent
Router->>Agent: publish_transcription
Agent->>Action: LLM dispatch
Action-->>User: action + explanation
end
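The routing in the diagram can be sketched roughly as below. This is an illustration only, not the PR's code: the `VoiceIntent` variants, the prefix rules in `classify`, and the `Route` type are all assumptions made for the example. Only the overall shape (a recognised intent runs locally, anything else falls through to the agent) comes from the walkthrough.

```rust
/// Hypothetical intent set; the real `VoiceIntent` in command_router.rs may differ.
#[derive(Debug, PartialEq)]
enum VoiceIntent {
    OpenApp(String),
    PlayMedia(String),
    Unknown,
}

/// Classify a transcribed utterance with cheap prefix rules (assumed rules).
fn classify(utterance: &str) -> VoiceIntent {
    let text = utterance.trim().to_lowercase();
    if let Some(app) = text.strip_prefix("open ") {
        let app = app.trim();
        if !app.is_empty() {
            return VoiceIntent::OpenApp(app.to_string());
        }
    }
    if let Some(query) = text.strip_prefix("play ") {
        let query = query.trim();
        if !query.is_empty() {
            return VoiceIntent::PlayMedia(query.to_string());
        }
    }
    VoiceIntent::Unknown
}

/// Where an utterance ends up: handled locally, or handed to the agent/LLM.
#[derive(Debug, PartialEq)]
enum Route {
    Local(VoiceIntent),
    Agent(String),
}

/// Recognised intents take the fast path; `Unknown` falls through to the agent
/// with the original transcription untouched.
fn route(utterance: &str) -> Route {
    match classify(utterance) {
        VoiceIntent::Unknown => Route::Agent(utterance.to_string()),
        intent => Route::Local(intent),
    }
}

fn main() {
    println!("{:?}", route("Open Safari"));
    println!("{:?}", route("what's on my calendar"));
}
```

Keeping `classify` pure (text in, enum out) means the fast path can be unit-tested without a microphone, an STT model, or the agent.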
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
…ps (Phase 1.5)
Adds a model-chosen `vision_click { description }` action to the `automate`
loop for apps that expose no usable accessibility tree (Slack, Discord,
VS Code). Flow: screenshot the app window -> ask the main vision model for the
target's pixel coordinates (via the existing `[IMAGE:]` marker path) -> map
image pixels to absolute screen points -> guarded left-click.
- New `accessibility/vision_click.rs`: pure `image_to_screen` coordinate
transform (folds in the deferred F2 mapping -- the px->pt ratio absorbs the
capture downscale + Retina backing scale), tolerant locate-response parser,
capture geometry, and the main-thread guarded click (`run_input_on_main`,
Change 1.15).
- Section 1.8 safety guard: only clicks when the target app is frontmost;
refuses on positive evidence another app is focused, so synthetic input never
lands on OpenHuman's own CEF window.
- Reuses the main `chat` vision provider -- no new inference API, no new tool,
no new approval surface (inherits `automate`'s Dangerous + mutations gate).
- 19 new unit tests (pure transform/parse + scripted-backend loop integration,
incl. the frontmost-refusal guard). All 25 automate + vision_click tests green.
Closes the last open Phase 1.5 item (tinyhumansai#3148). Stacks on tinyhumansai#3340-tinyhumansai#3346.
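The `image_to_screen` mapping described above can be sketched as a pure function. The `CaptureGeometry` struct, its field names, and the bounds check are assumptions for illustration; the commit message only states that a single px->pt ratio absorbs both the capture downscale and the Retina backing scale.

```rust
/// Hypothetical capture geometry: the window frame in screen points and the
/// size of the (possibly downscaled) screenshot in pixels.
#[derive(Debug, Clone, Copy)]
struct CaptureGeometry {
    origin_x_pt: f64,
    origin_y_pt: f64,
    width_pt: f64,
    height_pt: f64,
    image_w_px: f64,
    image_h_px: f64,
}

/// Map a pixel coordinate inside the screenshot to an absolute screen point.
/// Returns `None` for degenerate geometry or a pixel outside the image.
fn image_to_screen(geo: &CaptureGeometry, px: f64, py: f64) -> Option<(f64, f64)> {
    if geo.image_w_px <= 0.0 || geo.image_h_px <= 0.0 {
        return None;
    }
    if px < 0.0 || py < 0.0 || px > geo.image_w_px || py > geo.image_h_px {
        return None;
    }
    // One ratio per axis: points per image pixel. Because it is derived from
    // the final image size, it already accounts for any downscale and for the
    // display's backing scale factor.
    let sx = geo.width_pt / geo.image_w_px;
    let sy = geo.height_pt / geo.image_h_px;
    Some((geo.origin_x_pt + px * sx, geo.origin_y_pt + py * sy))
}

fn main() {
    let geo = CaptureGeometry {
        origin_x_pt: 100.0,
        origin_y_pt: 50.0,
        width_pt: 800.0,
        height_pt: 600.0,
        image_w_px: 400.0,
        image_h_px: 300.0,
    };
    println!("{:?}", image_to_screen(&geo, 200.0, 150.0));
}
```

Returning `None` rather than clamping keeps a bad model answer from turning into a click at the window edge; the caller can then refuse the click.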
Combine main + phase3: keep main's stop()/catch_unwind/notch-fix versions; keep phase3's command_router routing (deliver_command/execute_intent), syncNotchVisibility toggle, and notch main-thread dispatch (the notch is now frontend-owned, not auto-shown).
…lGate Ship the orchestrator's desktop-control playbook (carried from the original voice work) with the gating line corrected: mouse/keyboard now route through the ApprovalGate (tinyhumansai#3342), so the prompt no longer claims they 'run without an approval prompt'. Resolves the deferred follow-up.
Independent review (beyond the CodeRabbit pass)
Final slice: reviewed the Phase 3 fast command router ( Reviewed clean
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/src/components/settings/panels/VoicePanel.tsx (1)
516-524: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Decouple notch-sync failure from always-on persistence rollback.
On Line 518, `syncNotchVisibility(next)` is in the same `try` as the settings RPC. If notch sync fails after persistence succeeds, the `catch` (Lines 520-523) reverts the toggle and shows a false state even though `always_on_enabled` was saved.
💡 Suggested fix

     onClick={async () => {
       const next = !settings.always_on_enabled;
       setSettings(current =>
         current ? { ...current, always_on_enabled: next } : current
       );
       try {
         await openhumanUpdateVoiceServerSettings({ always_on_enabled: next });
    -    // The notch pill is the always-on listening HUD: show it
    -    // when listening is enabled, drop it when disabled.
    -    await syncNotchVisibility(next);
    +    // Best-effort UI sync: do not rollback persisted setting
    +    // if notch visibility update fails.
    +    try {
    +      await syncNotchVisibility(next);
    +    } catch {
    +      // no-op
    +    }
       } catch (err) {
         // Revert on failure so the UI reflects the persisted value.
         setSettings(current =>
           current ? { ...current, always_on_enabled: !next } : current
         );
         console.error('[VoicePanel] failed to toggle always-on', err);
       }
     }}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 516 - 524, The current try/catch groups the settings RPC and syncNotchVisibility(next) so a notch-sync failure can wrongly trigger the rollback via setSettings; split the operations: await the settings persistence (the RPC that updates always_on_enabled) inside its own try/catch and only call setSettings(...) to revert when that RPC fails, then call syncNotchVisibility(next) in a separate try/catch so a notch-sync error is logged (console.error('[VoicePanel] failed to sync notch', err)) but does not revert the persisted always_on_enabled state; reference setSettings and syncNotchVisibility(next) when making the change and ensure distinct error messages for each failure.
🧹 Nitpick comments (2)
app/src/utils/tauriCommands/window.ts (1)
121-123: ⚡ Quick win
Redact notch invoke errors before logging.
Both catch blocks log `err` verbatim. Please log a redacted/stable message instead of the raw error object to avoid leaking sensitive context into frontend logs.
As per coding guidelines: "Never log secrets or full PII—redact sensitive information in all logs."
Also applies to: 135-137
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/utils/tauriCommands/window.ts` around lines 121 - 123, The catch blocks in the window-related Tauri commands (the catch in the show function and the other catch around lines mentioned) currently log the raw error object (console.warn('[notch] show failed', err)), which can leak sensitive data; change them to log a stable, redact-safe message instead (e.g., console.warn('[notch] show failed: redacted error') or use a small helper like redactError(error) that returns a non-sensitive string) and update both catch sites to use that helper/message so no raw error objects are emitted to frontend logs.
app/src/App.tsx (1)
209-211: ⚡ Quick win
Avoid logging raw boot-sync errors.
The catch block logs `err` directly. Please switch to a redacted/stable log message to avoid exposing sensitive payloads in frontend diagnostics.
As per coding guidelines: "Never log secrets or full PII—redact sensitive information in all logs."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/App.tsx` around lines 209 - 211, The catch block currently logs the raw error object via console.debug('[notch] boot visibility sync failed', err); replace that with a stable/redacted message (e.g. console.debug('[notch] boot visibility sync failed - see diagnostics')) and do not append the raw err object; if you must record details for telemetry, capture only non-sensitive fields (e.g. err?.name or a sanitized err.code) and send them to a secure monitoring API instead of printing them to the console. Ensure the change is applied to the catch handling around the boot visibility sync where console.debug is called.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/voice/always_on.rs`:
- Around line 375-377: The current log call that prints the raw spoken command
(log::info!("{LOG_PREFIX} wake word matched → command={cmd:?}")) must be
replaced to avoid storing transcript-derived PII; instead log only metadata such
as intent kind, command length, and a success/processing flag. Update the
wake-word handling around the log::info call and any other usages in the same
block (also review lines ~399-412) to: avoid printing cmd, intent, msg, or raw
error text; emit sanitized fields (e.g., intent_kind, cmd_len = cmd.text.len(),
processing=true) and ensure deliver_command(config, cmd).await is called without
logging cmd contents; also remove or redact any raw error messages before
logging in error paths.
- Around line 480-489: The osa function currently awaits
tokio::process::Command::output() which can hang and block deliver_command; wrap
the .output().await in a tokio::time::timeout (e.g.,
tokio::time::Duration::from_millis/seconds) inside the #[cfg(target_os =
"macos")] block and map a timeout error to an Err result (with a clear message
like "osascript timed out") so deliver_command can continue to the fallback
path; keep the existing spawn error mapping but convert the timeout branch into
an early Err returned from osa so callers see the failure quickly.
In `@src/openhuman/voice/command_router.rs`:
- Around line 131-135: The play fast-path currently only rejects narrow pronouns
via is_pronoun() and should instead use the same ambiguity check as the Music
fast-path; update the condition in the play handling (the block using
clean_media_query and is_pronoun) to also apply the broader ambiguity predicate
used by the music fast-path (either call the existing helper in the music
fast-path module or inline the same checks from app_fastpaths::music.rs) so
queries like "play them", "play a song", and "play songs" are rejected and fall
through to Unknown → agent; make the same change to the other play-handling
block (the second occurrence around the other play prefix).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 99ec057a-388a-4426-9318-3be0b03930b8
📒 Files selected for processing (11)
- app/src-tauri/permissions/allow-core-process.toml
- app/src-tauri/src/lib.rs
- app/src/App.tsx
- app/src/components/settings/panels/VoicePanel.tsx
- app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
- app/src/utils/tauriCommands/window.test.ts
- app/src/utils/tauriCommands/window.ts
- src/openhuman/agent_registry/agents/orchestrator/prompt.md
- src/openhuman/voice/always_on.rs
- src/openhuman/voice/command_router.rs
- src/openhuman/voice/mod.rs
…+ cover notch boot-sync
CodeRabbit tinyhumansai#3346:
- always_on: log only intent kind + lengths in deliver_command (never the raw spoken command / query / app / summary / error, which are always-on mic PII).
- always_on: wrap osascript in a 5s timeout so a hung fast-path still falls back to the agent.
- command_router: is_pronoun now mirrors the Music fast-path set (them/a song/songs) so ambiguous 'play …' defers to the agent; add VoiceIntent::kind() for non-PII logging. Tests for both.
- Extract App's boot notch-sync into useNotchBootSync hook + test (covers the previously-uncovered App.tsx lines → clears the diff-cover gate).
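The ambiguity check and the metadata-only logging from this commit might look something like the sketch below. It assumes a simplified `VoiceIntent`, a hypothetical `log_line` helper, and a pronoun list in which only `them`, `a song`, and `songs` are taken from the commit message; `it`, `this`, and `that` are guesses at the narrower set that existed before.

```rust
/// Simplified stand-in for the real `VoiceIntent`; variants are assumptions.
#[allow(dead_code)]
enum VoiceIntent {
    PlayMedia { query: String },
    OpenApp { app: String },
    Unknown,
}

impl VoiceIntent {
    /// Stable, non-PII label that is safe to write to logs.
    fn kind(&self) -> &'static str {
        match self {
            VoiceIntent::PlayMedia { .. } => "play_media",
            VoiceIntent::OpenApp { .. } => "open_app",
            VoiceIntent::Unknown => "unknown",
        }
    }
}

/// True when a "play ..." query is too vague for the fast path and should
/// defer to the agent. The exact set in the PR may differ.
fn is_pronoun(query: &str) -> bool {
    matches!(
        query.trim().to_lowercase().as_str(),
        "it" | "this" | "that" | "them" | "a song" | "songs"
    )
}

/// Hypothetical helper: build a log line from metadata only, never from the
/// transcript text itself.
fn log_line(intent: &VoiceIntent, cmd: &str) -> String {
    format!("intent_kind={} cmd_len={}", intent.kind(), cmd.len())
}

fn main() {
    let cmd = "play jazz";
    let intent = VoiceIntent::PlayMedia { query: "jazz".to_string() };
    println!("{}", log_line(&intent, cmd));
    println!("ambiguous: {}", is_pronoun("them"));
}
```

Logging the byte length and the intent kind still lets a maintainer see that the fast path fired, without the always-on microphone's transcript ever reaching the log file.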
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/hooks/useNotchBootSync.ts`:
- Line 24: Replace the console.debug call in useNotchBootSync with the app's
namespaced debug logger: import or obtain the same debug logger pattern used
across app/src (e.g. via the 'debug' package) and create a namespaced instance
(like 'app:notch:boot' or following the repo's existing namespace convention),
then call that logger with the message and error instead of console.debug;
update the single occurrence in function useNotchBootSync where
console.debug('[notch] boot visibility sync failed', err) appears and ensure the
import/initializer for the debug logger is added at the top of the file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 78d42cd5-9a13-4d1b-b075-493b0bf5d1b2
📒 Files selected for processing (5)
- app/src/App.tsx
- app/src/hooks/__tests__/useNotchBootSync.test.tsx
- app/src/hooks/useNotchBootSync.ts
- src/openhuman/voice/always_on.rs
- src/openhuman/voice/command_router.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/openhuman/voice/always_on.rs
CodeRabbit: replace console.debug with the app's `debug` package
namespaced logger ('notch:boot'), per the app/src logging convention.
Re-apply the tinyhumansai#3346 reconciliation lost when taking slice-8's docs: the 'Fix required (not yet done) / keep disabled' paragraph contradicted the '✅ Crash fixed' status. Now past-tense root-cause-fixed (run_input_on_main on the main thread + catch_unwind); covers vision_click too. Tag the trace fence as text (MD040).
Summary
Slice 7/8 of #3307 — Phase 3 fast command router (intent classifier).
Closes #3148 (the original voice → system-action issue) once the full train lands.
Files (11)
voice/{command_router,always_on,mod}.rs, app/src/App.tsx, components/settings/panels/VoicePanel.tsx (+ test), utils/tauriCommands/window.ts (+ test), app/src-tauri/{permissions/allow-core-process.toml,src/lib.rs}, docs/voice-system-actions.md.
Summary by CodeRabbit
New Features
Documentation
Tests