feat(voice): voice-native approval — spoken TTS confirmation for sensitive actions (Phase 4 of #3148)#3368
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (24)
💤 Files with no reviewable changes (6)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (14)
📝 WalkthroughWalkthroughThis PR implements voice-initiated approval flows: when a user sends a voice-dictated message, the system threads a ChangesVoice-initiated approval and speech synthesis
🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly Related Issues
Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
…ions aloud (Phase 4)
Makes the existing ApprovalGate voice-native: when a sensitive agent action is
parked for approval during a voice-initiated turn, the assistant speaks the
confirmation prompt aloud so a hands-free / always-on user can answer "yes"/"no"
by voice. Typed-chat approvals stay visual-only (the in-app approval card).
- Origin flag: `voice: bool` on `ApprovalChatContext` (set from a new optional
`voice` param on the `channel_web_chat` RPC + `chat:start` socket event),
stamped onto `DomainEvent::ApprovalRequested { is_voice }` at publish time.
The frontend tags dictation / always-on auto-sends (`chatSend({ voice: true })`).
- New `voice/approval_surface.rs`: an EventHandler (mirrors the Telegram approval
surface) that, on `ApprovalRequested { is_voice: true }`, builds a short spoken
line and publishes it. Registered at startup beside the web/telegram surfaces.
- New `voice/speak_bus.rs`: `publish_speak`/`subscribe_speak_events` broadcast
(mirrors `overlay/bus.rs`); `core/socketio.rs` bridges it to a `voice:speak`
Socket.IO event.
- Frontend `useVoiceSpeak` (mounted app-wide) plays `voice:speak` via the
existing TTS pipeline, so the prompt is heard even without the mascot view.
- Spoken yes/no answer needs no new code — it rides the existing transcription →
auto-send → web.rs ingress yes/no → approval_decide path.
Tests: Rust — spoken_prompt formatter, the is_voice gate (speaks on true, silent
on false / non-approval / empty-summary), speak_bus round-trip. Frontend (Vitest)
— useVoiceSpeak synthesizes + plays on voice:speak, ignores empty/malformed
payloads, unsubscribes on unmount.
Phase 4 of tinyhumansai#3148. Follow-up: localize the server-built spoken suffix.
c00ceec to
349c44a
Compare
There was a problem hiding this comment.
Actionable comments posted: 15
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/openhuman/config/ops_tests.rs (1)
1031-1061:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAssert the new voice fields in this test.
Line 1046 and Line 1047 set
always_on_enabled/wake_word, but the test never verifies they persisted. This leaves the new path effectively untested.Proposed test assertions
let outcome = load_and_apply_voice_server_settings(patch) .await .expect("ok"); + assert_eq!( + outcome.value["config"]["voice_server"]["always_on_enabled"], + serde_json::json!(true) + ); + assert_eq!( + outcome.value["config"]["voice_server"]["wake_word"], + serde_json::json!("Hey Tiny") + ); assert!( outcome.value["config"]["voice_server"]["min_duration_secs"] .as_f64() .unwrap_or(-1.0) >= 0.0 );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/config/ops_tests.rs` around lines 1031 - 1061, The test load_and_apply_voice_server_settings_accepts_valid_modes_and_clamps sets always_on_enabled and wake_word but never asserts they persisted; update the assertions after calling load_and_apply_voice_server_settings (using the outcome variable) to check outcome.value["config"]["voice_server"]["always_on_enabled"] is true (use as_bool()/unwrap_or(false)) and outcome.value["config"]["voice_server"]["wake_word"] matches "Hey Tiny" (use as_str()/unwrap_or("") or equivalent). Ensure these checks follow the existing min_duration_secs assertion and run before removing the OPENHUMAN_WORKSPACE env var.
🟡 Minor comments (8)
app/src/lib/i18n/fr.ts-1459-1461 (1)
1459-1461:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winKeep French pronoun register consistent in this new string.
Line 1461 mixes informal and formal forms (
Garde ... vous dites). Please keep a single register (this file mostly uses informaltu).✏️ Suggested wording
- 'voice.debug.alwaysOnDesc': - 'Garde le microphone ouvert et envoie automatiquement ce que vous dites à l’agent, sans raccourci. Se met en pause lorsque l’écran est verrouillé.', + 'voice.debug.alwaysOnDesc': + 'Garde le microphone ouvert et envoie automatiquement ce que tu dis à l’agent, sans raccourci. Se met en pause lorsque l’écran est verrouillé.',🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/lib/i18n/fr.ts` around lines 1459 - 1461, The French string for the key 'voice.debug.alwaysOnDesc' mixes formal and informal pronouns; update the value to a consistent informal (tu) register by replacing the sentence "Garde le microphone ouvert et envoie automatiquement ce que vous dites à l’agent, sans raccourci. Se met en pause lorsque l’écran est verrouillé." with an informal variant such as "Garde le micro ouvert et envoie automatiquement ce que tu dis à l’agent, sans raccourci. Se met en pause lorsque l’écran est verrouillé." to match the rest of the file.app/src/components/settings/panels/__tests__/VoicePanel.test.tsx-539-553 (1)
539-553:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd an assertion that the failing toggle path actually attempted the update RPC.
Right now this test can pass even if the click handler is broken, since
aria-checked="false"and “no notch sync calls” are both true before the click.✅ Tighten the failure-path assertion
fireEvent.click(toggle); - // The optimistic flip is reverted back to off after the RPC rejects, and - // the notch is never shown because the persist failed. + await waitFor(() => + expect(vi.mocked(openhumanUpdateVoiceServerSettings)).toHaveBeenCalledWith({ + always_on_enabled: true, + }) + ); + + // The toggle ends in off after rejection, and notch is never synced. await waitFor(() => expect(toggle).toHaveAttribute('aria-checked', 'false')); expect(vi.mocked(syncNotchVisibility)).not.toHaveBeenCalled();As per coding guidelines: "Prefer testing behavior over implementation in unit tests".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/settings/panels/__tests__/VoicePanel.test.tsx` around lines 539 - 553, The test currently never asserts that the click actually triggered the RPC; add an assertion that the mocked update RPC (openhumanUpdateVoiceServerSettings) was invoked when the toggle was clicked — e.g. after the click and after awaiting the revert, assert vi.mocked(openhumanUpdateVoiceServerSettings).toHaveBeenCalled() or .toHaveBeenCalledTimes(1) (and optionally with the expected payload) so the failing-path truly exercised the RPC call in VoicePanel and you still assert syncNotchVisibility was not called.src/openhuman/config/schemas.rs-1731-1736 (1)
1731-1736:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDon’t silently swallow live refresh failures after updating voice settings.
If config reload fails here, the RPC still succeeds but always-on runtime state may remain stale until restart. Please log (or return) the failure path.
Proposed fix
- if let Ok(config) = config_rpc::load_config_with_timeout().await { - crate::openhuman::voice::always_on::start_if_enabled(&config).await; - } + match config_rpc::load_config_with_timeout().await { + Ok(config) => crate::openhuman::voice::always_on::start_if_enabled(&config).await, + Err(err) => { + log::warn!( + "[config][rpc] update_voice_server_settings live refresh skipped: {err}" + ); + } + }As per coding guidelines, "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/config/schemas.rs` around lines 1731 - 1736, The code calls config_rpc::load_config_with_timeout().await and silently ignores failures, which can leave always-on runtime stale; modify the block after config_rpc::load_and_apply_voice_server_settings(patch).await to handle the Err case: if load_config_with_timeout() returns Ok call crate::openhuman::voice::always_on::start_if_enabled(&config).await, otherwise log a stable, grep-friendly error (including the error details and context like "voice-settings: reload failed") via the existing logger or return the error so the caller can see the failure; ensure the log message includes the function names or context to aid debugging.src/openhuman/voice/command_router.rs-59-72 (1)
59-72:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
strip_fillerclaims to handle “hey” but currently does not.The docstring says “hey” is stripped, but it isn’t in
fillers, so commands like “hey pause” route toUnknown.Proposed fix
fn strip_filler(s: &str) -> &str { let fillers = [ + "hey ", "please ", "can you please ", "can you ", "could you please ", "could you ", "would you ", "i want to ", "i want you to ", "go ahead and ", ];fn transport_controls() { + assert_eq!(route("hey pause"), VoiceIntent::Pause); assert_eq!(route("pause"), VoiceIntent::Pause); assert_eq!(route("Pause the music."), VoiceIntent::Pause); assert_eq!(route("please stop"), VoiceIntent::Pause);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice/command_router.rs` around lines 59 - 72, The docstring for strip_filler says it removes "hey" but the fillers array in strip_filler does not include "hey ", so inputs like "hey pause" aren't stripped; update the fillers list inside the strip_filler function to include "hey " (and any other expected filler variants you want handled) so the function actually strips that prefix when matching commands.docs/voice-automate-plan.md-20-20 (1)
20-20:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix fenced-code metadata and remove the trailing stray fence.
Line 20 opens an unlabeled code fence, and Line 152 appears to introduce an extra fence. This will keep markdownlint warning and can break rendering in some viewers.
Proposed patch
-``` +```text orchestrator (chat LLM) │ one call: automate{ app, goal } ▼ @@ -- **Safety** — `automate` is a mutating tool: same opt-in + `SENSITIVE_APPS` - denylist + ApprovalGate routing as `ax_interact`; the inner loop may not target a - denylisted app even if the model asks. -``` +- **Safety** — `automate` is a mutating tool: same opt-in + `SENSITIVE_APPS` + denylist + ApprovalGate routing as `ax_interact`; the inner loop may not target a + denylisted app even if the model asks.Also applies to: 152-152
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/voice-automate-plan.md` at line 20, The file contains an unlabeled opening code fence and a stray closing fence that breaks markdown rendering; update the opening fence to include a language label (e.g., change the opening ``` to ```text) and remove the extra trailing ``` after the Safety paragraph so the fenced block around the orchestrator / Safety section is properly opened and closed; locate the unlabeled fence near the start of the orchestration snippet and the stray fence after the "denylisted app even if the model asks." line and make the label/removal changes there.docs/voice-system-actions.md-365-365 (1)
365-365:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd a language tag to the fenced code block.
This fence is missing a language identifier and can trigger markdown lint failures.
Suggested fix
-``` +```text enigo::macos::get_layoutdependent_keycode → TSMGetInputSourceProperty → dispatch_assert_queue → _dispatch_assert_queue_fail → SIGTRAP -``` +```🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/voice-system-actions.md` at line 365, The fenced code block containing the trace lines (enigo::macos::get_layoutdependent_keycode → TSMGetInputSourceProperty → dispatch_assert_queue → _dispatch_assert_queue_fail → SIGTRAP) is missing a language tag; update the fence to include a language identifier (for example change ``` to ```text) so the block becomes ```text ... ```; locate the block by the presence of those unique symbols and add the language tag only, leaving the content unchanged.docs/voice-system-actions.md-329-333 (1)
329-333:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winResolve conflicting rollout/status labels in the same tracker.
These changed sections currently conflict (for example: “Not Started” vs “M1+M2+M3 shipped/proven live”, and “crash fixed” vs “fix required / keep disabled”). Please normalize to one authoritative state so the tracker is operationally reliable.
Based on learnings: New or changed behavior must ship with matching documentation—at minimum add concise rustdoc / code comments where flow is not obvious, and update
AGENTS.md, architecture docs, or feature docs when repository rules or user-visible behavior change.Also applies to: 352-372, 637-644
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/voice-system-actions.md` around lines 329 - 333, The tracker sections for "Phase 1.5 — Reliable, real-time multi-step automation" contain conflicting rollout/status labels (e.g., "Not Started" vs "M1+M2+M3 shipped/proven live" and "crash fixed" vs "fix required / keep disabled"); normalize each affected block (including the referenced spans ~329-333, 352-372, 637-644) to a single authoritative status string and a single canonical rollout note, update the human-facing lines under the phase header and any bullet status lines to match that single state, and add a short note pointing to implementation docs; also add matching inline rustdoc/code comments where flow is non-obvious in the implementation (reference the "Rust inner loop" and "voice-automate-plan.md") and update AGENTS.md and relevant architecture/feature docs so repository rules and user-visible behavior reflect the chosen state.src/openhuman/accessibility/automate.rs-203-206 (1)
203-206:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winExtract the first balanced JSON object, not first
{to last}.Lines 203-206 currently slice from the first
{to the last}, which can fail when output contains extra brace blocks after a valid object. This contradicts the intended “first balanced block” behavior and causes avoidable parse failures.🔧 Suggested parser fix
- // Extract the first {...} span and retry. - if let (Some(start), Some(end)) = (trimmed.find('{'), trimmed.rfind('}')) { - if end > start { - if let Ok(a) = serde_json::from_str::<Action>(&trimmed[start..=end]) { - return Ok(a); - } - } - } + // Extract the first balanced {...} block and retry. + if let Some(start) = trimmed.find('{') { + let mut depth = 0usize; + for (idx, ch) in trimmed[start..].char_indices() { + match ch { + '{' => depth += 1, + '}' => { + depth -= 1; + if depth == 0 { + let end = start + idx; + if let Ok(a) = serde_json::from_str::<Action>(&trimmed[start..=end]) { + return Ok(a); + } + break; + } + } + _ => {} + } + } + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/accessibility/automate.rs` around lines 203 - 206, The current logic slices trimmed from the first '{' to the last '}', which can include extra trailing brace blocks and fail parsing; change this to locate the first balanced JSON object by scanning trimmed from the first '{', tracking brace depth (increment on '{', decrement on '}') until depth returns to zero, taking that slice as the candidate JSON, then call serde_json::from_str::<Action>(&trimmed[start..=end]) on that balanced slice (ensure the scanner handles braces inside strings/escaped quotes or at least stops at matching braces for typical outputs); update the block around the trimmed.find/rfind usage to use this balanced-scan approach instead of using rfind('}').
🧹 Nitpick comments (6)
app/src-tauri/permissions/allow-core-process.toml (1)
65-70: 🏗️ Heavy liftSplit notch window commands into a dedicated permission scope.
Adding
notch_window_show/notch_window_hidetoallow-core-processfurther broadens a privileged bundle. Please move these into a dedicated permission TOML and grant it only to the notch-related UI surface(s).Based on learnings: "When reviewing Tauri permission changes, avoid keeping unrelated permissions bundled into a broad identifier ... split them into dedicated, narrower permission TOML identifiers and update capability grants."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src-tauri/permissions/allow-core-process.toml` around lines 65 - 70, The allow-core-process TOML currently lists the privileged commands "notch_window_show" and "notch_window_hide"; remove those two entries from allow-core-process and create a new, dedicated permission TOML (e.g., notch-window-permissions) that contains "notch_window_show" and "notch_window_hide", then update the capability grants so only the notch-related UI surfaces/components are given that new permission identifier instead of the broad allow-core-process. Ensure any references to these commands (invokes) still resolve by updating the permission identifier in the relevant UI surface capability configurations.app/src/features/human/voice/useVoiceSpeak.test.ts (1)
65-69: ⚡ Quick winExercise cleanup after a real
voice:speakevent.This only verifies
socketService.off(...). It never starts playback, so it won't fail if unmount forgets to call the returnedstop()handle and leaves audio running.🧪 Suggested test adjustment
-it('unsubscribes and stops playback on unmount', () => { +it('unsubscribes and stops playback on unmount', async () => { const { unmount } = renderHook(() => useVoiceSpeak()); + speakHandler()({ text: 'Post to Slack. Say yes to confirm.', source: 'approval' }); + await vi.waitFor(() => expect(hoisted.playMock).toHaveBeenCalledTimes(1)); unmount(); expect(hoisted.offMock).toHaveBeenCalledWith('voice:speak', expect.any(Function)); + expect(hoisted.stopMock).toHaveBeenCalledTimes(1); });Based on learnings: Prefer testing behavior over implementation details in unit tests. Use existing helpers from
app/src/test/before adding new harness code.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/features/human/voice/useVoiceSpeak.test.ts` around lines 65 - 69, The test only asserts that socketService.off was called but never triggers a real "voice:speak" playback, so update the test in useVoiceSpeak.test.ts to simulate a full lifecycle: use the same socket test helper (hoisted.emitMock or the test helper in app/src/test/) to emit a 'voice:speak' event after rendering useVoiceSpeak (so playback actually starts), capture or spy the playback stop handle returned by the playback/start path (e.g., mock the audio/playback object's stop method or capture the stop function returned by the handler), then call unmount() and assert both that hoisted.offMock was called with 'voice:speak' and that the captured stop handle was invoked; reference useVoiceSpeak, hoisted.emitMock (or the project's socket emit helper), hoisted.offMock, and the playback stop mock when locating where to change the test.src/openhuman/voice/audio_capture.rs (1)
105-129: ⚡ Quick winRoute the one-shot finalize path through this helper too.
The new helper says it is shared by the recorder finalize path, but
finalize_recording()still has its own WAV encoding block below. That leaves two PCM/WAV implementations to keep in sync and weakens the reuse this change is trying to establish.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice/audio_capture.rs` around lines 105 - 129, The finalize_recording path still contains an inline WAV encoding loop duplicated from encode_wav_16k; replace that block in finalize_recording() with a call to encode_wav_16k(samples_16k_slice) (passing the prepared 16k mono f32 slice) and use the returned Vec<u8> as the WAV bytes, removing the duplicated WavWriter/loop/finalize code; make sure to propagate and map errors to the same Result<String> error style as encode_wav_16k, and keep sample-rate/resampling logic intact so finalize_recording supplies a proper &[f32] to encode_wav_16k.app/src/notch/NotchApp.tsx (1)
183-207: ⚡ Quick winUse namespaced debug logging instead of
console.*inapp/src.This file currently emits diagnostics via
console.debug/console.warn; switch to the repo’s namespaced debug logger pattern with dev-only detail.As per coding guidelines: "Use namespaced
debugfunction with dev-only detail for diagnostic logging in TypeScript/React".Also applies to: 234-255
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/notch/NotchApp.tsx` around lines 183 - 207, Replace console.debug/console.warn calls inside the socket handlers (e.g., the handlers registered for 'dictation:toggle', 'dictation:transcription', and 'companion:state_changed' in NotchApp/NotchApp.tsx) with the project’s namespaced debug logger (use the existing debug utility used elsewhere in app/src, e.g., debug('notch:...') or the repo’s debug import) so logs are namespaced and can be stripped in production; import the debug function if not present and convert messages like console.debug(`[notch] ...`) to debug('notch', ...) (or the repo convention) and ensure any verbose/detail-only messages are guarded as dev-only per the repo pattern (the other console.* occurrences around lines ~234-255 should be changed the same way).app/src/pages/Conversations.tsx (1)
793-793: ⚡ Quick winPreserve optional wire behavior for
voiceon non-voice turns.This currently forces
voice: falsefor every send. Passing throughopts?.voicekeeps the field truly optional and avoids sending it when not explicitly voice-initiated.Suggested diff
- voice: opts?.voice ?? false, + voice: opts?.voice,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/pages/Conversations.tsx` at line 793, The current payload always sets the voice field to false by using "voice: opts?.voice ?? false"; change this to preserve optional behavior by sending "voice: opts?.voice" instead so the field is omitted when undefined (locate the object construction using opts?.voice ?? false in Conversations.tsx, likely inside the send/submit function that builds the message payload). Ensure no other code relies on voice being explicitly boolean false when omitted.src/openhuman/voice/speak_bus.rs (1)
47-55: 🏗️ Heavy liftRoute speech events through the typed global event bus instead of a module-local broadcast.
This path is now cross-module (
voice→core/socketio), but it bypasses the repository’s standardDomainEventpub/sub surface. Please move this toevent_bus::publish_global/subscribe_globalwith a dedicatedDomainEventvariant so registration, tracing, and fanout behavior stay consistent with other domain events.As per coding guidelines:
src/**/*.rsshould use the event bus for decoupled cross-module communication viapublish_global/subscribe_global.Also applies to: 62-79
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice/speak_bus.rs` around lines 47 - 55, Replace the module-local broadcast (SPEAK_BUS) and subscribe_speak_events() to use the repository event bus: add a new DomainEvent variant (e.g., DomainEvent::SpeakRequest(SpeakRequest)) and publish speak events via event_bus::publish_global(DomainEvent::SpeakRequest(...)) where currently sending to SPEAK_BUS; implement subscribe_speak_events() to call event_bus::subscribe_global() and filter/unwrap DomainEvent::SpeakRequest into a broadcast::Receiver/stream of SpeakRequest so consumers (Socket.IO bridge) receive typed events through the global bus rather than the SPEAK_BUS static.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src-tauri/src/notch_window.rs`:
- Around line 362-366: The string interpolation in the format! that builds js in
notch_window.rs (the js variable where token and base_url are inserted) is
unsafe: escape token and base_url for inclusion in a single-quoted JS string (or
serialize them as JS string literals) before formatting to prevent quotes,
backslashes or newlines from breaking or injecting code; add a small helper
(e.g., js_string_escape or use serde_json::to_string to produce a quoted JS
string and strip surrounding quotes) and use that helper to produce safe values
for token and base_url when constructing the js variable and dispatching the
CustomEvent.
In `@app/src/App.tsx`:
- Around line 205-214: The notch sync one-shot guard is set before the async
work, causing failures to permanently skip retries; modify the IIFE so
notchSyncedRef.current is assigned true only after a successful sync: call
openhumanGetVoiceServerSettings() and await syncNotchVisibility(...) inside the
try block, and on success set notchSyncedRef.current = true (keep the early
return check for isBootstrapping || notchSyncedRef.current at the top) so
transient errors won’t flip the guard; reference functions/vars:
isBootstrapping, notchSyncedRef, openhumanGetVoiceServerSettings,
syncNotchVisibility.
In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 509-525: The toggle handler mixes persistence and notch-sync and
is vulnerable to re-entrant clicks; change it so clicking is ignored while a
pendingToggle flag (or similar) is true, update the UI optimistically and
persist with openhumanUpdateVoiceServerSettings, and only revert the UI if
persistence fails—do not revert when syncNotchVisibility fails; instead handle
sync errors separately (log/show feedback) and optionally retry without changing
settings. Use the existing symbols settings.always_on_enabled, setSettings,
openhumanUpdateVoiceServerSettings, and syncNotchVisibility and add a local
pending toggle guard to prevent concurrent toggles.
In `@app/src/features/human/voice/useVoiceSpeak.ts`:
- Around line 50-60: The synth/ playback race happens because older
synthesizeSpeech calls can resolve later and call handleRef.current?.stop(),
killing newer audio; fix by introducing a monotonic request id (e.g.,
requestIdRef) and capture const myRequestId = ++requestIdRef.current at the
start of the async block, then after synthesizeSpeech resolves verify
myRequestId === requestIdRef.current before stopping the current handle or
assigning handleRef.current = handle (and similarly before calling
handle.ended.catch). This ensures only the latest voice:speak invocation can
stop or replace the shared handle; reference synthesizeSpeech, handleRef,
playBase64Audio, and handle.ended in your change.
In `@app/src/notch/NotchApp.tsx`:
- Around line 164-261: connectSocket can initialize a socket after the component
unmounts because the returned disposable isn't invoked; ensure teardown is
actually used and prevent post-unmount assignment/leaks by (1) keeping the
disposed flag check after await connectCoreSocket(...) and if disposed is true
then immediately disconnect the newly created socket and return, (2) register
socket listeners on the socket instance only when not disposed, and (3) ensure
the caller effect that invokes connectSocket actually captures and calls its
returned cleanup function (i.e., call the function returned by connectSocket
inside the effect's cleanup). Refer to connectSocket, connectCoreSocket,
socketRef, and the local disposed flag when making these changes.
In `@app/src/utils/tauriCommands/window.ts`:
- Around line 122-123: Replace the raw console.warn(err) calls in
app/src/utils/tauriCommands/window.ts with a namespaced debug logger and
redacted error text: import and instantiate a debug logger (e.g. const debug =
createDebug('app:tauri:window') or using the repo's existing debug factory),
then change the console.warn in the function(s) that show the window (and the
other occurrences around lines 136-137) to debug(...) while only logging a
sanitized message and non-sensitive fields (e.g. debug('show failed:',
safeErrorString(err)) where safeErrorString returns err.message or a redacted
summary, not the full err object); ensure you add the debug
import/initialization and replace all similar console.warn/raw err logs in this
module to follow the same pattern.
In `@src/openhuman/accessibility/app_fastpaths/music.rs`:
- Line 237: The current log call logs raw user query via
log::info!("[automate::music] ▶ play query={query:?}"); — replace that to avoid
PII by computing and logging only non-sensitive metadata: compute query_len =
query.len() and query_hash = sha256_hex(query) (or another stable hash) and
update the log to something like log::info!("[automate::music] ▶ play
query_len={} query_hash={}", query_len, query_hash); ensure the original `query`
string is not included anywhere in logs and reference the same variable name
`query` and the logging site (log::info! call) when making the change.
- Around line 150-166: The replace_ci function is using byte-indexing into the
lowercased string (hl[i..]) while i is tracked over the original haystack's
bytes, which is Unicode-unsafe; change replace_ci to iterate using char_indices
(or parallel char_indices over haystack and hl) and track byte offsets for both
original and lowercased forms so you only slice at valid UTF-8 boundaries (e.g.,
use for (orig_byte_idx, ch) in haystack.char_indices() and maintain a
corresponding hl_byte_idx into hl to call hl[hl_byte_idx..].starts_with(&nl) and
advance both indices by the matched
needle.chars().map(|c|c.len_utf8()).sum::<usize>() or by the matched byte
lengths); and in run, stop logging the full user-derived query via
log::info!("[automate::music] ▶ play query={query:?}"), instead log a redacted
or masked version (e.g., omit the query, log only length or a fixed placeholder)
to avoid exposing PII/secrets.
In `@src/openhuman/accessibility/automate.rs`:
- Around line 210-212: The logs currently emit raw user/model content (e.g.,
variables goal, label, filter and trimmed/model output) which may contain PII;
update the logging and error creation to redact or omit sensitive content
instead of printing the full values — replace direct interpolations like
trimmed, goal, label, filter with a sanitized/redacted representation (e.g.,
call a helper redact_sensitive(s: &str) or replace with "[REDACTED]") in the
Err(format!(...)) and all log calls that reference those symbols so logs never
contain full user/model text.
In `@src/openhuman/credentials/ops.rs`:
- Around line 44-47: The always-on voice service started by
crate::openhuman::voice::always_on::start_if_enabled(config) isn't torn down on
logout; add a symmetric shutdown path by implementing a public stop() in
src/openhuman/voice/always_on.rs that clears ENABLED (Ordering::SeqCst) and
performs full stream/thread teardown, then call
crate::openhuman::voice::always_on::stop() from stop_login_gated_services so the
always-on capture is fully disabled on logout/user switch.
In `@src/openhuman/tools/impl/browser/screenshot.rs`:
- Around line 209-225: The helper downscale_to_jpeg currently returns the last
oversized candidate stored in last when no thumbnail fits max_bytes; change this
so that if the loop finishes without finding out.len() <= max_bytes it returns
an Err (e.g., via last.ok_or_else -> Err) instead of returning Ok(last),
preserving the function's contract; update the tail of downscale_to_jpeg
(referencing variables last and max_bytes and the function name) to produce an
error when no size fits rather than returning the oversized JPEG.
In `@src/openhuman/tools/impl/computer/automate.rs`:
- Around line 1-16: The AutomateTool implementation belongs in the domain's
owning module not in the impl tree: move the AutomateTool type, its Tool impl,
and any helper functions/uses (e.g., AutomateOptions, RealBackend,
is_sensitive_app) out of the legacy impl file and into the owning domain's
tools.rs (or a tools/ submodule) for the computer/openhuman domain, update the
module imports there, and then re-export AutomateTool from the central tools mod
(src/openhuman/tools/mod.rs) so external callers use the normal tool surface;
also remove the old file or its module entry from the impl/ folder and fix any
use sites to import the re-exported AutomateTool.
- Line 105: The current log statement uses log::info! and includes the raw
free-form variable goal which may contain sensitive user data; change the log to
avoid printing goal directly and instead record only derived/redacted metadata
(for example keep app as-is and replace goal with goal_len and/or a short hash),
e.g. compute a length (goal.len()) and a stable hash of goal (SHA-256 or
similar) and log those fields instead of the full goal string so
log::info!("[automate] ▶ tool execute app={app:?} goal_len={goal_len}
goal_hash={goal_hash:?}");
In `@src/openhuman/voice/always_on.rs`:
- Around line 229-230: The always-on processor currently captures a static
AppConfig via app_config.clone and builds VadConfig with
VadConfig::from_server_config at startup, causing updates (wake word, STT
provider, VAD tuning) to be ignored while RUNNING is true; change this to read a
shared, mutable runtime settings object instead of cloning at startup: introduce
or use an existing shared RwLock/atomic voice settings struct (e.g.,
get_runtime_voice_settings()) and replace direct uses of app_config /
VadConfig::from_server_config in the long‑lived loop and at each decision point
(locations around where app_config.clone is used and the VadConfig creation) so
you fetch updated settings per utterance/decision and rebuild or reference the
latest VadConfig/STT/wake-word provider on demand.
- Around line 365-375: The logs currently print full spoken content via
log::info! including cmd and text (see the log lines around LOG_PREFIX and the
deliver_command call), which may expose PII/secrets; update those log statements
to redact or mask sensitive content instead of printing cmd or text (e.g.,
replace with a redacted placeholder, truncated summary, or a hashed token) and
only log safe metadata (like whether a wake word matched, confidence flags, or a
masked length). Apply the change to both the wake-word-matched branch (before
calling deliver_command) and the None branch that logs transcript, or factor out
a helper (e.g., redact_transcript or mask_sensitive) to centralize redaction so
future logs use the safe representation. Ensure deliver_command invocation
remains unchanged except for not logging cmd contents directly in the
surrounding info messages.
---
Outside diff comments:
In `@src/openhuman/config/ops_tests.rs`:
- Around line 1031-1061: The test
load_and_apply_voice_server_settings_accepts_valid_modes_and_clamps sets
always_on_enabled and wake_word but never asserts they persisted; update the
assertions after calling load_and_apply_voice_server_settings (using the outcome
variable) to check outcome.value["config"]["voice_server"]["always_on_enabled"]
is true (use as_bool()/unwrap_or(false)) and
outcome.value["config"]["voice_server"]["wake_word"] matches "Hey Tiny" (use
as_str()/unwrap_or("") or equivalent). Ensure these checks follow the existing
min_duration_secs assertion and run before removing the OPENHUMAN_WORKSPACE env
var.
---
Minor comments:
In `@app/src/components/settings/panels/__tests__/VoicePanel.test.tsx`:
- Around line 539-553: The test currently never asserts that the click actually
triggered the RPC; add an assertion that the mocked update RPC
(openhumanUpdateVoiceServerSettings) was invoked when the toggle was clicked —
e.g. after the click and after awaiting the revert, assert
vi.mocked(openhumanUpdateVoiceServerSettings).toHaveBeenCalled() or
.toHaveBeenCalledTimes(1) (and optionally with the expected payload) so the
failing-path truly exercised the RPC call in VoicePanel and you still assert
syncNotchVisibility was not called.
In `@app/src/lib/i18n/fr.ts`:
- Around line 1459-1461: The French string for the key
'voice.debug.alwaysOnDesc' mixes formal and informal pronouns; update the value
to a consistent informal (tu) register by replacing the sentence "Garde le
microphone ouvert et envoie automatiquement ce que vous dites à l’agent, sans
raccourci. Se met en pause lorsque l’écran est verrouillé." with an informal
variant such as "Garde le micro ouvert et envoie automatiquement ce que tu dis à
l’agent, sans raccourci. Se met en pause lorsque l’écran est verrouillé." to
match the rest of the file.
In `@docs/voice-automate-plan.md`:
- Line 20: The file contains an unlabeled opening code fence and a stray closing
fence that breaks markdown rendering; update the opening fence to include a
language label (e.g., change the opening ``` to ```text) and remove the extra
trailing ``` after the Safety paragraph so the fenced block around the
orchestrator / Safety section is properly opened and closed; locate the
unlabeled fence near the start of the orchestration snippet and the stray fence
after the "denylisted app even if the model asks." line and make the
label/removal changes there.
In `@docs/voice-system-actions.md`:
- Line 365: The fenced code block containing the trace lines
(enigo::macos::get_layoutdependent_keycode → TSMGetInputSourceProperty →
dispatch_assert_queue → _dispatch_assert_queue_fail → SIGTRAP) is missing a
language tag; update the fence to include a language identifier (for example
change ``` to ```text) so the block becomes ```text ... ```; locate the block by
the presence of those unique symbols and add the language tag only, leaving the
content unchanged.
- Around line 329-333: The tracker sections for "Phase 1.5 — Reliable, real-time
multi-step automation" contain conflicting rollout/status labels (e.g., "Not
Started" vs "M1+M2+M3 shipped/proven live" and "crash fixed" vs "fix required /
keep disabled"); normalize each affected block (including the referenced spans
~329-333, 352-372, 637-644) to a single authoritative status string and a single
canonical rollout note, update the human-facing lines under the phase header and
any bullet status lines to match that single state, and add a short note
pointing to implementation docs; also add matching inline rustdoc/code comments
where flow is non-obvious in the implementation (reference the "Rust inner loop"
and "voice-automate-plan.md") and update AGENTS.md and relevant
architecture/feature docs so repository rules and user-visible behavior reflect
the chosen state.
In `@src/openhuman/accessibility/automate.rs`:
- Around line 203-206: The current logic slices trimmed from the first '{' to
the last '}', which can include extra trailing brace blocks and fail parsing;
change this to locate the first balanced JSON object by scanning trimmed from
the first '{', tracking brace depth (increment on '{', decrement on '}') until
depth returns to zero, taking that slice as the candidate JSON, then call
serde_json::from_str::<Action>(&trimmed[start..=end]) on that balanced slice
(ensure the scanner handles braces inside strings/escaped quotes or at least
stops at matching braces for typical outputs); update the block around the
trimmed.find/rfind usage to use this balanced-scan approach instead of using
rfind('}').
In `@src/openhuman/config/schemas.rs`:
- Around line 1731-1736: The code calls
config_rpc::load_config_with_timeout().await and silently ignores failures,
which can leave always-on runtime stale; modify the block after
config_rpc::load_and_apply_voice_server_settings(patch).await to handle the Err
case: if load_config_with_timeout() returns Ok call
crate::openhuman::voice::always_on::start_if_enabled(&config).await, otherwise
log a stable, grep-friendly error (including the error details and context like
"voice-settings: reload failed") via the existing logger or return the error so
the caller can see the failure; ensure the log message includes the function
names or context to aid debugging.
In `@src/openhuman/voice/command_router.rs`:
- Around line 59-72: The docstring for strip_filler says it removes "hey" but
the fillers array in strip_filler does not include "hey ", so inputs like "hey
pause" aren't stripped; update the fillers list inside the strip_filler function
to include "hey " (and any other expected filler variants you want handled) so
the function actually strips that prefix when matching commands.
---
Nitpick comments:
In `@app/src-tauri/permissions/allow-core-process.toml`:
- Around line 65-70: The allow-core-process TOML currently lists the privileged
commands "notch_window_show" and "notch_window_hide"; remove those two entries
from allow-core-process and create a new, dedicated permission TOML (e.g.,
notch-window-permissions) that contains "notch_window_show" and
"notch_window_hide", then update the capability grants so only the notch-related
UI surfaces/components are given that new permission identifier instead of the
broad allow-core-process. Ensure any references to these commands (invokes)
still resolve by updating the permission identifier in the relevant UI surface
capability configurations.
In `@app/src/features/human/voice/useVoiceSpeak.test.ts`:
- Around line 65-69: The test only asserts that socketService.off was called but
never triggers a real "voice:speak" playback, so update the test in
useVoiceSpeak.test.ts to simulate a full lifecycle: use the same socket test
helper (hoisted.emitMock or the test helper in app/src/test/) to emit a
'voice:speak' event after rendering useVoiceSpeak (so playback actually starts),
capture or spy the playback stop handle returned by the playback/start path
(e.g., mock the audio/playback object's stop method or capture the stop function
returned by the handler), then call unmount() and assert both that
hoisted.offMock was called with 'voice:speak' and that the captured stop handle
was invoked; reference useVoiceSpeak, hoisted.emitMock (or the project's socket
emit helper), hoisted.offMock, and the playback stop mock when locating where to
change the test.
In `@app/src/notch/NotchApp.tsx`:
- Around line 183-207: Replace console.debug/console.warn calls inside the
socket handlers (e.g., the handlers registered for 'dictation:toggle',
'dictation:transcription', and 'companion:state_changed' in
NotchApp/NotchApp.tsx) with the project’s namespaced debug logger (use the
existing debug utility used elsewhere in app/src, e.g., debug('notch:...') or
the repo’s debug import) so logs are namespaced and can be stripped in
production; import the debug function if not present and convert messages like
console.debug(`[notch] ...`) to debug('notch', ...) (or the repo convention) and
ensure any verbose/detail-only messages are guarded as dev-only per the repo
pattern (the other console.* occurrences around lines ~234-255 should be changed
the same way).
In `@app/src/pages/Conversations.tsx`:
- Line 793: The current payload always sets the voice field to false by using
"voice: opts?.voice ?? false"; change this to preserve optional behavior by
sending "voice: opts?.voice" instead so the field is omitted when undefined
(locate the object construction using opts?.voice ?? false in Conversations.tsx,
likely inside the send/submit function that builds the message payload). Ensure
no other code relies on voice being explicitly boolean false when omitted.
In `@src/openhuman/voice/audio_capture.rs`:
- Around line 105-129: The finalize_recording path still contains an inline WAV
encoding loop duplicated from encode_wav_16k; replace that block in
finalize_recording() with a call to encode_wav_16k(samples_16k_slice) (passing
the prepared 16k mono f32 slice) and use the returned Vec<u8> as the WAV bytes,
removing the duplicated WavWriter/loop/finalize code; make sure to propagate and
map errors to the same Result<String> error style as encode_wav_16k, and keep
sample-rate/resampling logic intact so finalize_recording supplies a proper
&[f32] to encode_wav_16k.
In `@src/openhuman/voice/speak_bus.rs`:
- Around line 47-55: Replace the module-local broadcast (SPEAK_BUS) and
subscribe_speak_events() to use the repository event bus: add a new DomainEvent
variant (e.g., DomainEvent::SpeakRequest(SpeakRequest)) and publish speak events
via event_bus::publish_global(DomainEvent::SpeakRequest(...)) where currently
sending to SPEAK_BUS; implement subscribe_speak_events() to call
event_bus::subscribe_global() and filter/unwrap DomainEvent::SpeakRequest into a
broadcast::Receiver/stream of SpeakRequest so consumers (Socket.IO bridge)
receive typed events through the global bus rather than the SPEAK_BUS static.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6d69e6a4-a381-469d-9aa2-969089cddf36
📒 Files selected for processing (97)
app/src-tauri/permissions/allow-core-process.tomlapp/src-tauri/src/lib.rsapp/src-tauri/src/notch_window.rsapp/src/App.tsxapp/src/components/settings/panels/VoiceDebugPanel.tsxapp/src/components/settings/panels/VoicePanel.tsxapp/src/components/settings/panels/__tests__/VoicePanel.test.tsxapp/src/features/human/voice/useVoiceSpeak.test.tsapp/src/features/human/voice/useVoiceSpeak.tsapp/src/index.cssapp/src/lib/i18n/ar.tsapp/src/lib/i18n/bn.tsapp/src/lib/i18n/de.tsapp/src/lib/i18n/en.tsapp/src/lib/i18n/es.tsapp/src/lib/i18n/fr.tsapp/src/lib/i18n/hi.tsapp/src/lib/i18n/id.tsapp/src/lib/i18n/it.tsapp/src/lib/i18n/ko.tsapp/src/lib/i18n/pl.tsapp/src/lib/i18n/pt.tsapp/src/lib/i18n/ru.tsapp/src/lib/i18n/zh-CN.tsapp/src/main.tsxapp/src/notch/NotchApp.tsxapp/src/pages/Conversations.tsxapp/src/services/chatService.tsapp/src/services/coreRpcClient.tsapp/src/utils/tauriCommands/voice.tsapp/src/utils/tauriCommands/window.test.tsapp/src/utils/tauriCommands/window.tsapp/src/utils/toolDefinitions.tsdocs/voice-automate-plan.mddocs/voice-system-actions.mdgitbooks/developing/architecture/tauri-shell.mdsrc/core/event_bus/events.rssrc/core/event_bus/events_tests.rssrc/core/jsonrpc.rssrc/core/socketio.rssrc/openhuman/accessibility/app_fastpaths/fastpaths_tests.rssrc/openhuman/accessibility/app_fastpaths/mod.rssrc/openhuman/accessibility/app_fastpaths/music.rssrc/openhuman/accessibility/automate.rssrc/openhuman/accessibility/automate_tests.rssrc/openhuman/accessibility/ax_interact.rssrc/openhuman/accessibility/ax_interact_tests.rssrc/openhuman/accessibility/helper.rssrc/openhuman/accessibility/mod.rssrc/openhuman/accessibility/uia_interact.rssrc/openhuman/agent/harness/tool_loop_tests.rssrc/openhuman/agent_registry/agents/orchestrator/agent.tomlsrc/openhuman/agent_registry/agents/orchestrator/prompt.mdsrc/openhuman/approval/gate.rssrc/openhuman/channels/bus.rssrc/openhuman/channels/providers/telegram/approval_surface_tests.rssrc/openhuman/channels/providers/web.rssrc/openhuman/channels/providers/web_tests.rssrc/openhuman/channels/runtime/dispatch.rssrc/openhuman/channels/runtime/startup.rssrc/openhuman/config/ops.rssrc/openhuman/config/ops_tests.rssrc/openhuman/config/schema/voice_server.rssrc/openhuman/config/schemas.rssrc/openhuman/credentials/ops.rssrc/openhuman/tools/impl/browser/screenshot.rssrc/openhuman/tools/impl/computer/automate.rssrc/openhuman/tools/impl/computer/ax_interact.rssrc/openhuman/tools/impl/computer/keyboard.rssrc/openhuman/tools/impl/computer/main_thread.rssrc/openhuman/tools/impl/computer/mod.rssrc/openhuman/tools/impl/computer/mouse.rssrc/openhuman/tools/impl/system/install_tool.rssrc/openhuman/tools/impl/system/launch_app.rssrc/openhuman/tools/impl/system/mod.rssrc/openhuman/tools/ops.rssrc/openhuman/tools/user_filter.rssrc/openhuman/voice/always_on.rssrc/openhuman/voice/approval_surface.rssrc/openhuman/voice/approval_surface_tests.rssrc/openhuman/voice/audio_capture.rssrc/openhuman/voice/command_router.rssrc/openhuman/voice/mod.rssrc/openhuman/voice/speak_bus.rssrc/openhuman/wallet/execution_tests.rstests/channels_large_round25_raw_coverage_e2e.rstests/channels_provider_deep_raw_coverage_e2e.rstests/channels_provider_leftovers_raw_coverage_e2e.rstests/channels_runtime_raw_coverage_e2e.rstests/channels_web_startup_raw_coverage_e2e.rstests/channels_web_telegram_raw_coverage_e2e.rstests/channels_web_yuanbao_round22_raw_coverage_e2e.rstests/json_rpc_e2e.rstests/tool_registry_approval_raw_coverage_e2e.rstests/tools_approval_channels_raw_coverage_e2e.rstests/tools_network_channels_raw_coverage_e2e.rstests/worker_b_raw_coverage_e2e.rs
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
🛑 Comments failed to post (15)
app/src-tauri/src/notch_window.rs (1)
362-366:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winEscape injected JS values before writing token/URL into the WebView.
tokenandbase_urlare interpolated directly into JS string literals. If either contains quote/backslash/newline characters, injection can fail or execute unintended JS.Proposed fix
- let js = format!( - "window.__OPENHUMAN_NOTCH_CORE_TOKEN__='{token}';\ - window.__OPENHUMAN_NOTCH_CORE_URL__='{base_url}';\ - window.dispatchEvent(new CustomEvent('notch:core-url',{{detail:{{url:'{base_url}'}}}}));" - ); + let token_js = serde_json::to_string(&token).unwrap_or_else(|_| "\"\"".to_string()); + let base_url_js = serde_json::to_string(&base_url).unwrap_or_else(|_| "\"\"".to_string()); + let js = format!( + "window.__OPENHUMAN_NOTCH_CORE_TOKEN__={token_js};\ + window.__OPENHUMAN_NOTCH_CORE_URL__={base_url_js};\ + window.dispatchEvent(new CustomEvent('notch:core-url',{{detail:{{url:{base_url_js}}}}}));" + );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src-tauri/src/notch_window.rs` around lines 362 - 366, The string interpolation in the format! that builds js in notch_window.rs (the js variable where token and base_url are inserted) is unsafe: escape token and base_url for inclusion in a single-quoted JS string (or serialize them as JS string literals) before formatting to prevent quotes, backslashes or newlines from breaking or injecting code; add a small helper (e.g., js_string_escape or use serde_json::to_string to produce a quoted JS string and strip surrounding quotes) and use that helper to produce safe values for token and base_url when constructing the js variable and dispatching the CustomEvent.app/src/App.tsx (1)
205-214:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winSet the one-shot guard only after a successful notch sync.
The current flow flips the guard before async work. A single transient failure permanently skips sync for this boot session, leaving notch visibility stale.
💡 Suggested fix
const notchSyncedRef = useRef(false); useEffect(() => { if (isBootstrapping || notchSyncedRef.current) return; - notchSyncedRef.current = true; - void (async () => { + let cancelled = false; + const run = async (attempt = 0): Promise<void> => { try { const res = await openhumanGetVoiceServerSettings(); await syncNotchVisibility(res.result.always_on_enabled); + notchSyncedRef.current = true; } catch (err) { - console.debug('[notch] boot visibility sync failed', err); + if (!cancelled && attempt < 2) { + window.setTimeout(() => void run(attempt + 1), 500 * (attempt + 1)); + return; + } + console.debug('[notch] boot visibility sync failed', err); } - })(); + }; + void run(); + return () => { + cancelled = true; + }; }, [isBootstrapping]);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.if (isBootstrapping || notchSyncedRef.current) return; let cancelled = false; const run = async (attempt = 0): Promise<void> => { try { const res = await openhumanGetVoiceServerSettings(); await syncNotchVisibility(res.result.always_on_enabled); notchSyncedRef.current = true; } catch (err) { if (!cancelled && attempt < 2) { window.setTimeout(() => void run(attempt + 1), 500 * (attempt + 1)); return; } console.debug('[notch] boot visibility sync failed', err); } }; void run(); return () => { cancelled = true; };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/App.tsx` around lines 205 - 214, The notch sync one-shot guard is set before the async work, causing failures to permanently skip retries; modify the IIFE so notchSyncedRef.current is assigned true only after a successful sync: call openhumanGetVoiceServerSettings() and await syncNotchVisibility(...) inside the try block, and on success set notchSyncedRef.current = true (keep the early return check for isBootstrapping || notchSyncedRef.current at the top) so transient errors won’t flip the guard; reference functions/vars: isBootstrapping, notchSyncedRef, openhumanGetVoiceServerSettings, syncNotchVisibility.app/src/components/settings/panels/VoicePanel.tsx (1)
509-525:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winSeparate persistence from notch-sync failure handling and guard re-entrant toggles.
This handler currently treats
openhumanUpdateVoiceServerSettingsandsyncNotchVisibilityas one atomic step. If persistence succeeds but notch sync fails, the UI is reverted to the wrong value. Also, multiple rapid clicks can race and leave inconsistent state.💡 Suggested fix
+const [isUpdatingAlwaysOn, setIsUpdatingAlwaysOn] = useState(false); ... <button type="button" role="switch" + disabled={isUpdatingAlwaysOn} aria-checked={settings.always_on_enabled} aria-label={t('voice.debug.alwaysOn')} data-testid="voice-always-on-toggle" onClick={async () => { + if (isUpdatingAlwaysOn) return; + const prev = settings.always_on_enabled; const next = !settings.always_on_enabled; + setIsUpdatingAlwaysOn(true); setSettings(current => current ? { ...current, always_on_enabled: next } : current ); try { await openhumanUpdateVoiceServerSettings({ always_on_enabled: next }); - // The notch pill is the always-on listening HUD: show it - // when listening is enabled, drop it when disabled. - await syncNotchVisibility(next); } catch (err) { - // Revert on failure so the UI reflects the persisted value. setSettings(current => - current ? { ...current, always_on_enabled: !next } : current + current ? { ...current, always_on_enabled: prev } : current ); - console.error('[VoicePanel] failed to toggle always-on', err); + setError(err instanceof Error ? err.message : t('voice.providers.failedToSave')); + setIsUpdatingAlwaysOn(false); + return; } + setIsUpdatingAlwaysOn(false); + void syncNotchVisibility(next).catch(err => { + if (process.env.NODE_ENV !== 'production') { + console.debug('[VoicePanel] notch sync failed', err); + } + }); }}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 509 - 525, The toggle handler mixes persistence and notch-sync and is vulnerable to re-entrant clicks; change it so clicking is ignored while a pendingToggle flag (or similar) is true, update the UI optimistically and persist with openhumanUpdateVoiceServerSettings, and only revert the UI if persistence fails—do not revert when syncNotchVisibility fails; instead handle sync errors separately (log/show feedback) and optionally retry without changing settings. Use the existing symbols settings.always_on_enabled, setSettings, openhumanUpdateVoiceServerSettings, and syncNotchVisibility and add a local pending toggle guard to prevent concurrent toggles.app/src/features/human/voice/useVoiceSpeak.ts (1)
50-60:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrevent out-of-order TTS playback when multiple
voice:speakevents overlap.Older synth requests can resolve after newer ones and then stop newer playback, causing stale prompts to be spoken.
💡 Suggested fix
export function useVoiceSpeak(): void { const handleRef = useRef<PlaybackHandle | null>(null); + const requestSeqRef = useRef(0); useEffect(() => { const onSpeak = (...args: unknown[]): void => { const payload = args[0]; if (!isSpeakPayload(payload)) return; const text = payload.text.trim(); if (!text) return; + const seq = ++requestSeqRef.current; void (async () => { try { const { audio_base64: audioBase64, audio_mime: audioMime } = await synthesizeSpeech(text); - if (!audioBase64) return; + if (!audioBase64 || seq !== requestSeqRef.current) return; handleRef.current?.stop(); const handle = await playBase64Audio(audioBase64, audioMime || 'audio/mpeg', { maxDurationMs: MAX_SPEAK_MS, }); + if (seq !== requestSeqRef.current) { + handle.stop(); + return; + } handleRef.current = handle; handle.ended.catch(swallowAudioStop); } catch (err) { log('voice:speak playback failed: %o', err); } })(); }; socketService.on('voice:speak', onSpeak); return () => { + requestSeqRef.current += 1; socketService.off('voice:speak', onSpeak); handleRef.current?.stop(); handleRef.current = null; }; }, []); }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.export function useVoiceSpeak(): void { const handleRef = useRef<PlaybackHandle | null>(null); const requestSeqRef = useRef(0); useEffect(() => { const onSpeak = (...args: unknown[]): void => { const payload = args[0]; if (!isSpeakPayload(payload)) return; const text = payload.text.trim(); if (!text) return; const seq = ++requestSeqRef.current; void (async () => { try { const { audio_base64: audioBase64, audio_mime: audioMime } = await synthesizeSpeech(text); if (!audioBase64 || seq !== requestSeqRef.current) return; // Stop any in-flight prompt before starting the next one. handleRef.current?.stop(); const handle = await playBase64Audio(audioBase64, audioMime || 'audio/mpeg', { maxDurationMs: MAX_SPEAK_MS, }); if (seq !== requestSeqRef.current) { handle.stop(); return; } handleRef.current = handle; handle.ended.catch(swallowAudioStop); } catch (err) { log('voice:speak playback failed: %o', err); } })(); }; socketService.on('voice:speak', onSpeak); return () => { requestSeqRef.current += 1; socketService.off('voice:speak', onSpeak); handleRef.current?.stop(); handleRef.current = null; }; }, []); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/features/human/voice/useVoiceSpeak.ts` around lines 50 - 60, The synth/ playback race happens because older synthesizeSpeech calls can resolve later and call handleRef.current?.stop(), killing newer audio; fix by introducing a monotonic request id (e.g., requestIdRef) and capture const myRequestId = ++requestIdRef.current at the start of the async block, then after synthesizeSpeech resolves verify myRequestId === requestIdRef.current before stopping the current handle or assigning handleRef.current = handle (and similarly before calling handle.ended.catch). This ensures only the latest voice:speak invocation can stop or replace the shared handle; reference synthesizeSpeech, handleRef, playBase64Audio, and handle.ended in your change.app/src/notch/NotchApp.tsx (1)
164-261:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrevent post-unmount socket initialization and listener leaks.
connectSocketuses a localdisposedflag and returns a disposer, but that disposer is never called by the effect. If unmount happens whileconnectCoreSocket()is in-flight, the socket can still be assigned/connected after teardown.Suggested fix
export default function NotchApp() { const { t } = useT(); const [state, setState] = useState<NotchState>({ mode: 'ready', text: '' }); const dismissRef = useRef<number | null>(null); const socketRef = useRef<Socket | null>(null); + const disposedRef = useRef(false); const connectSocket = useCallback( (baseUrl: string) => { + if (disposedRef.current) return; if (socketRef.current?.connected) return; if (socketRef.current) { socketRef.current.disconnect(); } - let disposed = false; void (async () => { try { const socket = await connectCoreSocket({ getBaseUrl: async () => baseUrl, - isDisposed: () => disposed, + isDisposed: () => disposedRef.current, }); - if (!socket || disposed) return; + if (!socket || disposedRef.current) { + socket?.disconnect(); + return; + } socketRef.current = socket; // ... } catch (err) { // ... } })(); - return () => { - disposed = true; - }; }, [t, clearDismiss, scheduleDismiss] ); useEffect(() => { + disposedRef.current = false; // ... return () => { + disposedRef.current = true; window.removeEventListener('notch:core-url', handler as EventListener); socketRef.current?.disconnect(); socketRef.current = null; clearDismiss(); }; }, [connectSocket, clearDismiss]);Also applies to: 267-289
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/notch/NotchApp.tsx` around lines 164 - 261, connectSocket can initialize a socket after the component unmounts because the returned disposable isn't invoked; ensure teardown is actually used and prevent post-unmount assignment/leaks by (1) keeping the disposed flag check after await connectCoreSocket(...) and if disposed is true then immediately disconnect the newly created socket and return, (2) register socket listeners on the socket instance only when not disposed, and (3) ensure the caller effect that invokes connectSocket actually captures and calls its returned cleanup function (i.e., call the function returned by connectSocket inside the effect's cleanup). Refer to connectSocket, connectCoreSocket, socketRef, and the local disposed flag when making these changes.app/src/utils/tauriCommands/window.ts (1)
122-123:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winReplace raw
console.warnerror logging with redacted namespaced debug logs.Line 122 and Line 136 currently log raw
errobjects inapp/src/**. That can leak sensitive runtime details and bypasses the repo’s logging convention. Use a namespaceddebuglogger and only log a sanitized message.Proposed fix
import { invoke } from '`@tauri-apps/api/core`'; import { getCurrentWindow } from '`@tauri-apps/api/window`'; +import createDebug from 'debug'; import { isTauri } from './common'; +const log = createDebug('openhuman:tauri:window'); @@ try { await invoke('notch_window_show'); } catch (err) { - console.warn('[notch] show failed', err); + const message = err instanceof Error ? err.message : String(err); + log('notch_window_show failed: %s', message); } } @@ try { await invoke('notch_window_hide'); } catch (err) { - console.warn('[notch] hide failed', err); + const message = err instanceof Error ? err.message : String(err); + log('notch_window_hide failed: %s', message); } }As per coding guidelines: "Never log secrets or full PII—redact sensitive information in all logs" and "In
app/src, use namespaceddebuglogs (e.g. from thedebugnpm package) with dev-only detail for development diagnostics".Also applies to: 136-137
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/utils/tauriCommands/window.ts` around lines 122 - 123, Replace the raw console.warn(err) calls in app/src/utils/tauriCommands/window.ts with a namespaced debug logger and redacted error text: import and instantiate a debug logger (e.g. const debug = createDebug('app:tauri:window') or using the repo's existing debug factory), then change the console.warn in the function(s) that show the window (and the other occurrences around lines 136-137) to debug(...) while only logging a sanitized message and non-sensitive fields (e.g. debug('show failed:', safeErrorString(err)) where safeErrorString returns err.message or a redacted summary, not the full err object); ensure you add the debug import/initialization and replace all similar console.warn/raw err logs in this module to follow the same pattern.src/openhuman/accessibility/app_fastpaths/music.rs (2)
150-166:
⚠️ Potential issue | 🟠 Major | ⚡ Quick win🧩 Analysis chain
🏁 Script executed:
#!/bin/bash set -euo pipefail FILE="src/openhuman/accessibility/app_fastpaths/music.rs" # Basic sanity: file exists test -f "$FILE" && echo "Found $FILE" || { echo "Missing $FILE"; exit 1; } # Show the relevant sections around the reported line ranges/snippet echo "---- replace_ci context (lines 120-220) ----" sed -n '120,220p' "$FILE" | cat -n echo "---- logging context (lines 220-320) ----" sed -n '220,320p' "$FILE" | cat -n # Locate replace_ci definition and all call sites echo "---- replace_ci definitions ----" rg -n "fn\s+replace_ci\s*\(" "$FILE" echo "---- replace_ci call sites ----" rg -n "replace_ci\s*\(" "$FILE" # Find likely sensitive logging of query/user input echo "---- likely query/user-input logging ----" rg -n "log::|tracing::|info!|debug!|trace!|warn!|error!" "$FILE" # Specifically search for "query" in logs echo "---- occurrences of parsed/query in file ----" rg -n "(query|parsed|url|request|raw)" "$FILE"Repository: tinyhumansai/openhuman
Length of output: 11772
🏁 Script executed:
#!/bin/bash set -euo pipefail rg -n "replace_ci\s*\(" -S .Repository: tinyhumansai/openhuman
Length of output: 298
🏁 Script executed:
#!/bin/bash set -euo pipefail FILE="src/openhuman/accessibility/app_fastpaths/music.rs" sed -n '1,130p' "$FILE" | cat -n sed -n '130,210p' "$FILE" | cat -nRepository: tinyhumansai/openhuman
Length of output: 9202
Fix Unicode-unsafe indexing in
replace_ciand redact logged play query.
replace_cilowercaseshaystackintohlbut sliceshlusing byte offsets (i) from the originalhaystack; Unicode case mapping can change byte lengths/boundaries, sohl[i..]can panic on valid UTF-8 input.runlogs the full user-derivedqueryat info level (log::info!("[automate::music] ▶ play query={query:?}")), violating the “never log secrets/full PII” redaction guidance.🛠️ Proposed fix
fn replace_ci(haystack: &str, needle: &str, repl: &str) -> String { - let hl = haystack.to_lowercase(); - let nl = needle.to_lowercase(); let mut out = String::with_capacity(haystack.len()); let mut i = 0; while i < haystack.len() { - if hl[i..].starts_with(&nl) { + let is_match = haystack[i..] + .get(..needle.len()) + .map(|s| s.eq_ignore_ascii_case(needle)) + .unwrap_or(false); + if is_match { out.push_str(repl); i += needle.len(); } else { - let ch = haystack[i..].chars().next().unwrap(); + let ch = haystack[i..].chars().next().expect("char boundary"); out.push(ch); i += ch.len_utf8(); } } out }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/accessibility/app_fastpaths/music.rs` around lines 150 - 166, The replace_ci function is using byte-indexing into the lowercased string (hl[i..]) while i is tracked over the original haystack's bytes, which is Unicode-unsafe; change replace_ci to iterate using char_indices (or parallel char_indices over haystack and hl) and track byte offsets for both original and lowercased forms so you only slice at valid UTF-8 boundaries (e.g., use for (orig_byte_idx, ch) in haystack.char_indices() and maintain a corresponding hl_byte_idx into hl to call hl[hl_byte_idx..].starts_with(&nl) and advance both indices by the matched needle.chars().map(|c|c.len_utf8()).sum::<usize>() or by the matched byte lengths); and in run, stop logging the full user-derived query via log::info!("[automate::music] ▶ play query={query:?}"), instead log a redacted or masked version (e.g., omit the query, log only length or a fixed placeholder) to avoid exposing PII/secrets.
237-237:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winRedact query content from logs.
This logs raw user-derived query text. Please log metadata (length/hash) instead of full content.
🛡️ Proposed fix
- log::info!("[automate::music] ▶ play query={query:?}"); + log::debug!( + "[automate::music] ▶ play query_chars={}", + query.chars().count() + );As per coding guidelines:
Never log secrets or full PII—redact sensitive information in all logs.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.log::debug!( "[automate::music] ▶ play query_chars={}", query.chars().count() );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/accessibility/app_fastpaths/music.rs` at line 237, The current log call logs raw user query via log::info!("[automate::music] ▶ play query={query:?}"); — replace that to avoid PII by computing and logging only non-sensitive metadata: compute query_len = query.len() and query_hash = sha256_hex(query) (or another stable hash) and update the log to something like log::info!("[automate::music] ▶ play query_len={} query_hash={}", query_len, query_hash); ensure the original `query` string is not included anywhere in logs and reference the same variable name `query` and the logging site (log::info! call) when making the change.src/openhuman/accessibility/automate.rs (1)
210-212:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winRedact user/model content before logging automation state.
Line 224 logs the full
goal, Line 314 logslabel/filter, and Line 300 logs parse errors that include raw model output from Line 211. Those fields can contain sensitive user content and should be redacted/omitted in logs.As per coding guidelines `**/*.{ts,tsx,js,jsx,rs}`: Never log secrets or full PII—redact sensitive information in all logs.🔧 Suggested patch
+fn redact_log_value(input: &str) -> String { + let len = input.chars().count(); + if len == 0 { + "<empty>".to_string() + } else { + format!("<redacted:{len} chars>") + } +} + fn parse_action(raw: &str) -> Result<Action, String> { @@ - Err(format!( - "could not parse an action from model output: {trimmed:?}" - )) + Err("could not parse an action from model output".to_string()) } @@ log::info!( - "{LOG_PREFIX} ▶ run app={app:?} goal={goal:?} budget={}", - opts.step_budget + "{LOG_PREFIX} ▶ run app={app:?} goal={} budget={}", + redact_log_value(goal), + opts.step_budget ); @@ - log::warn!("{LOG_PREFIX} step={step} unparseable action, retrying: {e}"); + log::warn!("{LOG_PREFIX} step={step} unparseable action, retrying"); @@ log::info!( - "{LOG_PREFIX} step={step} action={:?} app={target_app:?} label={:?} filter={:?}", + "{LOG_PREFIX} step={step} action={:?} app={target_app:?} label={} filter={}", action.action, - action.label, - action.filter + redact_log_value(&action.label), + redact_log_value(&action.filter) );Also applies to: 223-226, 300-301, 313-318
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/accessibility/automate.rs` around lines 210 - 212, The logs currently emit raw user/model content (e.g., variables goal, label, filter and trimmed/model output) which may contain PII; update the logging and error creation to redact or omit sensitive content instead of printing the full values — replace direct interpolations like trimmed, goal, label, filter with a sanitized/redacted representation (e.g., call a helper redact_sensitive(s: &str) or replace with "[REDACTED]") in the Err(format!(...)) and all log calls that reference those symbols so logs never contain full user/model text.src/openhuman/credentials/ops.rs (1)
44-47:
⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftAdd symmetric teardown for always-on service.
This starts a new login-gated service, but
stop_login_gated_servicesdoes not stop/disable it. That leaves always-on capture active after logout/user switch.Suggested direction
pub async fn stop_login_gated_services(config: &Config) { + // Stop/disable always-on listening before tearing down auth-scoped services. + crate::openhuman::voice::always_on::stop(); + // 1. Autocomplete — stop engine + Swift overlay helper. {// src/openhuman/voice/always_on.rs pub fn stop() { ENABLED.store(false, Ordering::SeqCst); // Prefer full stream/thread teardown if possible, not just logical disable. }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/credentials/ops.rs` around lines 44 - 47, The always-on voice service started by crate::openhuman::voice::always_on::start_if_enabled(config) isn't torn down on logout; add a symmetric shutdown path by implementing a public stop() in src/openhuman/voice/always_on.rs that clears ENABLED (Ordering::SeqCst) and performs full stream/thread teardown, then call crate::openhuman::voice::always_on::stop() from stop_login_gated_services so the always-on capture is fully disabled on logout/user switch.src/openhuman/tools/impl/browser/screenshot.rs (1)
209-225:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't return an oversized JPEG from the "fit within budget" helper.
If none of the generated thumbnails fit
max_bytes,laststill holds the final oversized candidate, and the current tail returns it asOk(...). That violates this helper's contract and can inline a payload larger than the budget on the failure path.🔧 Suggested fix
fn downscale_to_jpeg(bytes: &[u8], max_bytes: usize) -> Result<(Vec<u8>, u32, u32), String> { let img = image::load_from_memory(bytes).map_err(|e| format!("decode: {e}"))?; let mut last: Option<(Vec<u8>, u32, u32)> = None; for max_dim in [1568u32, 1280, 1024, 768, 600] { let thumb = img.thumbnail(max_dim, max_dim); // fits within max_dim², keeps aspect let mut buf = std::io::Cursor::new(Vec::new()); image::codecs::jpeg::JpegEncoder::new_with_quality(&mut buf, 72) .encode_image(&thumb) .map_err(|e| format!("jpeg encode: {e}"))?; let out = buf.into_inner(); let (w, h) = (thumb.width(), thumb.height()); if out.len() <= max_bytes { return Ok((out, w, h)); } last = Some((out, w, h)); } - last.ok_or_else(|| "could not produce a fitting JPEG".to_string()) + Err(last + .map(|(out, w, h)| { + format!("smallest JPEG was still too large ({} bytes at {}x{})", out.len(), w, h) + }) + .unwrap_or_else(|| "could not produce a fitting JPEG".to_string())) }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.fn downscale_to_jpeg(bytes: &[u8], max_bytes: usize) -> Result<(Vec<u8>, u32, u32), String> { let img = image::load_from_memory(bytes).map_err(|e| format!("decode: {e}"))?; let mut last: Option<(Vec<u8>, u32, u32)> = None; for max_dim in [1568u32, 1280, 1024, 768, 600] { let thumb = img.thumbnail(max_dim, max_dim); // fits within max_dim², keeps aspect let mut buf = std::io::Cursor::new(Vec::new()); image::codecs::jpeg::JpegEncoder::new_with_quality(&mut buf, 72) .encode_image(&thumb) .map_err(|e| format!("jpeg encode: {e}"))?; let out = buf.into_inner(); let (w, h) = (thumb.width(), thumb.height()); if out.len() <= max_bytes { return Ok((out, w, h)); } last = Some((out, w, h)); } Err(last .map(|(out, w, h)| { format!("smallest JPEG was still too large ({} bytes at {}x{})", out.len(), w, h) }) .unwrap_or_else(|| "could not produce a fitting JPEG".to_string())) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/tools/impl/browser/screenshot.rs` around lines 209 - 225, The helper downscale_to_jpeg currently returns the last oversized candidate stored in last when no thumbnail fits max_bytes; change this so that if the loop finishes without finding out.len() <= max_bytes it returns an Err (e.g., via last.ok_or_else -> Err) instead of returning Ok(last), preserving the function's contract; update the tail of downscale_to_jpeg (referencing variables last and max_bytes and the function name) to produce an error when no size fits rather than returning the oversized JPEG.src/openhuman/tools/impl/computer/automate.rs (2)
1-16: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Put
AutomateToolin the owning domain module, notsrc/openhuman/tools/impl/.This adds a new domain-owned tool to the legacy impl tree instead of the repo’s domain-owned
tools.rs/tools/structure. Please move it into the owning module and re-export it from the normal tool surface there.Based on learnings, "Tool implementations belonging to a domain must live in that owning module's
tools.rs(and optionaltools/submodules), not insrc/openhuman/tools/impl/; re-export those tool types fromsrc/openhuman/tools/mod.rs."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/tools/impl/computer/automate.rs` around lines 1 - 16, The AutomateTool implementation belongs in the domain's owning module not in the impl tree: move the AutomateTool type, its Tool impl, and any helper functions/uses (e.g., AutomateOptions, RealBackend, is_sensitive_app) out of the legacy impl file and into the owning domain's tools.rs (or a tools/ submodule) for the computer/openhuman domain, update the module imports there, and then re-export AutomateTool from the central tools mod (src/openhuman/tools/mod.rs) so external callers use the normal tool surface; also remove the old file or its module entry from the impl/ folder and fix any use sites to import the re-exported AutomateTool.
105-105:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't log the raw
goal.
goalis free-form user input, so this can capture message bodies, names, or other sensitive content in persistent logs. Log only redacted/derived metadata here (for exampleapp,goal_len, or a hash).Proposed fix
- log::info!("[automate] ▶ tool execute app={app:?} goal={goal:?}"); + log::info!( + "[automate] ▶ tool execute app={app:?} goal_len={}", + goal.chars().count() + );As per coding guidelines,
**/*.{ts,tsx,js,jsx,rs}: Never log secrets or full PII—redact sensitive information in all logs.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.log::info!( "[automate] ▶ tool execute app={app:?} goal_len={}", goal.chars().count() );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/tools/impl/computer/automate.rs` at line 105, The current log statement uses log::info! and includes the raw free-form variable goal which may contain sensitive user data; change the log to avoid printing goal directly and instead record only derived/redacted metadata (for example keep app as-is and replace goal with goal_len and/or a short hash), e.g. compute a length (goal.len()) and a stable hash of goal (SHA-256 or similar) and log those fields instead of the full goal string so log::info!("[automate] ▶ tool execute app={app:?} goal_len={goal_len} goal_hash={goal_hash:?}");src/openhuman/voice/always_on.rs (2)
229-230:
⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftAvoid stale config snapshot in long-lived always-on processor.
The processor captures
Configonce at startup and never refreshes it; later config updates (wake word, STT provider, VAD tuning) won’t take effect whileRUNNINGis true.Suggested direction
- let config = app_config.clone(); + // Keep live runtime settings in shared state, updated on every start_if_enabled call. + update_runtime_voice_settings(&app_config.voice_server);// On each utterance / decision point: let settings = get_runtime_voice_settings();Use a shared
RwLock/atomic settings struct so reconfig applies without restarting capture.Also applies to: 252-253, 286-289, 331-334, 363-375
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice/always_on.rs` around lines 229 - 230, The always-on processor currently captures a static AppConfig via app_config.clone and builds VadConfig with VadConfig::from_server_config at startup, causing updates (wake word, STT provider, VAD tuning) to be ignored while RUNNING is true; change this to read a shared, mutable runtime settings object instead of cloning at startup: introduce or use an existing shared RwLock/atomic voice settings struct (e.g., get_runtime_voice_settings()) and replace direct uses of app_config / VadConfig::from_server_config in the long‑lived loop and at each decision point (locations around where app_config.clone is used and the VadConfig creation) so you fetch updated settings per utterance/decision and rebuild or reference the latest VadConfig/STT/wake-word provider on demand.
365-375:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winRedact transcript/command content from logs.
This logs full spoken content (
command={cmd:?}/transcript={text:?}), which can include PII/secrets from ambient speech.As per coding guidelines: `Never log secrets or full PII—redact sensitive information in all logs`.Proposed redaction change
- log::info!("{LOG_PREFIX} wake word matched → command={cmd:?}"); + log::info!("{LOG_PREFIX} wake word matched → command_len={}", cmd.len()); - log::info!( - "{LOG_PREFIX} no wake word ({:?}) in transcript={text:?}; ignored", - config.voice_server.wake_word - ); + log::info!( + "{LOG_PREFIX} no wake word match (wake_word_configured={}, transcript_len={}); ignored", + !config.voice_server.wake_word.trim().is_empty(), + text.len() + );📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.log::info!("{LOG_PREFIX} wake word matched → command_len={}", cmd.len()); notch_status("Processing", 12000); // pill: running the command deliver_command(config, cmd).await; } None => { // Visible at info so the user can see WHAT was heard when the // wake word didn't match (diagnoses "Hey Tiny not responding"). log::info!( "{LOG_PREFIX} no wake word match (wake_word_configured={}, transcript_len={}); ignored", !config.voice_server.wake_word.trim().is_empty(), text.len() );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice/always_on.rs` around lines 365 - 375, The logs currently print full spoken content via log::info! including cmd and text (see the log lines around LOG_PREFIX and the deliver_command call), which may expose PII/secrets; update those log statements to redact or mask sensitive content instead of printing cmd or text (e.g., replace with a redacted placeholder, truncated summary, or a hashed token) and only log safe metadata (like whether a wake word matched, confidence flags, or a masked length). Apply the change to both the wake-word-matched branch (before calling deliver_command) and the None branch that logs transcript, or factor out a helper (e.g., redact_transcript or mask_sensitive) to centralize redaction so future logs use the safe representation. Ensure deliver_command invocation remains unchanged except for not logging cmd contents directly in the surrounding info messages.
Union the voice mod decls (always_on + approval_surface); take main's fully-updated tracker table and mark the Phase 4 voice-confirmation row Done.
…tion) The speak bus is a process-global broadcast and all approval prompts share VOICE_APPROVAL_SOURCE, so the happy-path test's published prompt leaked into the parallel 'should stay silent' receivers. Guard the bus-touching tests with a shared lock (acquire → subscribe) so each runs isolated.
Independent review (beyond the CodeRabbit pass)Reviewed Phase 4 — the voice-native approval gate: the Reviewed clean
One fix applied during babysit: the The cross-process glue (web.rs param, socket bridge, startup registration) is integration-shaped (exercised via the approval E2E); the pure formatter/gate/bus logic is fully unit-tested. No correctness issues. LGTM once CI is green. |
Summary
AgentTurnOriginlabel, the spoken-yes/no ingress path, and theoverlay:attentionsocket-bridge pattern. Typed-chat approvals stay visual-only (the in-app card).voice:speakcore→UI primitive (speak bus + socket bridge) and an app-wide frontend player.This is Phase 4 of the voice→system-action feature; see
docs/voice-system-actions.mdChange 4.1.Problem
The
ApprovalGatealready classifies sensitive tool calls and parks them for a yes/no decision — but the prompt is visual-only (the approval card) and answered by click or typedyes/no. A hands-free / always-on voice user looking away from the screen never hears the prompt and can't approve by voice.Solution
voice: boolonApprovalChatContext(set from a new optionalvoiceparam on thechannel_web_chatRPC +chat:startsocket event), stamped ontoDomainEvent::ApprovalRequested { is_voice }inapproval/gate.rs. The frontend tags dictation / always-on auto-sends (Conversations.tsx→chatSend({ voice: true })); typed turns omit it.src/openhuman/voice/approval_surface.rs, new): anEventHandlermirroringtelegram/approval_surface.rs— onApprovalRequested { is_voice: true }it buildsspoken_prompt("<summary>. Say yes to confirm, or no to cancel.") and publishes it. Registered at startup beside the web/telegram surfaces.src/openhuman/voice/speak_bus.rs, new):publish_speak/subscribe_speak_eventsbroadcast mirroringoverlay/bus.rs;core/socketio.rsbridges it to avoice:speakSocket.IO event.useVoiceSpeak(new, mounted app-wide inAppShellDesktop) playsvoice:speakthrough the existing TTS pipeline (synthesizeSpeech→playBase64Audio), so the prompt is heard even when the mascot view isn't open.web.rsingress yes/no router →approval_decide.Decisions agreed up front: make the existing gate voice-native (not a new voice-fast-path gate); speak only for voice-initiated turns.
Submission Checklist
spoken_promptformatter, theis_voicegate (speaks ontrue, silent onfalse/ non-approval / empty-summary),speak_busround-trip. Frontend (Vitest):useVoiceSpeaksynthesizes + plays onvoice:speak, ignores empty/malformed payloads, unsubscribes on unmount.RealBackend-style audio playback is mic/TTS-dependent (manual).N/A: extends the existing approval flow, no new feature row.N/A: mic + cloud-TTS dependent; not a release-cut surface.Impact
voiceget the prior visual-only behaviour.Related
docs/voice-system-actions.mdChange 4.1.Summary by CodeRabbit
Release Notes
New Features
Documentation
Tests