Skip to content

feat(voice): voice-native approval — spoken TTS confirmation for sensitive actions (Phase 4 of #3148)#3368

Open
M3gA-Mind wants to merge 3 commits into
tinyhumansai:mainfrom
M3gA-Mind:feat/voice-phase4-tts-confirmation
Open

feat(voice): voice-native approval — spoken TTS confirmation for sensitive actions (Phase 4 of #3148)#3368
M3gA-Mind wants to merge 3 commits into
tinyhumansai:mainfrom
M3gA-Mind:feat/voice-phase4-tts-confirmation

Conversation

@M3gA-Mind
Copy link
Copy Markdown
Contributor

@M3gA-Mind M3gA-Mind commented Jun 4, 2026

Summary

  • Makes the existing ApprovalGate voice-native: when a sensitive agent action is parked for approval during a voice-initiated turn, the assistant speaks the confirmation prompt aloud so a hands-free / always-on user can answer "yes"/"no" by voice.
  • No parallel system — reuses the gate's sensitivity classification, the AgentTurnOrigin label, the spoken-yes/no ingress path, and the overlay:attention socket-bridge pattern. Typed-chat approvals stay visual-only (the in-app card).
  • Adds a reusable voice:speak core→UI primitive (speak bus + socket bridge) and an app-wide frontend player.

This is Phase 4 of the voice→system-action feature; see docs/voice-system-actions.md Change 4.1.

Problem

The ApprovalGate already classifies sensitive tool calls and parks them for a yes/no decision — but the prompt is visual-only (the approval card) and answered by click or typed yes/no. A hands-free / always-on voice user looking away from the screen never hears the prompt and can't approve by voice.

Solution

  • Origin flag: voice: bool on ApprovalChatContext (set from a new optional voice param on the channel_web_chat RPC + chat:start socket event), stamped onto DomainEvent::ApprovalRequested { is_voice } in approval/gate.rs. The frontend tags dictation / always-on auto-sends (Conversations.tsxchatSend({ voice: true })); typed turns omit it.
  • Voice approval surface (src/openhuman/voice/approval_surface.rs, new): an EventHandler mirroring telegram/approval_surface.rs — on ApprovalRequested { is_voice: true } it builds spoken_prompt ("<summary>. Say yes to confirm, or no to cancel.") and publishes it. Registered at startup beside the web/telegram surfaces.
  • Speak primitive (src/openhuman/voice/speak_bus.rs, new): publish_speak/subscribe_speak_events broadcast mirroring overlay/bus.rs; core/socketio.rs bridges it to a voice:speak Socket.IO event.
  • Frontend playback: useVoiceSpeak (new, mounted app-wide in AppShellDesktop) plays voice:speak through the existing TTS pipeline (synthesizeSpeechplayBase64Audio), so the prompt is heard even when the mascot view isn't open.
  • Spoken answer — no new code: a voice "yes"/"no" rides the existing transcription → auto-send → web.rs ingress yes/no router → approval_decide.

Decisions agreed up front: make the existing gate voice-native (not a new voice-fast-path gate); speak only for voice-initiated turns.

Submission Checklist

  • Tests added/updated (happy + failure/edge) — Rust: spoken_prompt formatter, the is_voice gate (speaks on true, silent on false / non-approval / empty-summary), speak_bus round-trip. Frontend (Vitest): useVoiceSpeak synthesizes + plays on voice:speak, ignores empty/malformed payloads, unsubscribes on unmount.
  • Diff coverage ≥ 80% — new pure logic + surfaces are unit-tested; the cross-process glue (web.rs param plumbing, socket bridge, startup registration) is integration-shaped and exercised via the existing approval E2E. The RealBackend-style audio playback is mic/TTS-dependent (manual).
  • Coverage matrix — N/A: extends the existing approval flow, no new feature row.
  • No new external network dependencies — vision/TTS rides the already-configured backend TTS proxy via the existing mock-backed RPC.
  • Manual smoke — N/A: mic + cloud-TTS dependent; not a release-cut surface.
  • Linked issue — Phase 4 of feat: always-on voice command → system action (listen, understand, execute) #3148 (other phases remain, so not a closing keyword).

Impact

  • Cross-platform-safe: the speak bus + socket bridge + surface are platform-neutral; playback is desktop UI. Additive event field + optional RPC param — older clients that don't send voice get the prior visual-only behaviour.
  • No new tool, no new approval surface semantics, no new injected JS.

Related

Summary by CodeRabbit

Release Notes

  • New Features

    • Voice-initiated approval confirmations now deliver spoken prompts for hands-free yes/no responses
    • App-wide proactive assistant speech capability integrated
    • Voice/typed turn distinction optimizes approval experience based on interaction mode
  • Documentation

    • Updated voice system specification marking voice-native approval phase as complete
  • Tests

    • Comprehensive test coverage added for voice approval, speech routing, and channel integration

@M3gA-Mind M3gA-Mind requested a review from a team June 4, 2026 12:57
@M3gA-Mind M3gA-Mind marked this pull request as draft June 4, 2026 12:57
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 38ed957d-5715-460f-af9f-351d2e720aaa

📥 Commits

Reviewing files that changed from the base of the PR and between c00ceec and 6ad1e38.

📒 Files selected for processing (24)
  • app/src/App.tsx
  • app/src/features/human/voice/useVoiceSpeak.test.ts
  • app/src/features/human/voice/useVoiceSpeak.ts
  • app/src/pages/Conversations.tsx
  • app/src/services/chatService.ts
  • docs/voice-system-actions.md
  • src/core/event_bus/events.rs
  • src/core/event_bus/events_tests.rs
  • src/core/jsonrpc.rs
  • src/core/socketio.rs
  • src/openhuman/agent/harness/tool_loop_tests.rs
  • src/openhuman/approval/gate.rs
  • src/openhuman/channels/bus.rs
  • src/openhuman/channels/providers/telegram/approval_surface_tests.rs
  • src/openhuman/channels/providers/web.rs
  • src/openhuman/channels/providers/web_tests.rs
  • src/openhuman/channels/runtime/dispatch.rs
  • src/openhuman/channels/runtime/startup.rs
  • src/openhuman/tools/impl/system/install_tool.rs
  • src/openhuman/voice/approval_surface.rs
  • src/openhuman/voice/approval_surface_tests.rs
  • src/openhuman/voice/mod.rs
  • src/openhuman/voice/speak_bus.rs
  • src/openhuman/wallet/execution_tests.rs
💤 Files with no reviewable changes (6)
  • src/openhuman/tools/impl/system/install_tool.rs
  • src/openhuman/wallet/execution_tests.rs
  • src/openhuman/voice/mod.rs
  • src/openhuman/voice/speak_bus.rs
  • src/openhuman/voice/approval_surface_tests.rs
  • src/openhuman/voice/approval_surface.rs
✅ Files skipped from review due to trivial changes (1)
  • docs/voice-system-actions.md
🚧 Files skipped from review as they are similar to previous changes (14)
  • src/openhuman/agent/harness/tool_loop_tests.rs
  • src/core/event_bus/events.rs
  • src/openhuman/channels/providers/telegram/approval_surface_tests.rs
  • app/src/pages/Conversations.tsx
  • src/openhuman/approval/gate.rs
  • src/openhuman/channels/runtime/dispatch.rs
  • app/src/features/human/voice/useVoiceSpeak.test.ts
  • src/openhuman/channels/runtime/startup.rs
  • src/core/event_bus/events_tests.rs
  • app/src/features/human/voice/useVoiceSpeak.ts
  • src/openhuman/channels/bus.rs
  • src/core/socketio.rs
  • src/openhuman/channels/providers/web.rs
  • app/src/services/chatService.ts

📝 Walkthrough

Walkthrough

This PR implements voice-initiated approval flows: when a user sends a voice-dictated message, the system threads a voice: true flag through chat creation, triggers voice-specific approval behavior, and broadcasts a synthesized spoken approval prompt back to the frontend for playback.

Changes

Voice-initiated approval and speech synthesis

Layer / File(s) Summary
Speak bus infrastructure
src/openhuman/voice/speak_bus.rs
SpeakRequest struct and broadcast channel route proactive speech from core to UI. publish_speak() validates and broadcasts, subscribe_speak_events() provides receiver. Unit tests verify delivery and whitespace filtering.
Voice approval surface and event handling
src/openhuman/voice/approval_surface.rs, src/openhuman/voice/approval_surface_tests.rs, src/openhuman/voice/mod.rs
VoiceApprovalSurfaceSubscriber listens for is_voice: true approvals, formats spoken prompts, and publishes to speak bus. Ignores non-voice approvals. Tests verify prompt formatting and voice/typed gating.
Voice flag in event bus and approval context
src/core/event_bus/events.rs, src/openhuman/approval/gate.rs
DomainEvent::ApprovalRequested adds is_voice: bool field. ApprovalChatContext stores voice flag per turn. ApprovalGate::intercept_audited reads voice context and includes in event payload.
Web chat API voice parameter threading
src/openhuman/channels/providers/web.rs
start_chat() and channel_web_chat() accept voice: bool parameter. WebChatParams schema includes optional voice field. Voice context stored in per-turn ApprovalChatContext; follow-ups default to non-voice.
Socket.IO integration for speech broadcasting
src/core/socketio.rs, src/core/jsonrpc.rs, src/openhuman/channels/runtime/startup.rs
ChatStartPayload includes optional voice field. New background task in socket bridge subscribes to speak bus and broadcasts voice:speak events to all clients. Voice approval surface registered at channel startup.
Frontend voice speak hook and integration
app/src/features/human/voice/useVoiceSpeak.ts, app/src/features/human/voice/useVoiceSpeak.test.ts, app/src/App.tsx
useVoiceSpeak() hook subscribes to voice:speak socket events, synthesizes TTS, stops prior playback, enforces max duration, and cleans up on unmount. Wired into desktop shell. Tests verify synthesis, whitespace handling, and unsubscription.
Chat service and Conversations voice flag threading
app/src/pages/Conversations.tsx, app/src/services/chatService.ts
ChatSendParams and chatSend() add optional voice field. Dictation handler sends with voice: true when autoSend enabled. Voice flag flows from frontend through message sending to backend.
Test infrastructure updates
tests/*, src/**/*_tests.rs
Event construction, approval context fixtures, and start_chat invocations updated across 31 test files to include voice: false field, ensuring tests compile and exercise the full signature.
Documentation
docs/voice-system-actions.md
Phase 4 Change 4.1 (voice-native approval) marked done. Describes voice-only approval behavior, speak-event surface, frontend playback wiring, and test coverage.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related Issues

Possibly Related PRs

  • tinyhumansai/openhuman#2706: Both PRs modify the approval flow; this PR adds voice-initiation signaling (is_voice payload) and voice-specific approval prompting, while the retrieved PR consolidates approval gate logic with persistent allowlists.

Poem

🎙️ A whispered prompt flows far and wide,
Through speak-bus channels, side by side,
The voice gate listens, speaks aloud,
Approval answered, clear and proud,
No typing needed, just "yes" or "no!" 🐰

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main feature: voice-native approval with spoken TTS confirmation for sensitive actions, which is the core focus of the entire changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. working A PR that is being worked on by the team. labels Jun 4, 2026
…ions aloud (Phase 4)

Makes the existing ApprovalGate voice-native: when a sensitive agent action is
parked for approval during a voice-initiated turn, the assistant speaks the
confirmation prompt aloud so a hands-free / always-on user can answer "yes"/"no"
by voice. Typed-chat approvals stay visual-only (the in-app approval card).

- Origin flag: `voice: bool` on `ApprovalChatContext` (set from a new optional
  `voice` param on the `channel_web_chat` RPC + `chat:start` socket event),
  stamped onto `DomainEvent::ApprovalRequested { is_voice }` at publish time.
  The frontend tags dictation / always-on auto-sends (`chatSend({ voice: true })`).
- New `voice/approval_surface.rs`: an EventHandler (mirrors the Telegram approval
  surface) that, on `ApprovalRequested { is_voice: true }`, builds a short spoken
  line and publishes it. Registered at startup beside the web/telegram surfaces.
- New `voice/speak_bus.rs`: `publish_speak`/`subscribe_speak_events` broadcast
  (mirrors `overlay/bus.rs`); `core/socketio.rs` bridges it to a `voice:speak`
  Socket.IO event.
- Frontend `useVoiceSpeak` (mounted app-wide) plays `voice:speak` via the
  existing TTS pipeline, so the prompt is heard even without the mascot view.
- Spoken yes/no answer needs no new code — it rides the existing transcription →
  auto-send → web.rs ingress yes/no → approval_decide path.

Tests: Rust — spoken_prompt formatter, the is_voice gate (speaks on true, silent
on false / non-approval / empty-summary), speak_bus round-trip. Frontend (Vitest)
— useVoiceSpeak synthesizes + plays on voice:speak, ignores empty/malformed
payloads, unsubscribes on unmount.

Phase 4 of tinyhumansai#3148. Follow-up: localize the server-built spoken suffix.
@M3gA-Mind M3gA-Mind force-pushed the feat/voice-phase4-tts-confirmation branch from c00ceec to 349c44a Compare June 4, 2026 13:13
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/openhuman/config/ops_tests.rs (1)

1031-1061: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert the new voice fields in this test.

Line 1046 and Line 1047 set always_on_enabled/wake_word, but the test never verifies they persisted. This leaves the new path effectively untested.

Proposed test assertions
     let outcome = load_and_apply_voice_server_settings(patch)
         .await
         .expect("ok");
+    assert_eq!(
+        outcome.value["config"]["voice_server"]["always_on_enabled"],
+        serde_json::json!(true)
+    );
+    assert_eq!(
+        outcome.value["config"]["voice_server"]["wake_word"],
+        serde_json::json!("Hey Tiny")
+    );
     assert!(
         outcome.value["config"]["voice_server"]["min_duration_secs"]
             .as_f64()
             .unwrap_or(-1.0)
             >= 0.0
     );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/config/ops_tests.rs` around lines 1031 - 1061, The test
load_and_apply_voice_server_settings_accepts_valid_modes_and_clamps sets
always_on_enabled and wake_word but never asserts they persisted; update the
assertions after calling load_and_apply_voice_server_settings (using the outcome
variable) to check outcome.value["config"]["voice_server"]["always_on_enabled"]
is true (use as_bool()/unwrap_or(false)) and
outcome.value["config"]["voice_server"]["wake_word"] matches "Hey Tiny" (use
as_str()/unwrap_or("") or equivalent). Ensure these checks follow the existing
min_duration_secs assertion and run before removing the OPENHUMAN_WORKSPACE env
var.
🟡 Minor comments (8)
app/src/lib/i18n/fr.ts-1459-1461 (1)

1459-1461: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep French pronoun register consistent in this new string.

Line 1461 mixes informal and formal forms (Garde ... vous dites). Please keep a single register (this file mostly uses informal tu).

✏️ Suggested wording
-  'voice.debug.alwaysOnDesc':
-    'Garde le microphone ouvert et envoie automatiquement ce que vous dites à l’agent, sans raccourci. Se met en pause lorsque l’écran est verrouillé.',
+  'voice.debug.alwaysOnDesc':
+    'Garde le microphone ouvert et envoie automatiquement ce que tu dis à l’agent, sans raccourci. Se met en pause lorsque l’écran est verrouillé.',
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/lib/i18n/fr.ts` around lines 1459 - 1461, The French string for the
key 'voice.debug.alwaysOnDesc' mixes formal and informal pronouns; update the
value to a consistent informal (tu) register by replacing the sentence "Garde le
microphone ouvert et envoie automatiquement ce que vous dites à l’agent, sans
raccourci. Se met en pause lorsque l’écran est verrouillé." with an informal
variant such as "Garde le micro ouvert et envoie automatiquement ce que tu dis à
l’agent, sans raccourci. Se met en pause lorsque l’écran est verrouillé." to
match the rest of the file.
app/src/components/settings/panels/__tests__/VoicePanel.test.tsx-539-553 (1)

539-553: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add an assertion that the failing toggle path actually attempted the update RPC.

Right now this test can pass even if the click handler is broken, since aria-checked="false" and “no notch sync calls” are both true before the click.

✅ Tighten the failure-path assertion
   fireEvent.click(toggle);

-  // The optimistic flip is reverted back to off after the RPC rejects, and
-  // the notch is never shown because the persist failed.
+  await waitFor(() =>
+    expect(vi.mocked(openhumanUpdateVoiceServerSettings)).toHaveBeenCalledWith({
+      always_on_enabled: true,
+    })
+  );
+
+  // The toggle ends in off after rejection, and notch is never synced.
   await waitFor(() => expect(toggle).toHaveAttribute('aria-checked', 'false'));
   expect(vi.mocked(syncNotchVisibility)).not.toHaveBeenCalled();

As per coding guidelines: "Prefer testing behavior over implementation in unit tests".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/settings/panels/__tests__/VoicePanel.test.tsx` around
lines 539 - 553, The test currently never asserts that the click actually
triggered the RPC; add an assertion that the mocked update RPC
(openhumanUpdateVoiceServerSettings) was invoked when the toggle was clicked —
e.g. after the click and after awaiting the revert, assert
vi.mocked(openhumanUpdateVoiceServerSettings).toHaveBeenCalled() or
.toHaveBeenCalledTimes(1) (and optionally with the expected payload) so the
failing-path truly exercised the RPC call in VoicePanel and you still assert
syncNotchVisibility was not called.
src/openhuman/config/schemas.rs-1731-1736 (1)

1731-1736: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t silently swallow live refresh failures after updating voice settings.

If config reload fails here, the RPC still succeeds but always-on runtime state may remain stale until restart. Please log (or return) the failure path.

Proposed fix
-        if let Ok(config) = config_rpc::load_config_with_timeout().await {
-            crate::openhuman::voice::always_on::start_if_enabled(&config).await;
-        }
+        match config_rpc::load_config_with_timeout().await {
+            Ok(config) => crate::openhuman::voice::always_on::start_if_enabled(&config).await,
+            Err(err) => {
+                log::warn!(
+                    "[config][rpc] update_voice_server_settings live refresh skipped: {err}"
+                );
+            }
+        }

As per coding guidelines, "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/config/schemas.rs` around lines 1731 - 1736, The code calls
config_rpc::load_config_with_timeout().await and silently ignores failures,
which can leave always-on runtime stale; modify the block after
config_rpc::load_and_apply_voice_server_settings(patch).await to handle the Err
case: if load_config_with_timeout() returns Ok call
crate::openhuman::voice::always_on::start_if_enabled(&config).await, otherwise
log a stable, grep-friendly error (including the error details and context like
"voice-settings: reload failed") via the existing logger or return the error so
the caller can see the failure; ensure the log message includes the function
names or context to aid debugging.
src/openhuman/voice/command_router.rs-59-72 (1)

59-72: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

strip_filler claims to handle “hey” but currently does not.

The docstring says “hey” is stripped, but it isn’t in fillers, so commands like “hey pause” route to Unknown.

Proposed fix
 fn strip_filler(s: &str) -> &str {
     let fillers = [
+        "hey ",
         "please ",
         "can you please ",
         "can you ",
         "could you please ",
         "could you ",
         "would you ",
         "i want to ",
         "i want you to ",
         "go ahead and ",
     ];
     fn transport_controls() {
+        assert_eq!(route("hey pause"), VoiceIntent::Pause);
         assert_eq!(route("pause"), VoiceIntent::Pause);
         assert_eq!(route("Pause the music."), VoiceIntent::Pause);
         assert_eq!(route("please stop"), VoiceIntent::Pause);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/voice/command_router.rs` around lines 59 - 72, The docstring
for strip_filler says it removes "hey" but the fillers array in strip_filler
does not include "hey ", so inputs like "hey pause" aren't stripped; update the
fillers list inside the strip_filler function to include "hey " (and any other
expected filler variants you want handled) so the function actually strips that
prefix when matching commands.
docs/voice-automate-plan.md-20-20 (1)

20-20: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix fenced-code metadata and remove the trailing stray fence.

Line 20 opens an unlabeled code fence, and Line 152 appears to introduce an extra fence. This will keep markdownlint warning and can break rendering in some viewers.

Proposed patch
-```
+```text
 orchestrator (chat LLM)
        │  one call: automate{ app, goal }
        ▼
@@
-- **Safety** — `automate` is a mutating tool: same opt-in + `SENSITIVE_APPS`
-  denylist + ApprovalGate routing as `ax_interact`; the inner loop may not target a
-  denylisted app even if the model asks.
-```
+- **Safety** — `automate` is a mutating tool: same opt-in + `SENSITIVE_APPS`
+  denylist + ApprovalGate routing as `ax_interact`; the inner loop may not target a
+  denylisted app even if the model asks.

Also applies to: 152-152

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/voice-automate-plan.md` at line 20, The file contains an unlabeled
opening code fence and a stray closing fence that breaks markdown rendering;
update the opening fence to include a language label (e.g., change the opening
``` to ```text) and remove the extra trailing ``` after the Safety paragraph so
the fenced block around the orchestrator / Safety section is properly opened and
closed; locate the unlabeled fence near the start of the orchestration snippet
and the stray fence after the "denylisted app even if the model asks." line and
make the label/removal changes there.
docs/voice-system-actions.md-365-365 (1)

365-365: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced code block.

This fence is missing a language identifier and can trigger markdown lint failures.

Suggested fix
-```
+```text
 enigo::macos::get_layoutdependent_keycode → TSMGetInputSourceProperty
 → dispatch_assert_queue → _dispatch_assert_queue_fail → SIGTRAP
-```
+```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/voice-system-actions.md` at line 365, The fenced code block containing
the trace lines (enigo::macos::get_layoutdependent_keycode →
TSMGetInputSourceProperty → dispatch_assert_queue → _dispatch_assert_queue_fail
→ SIGTRAP) is missing a language tag; update the fence to include a language
identifier (for example change ``` to ```text) so the block becomes ```text ...
```; locate the block by the presence of those unique symbols and add the
language tag only, leaving the content unchanged.
docs/voice-system-actions.md-329-333 (1)

329-333: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resolve conflicting rollout/status labels in the same tracker.

These changed sections currently conflict (for example: “Not Started” vs “M1+M2+M3 shipped/proven live”, and “crash fixed” vs “fix required / keep disabled”). Please normalize to one authoritative state so the tracker is operationally reliable.

Based on learnings: New or changed behavior must ship with matching documentation—at minimum add concise rustdoc / code comments where flow is not obvious, and update AGENTS.md, architecture docs, or feature docs when repository rules or user-visible behavior change.

Also applies to: 352-372, 637-644

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/voice-system-actions.md` around lines 329 - 333, The tracker sections
for "Phase 1.5 — Reliable, real-time multi-step automation" contain conflicting
rollout/status labels (e.g., "Not Started" vs "M1+M2+M3 shipped/proven live" and
"crash fixed" vs "fix required / keep disabled"); normalize each affected block
(including the referenced spans ~329-333, 352-372, 637-644) to a single
authoritative status string and a single canonical rollout note, update the
human-facing lines under the phase header and any bullet status lines to match
that single state, and add a short note pointing to implementation docs; also
add matching inline rustdoc/code comments where flow is non-obvious in the
implementation (reference the "Rust inner loop" and "voice-automate-plan.md")
and update AGENTS.md and relevant architecture/feature docs so repository rules
and user-visible behavior reflect the chosen state.
src/openhuman/accessibility/automate.rs-203-206 (1)

203-206: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Extract the first balanced JSON object, not first { to last }.

Lines 203-206 currently slice from the first { to the last }, which can fail when output contains extra brace blocks after a valid object. This contradicts the intended “first balanced block” behavior and causes avoidable parse failures.

🔧 Suggested parser fix
-    // Extract the first {...} span and retry.
-    if let (Some(start), Some(end)) = (trimmed.find('{'), trimmed.rfind('}')) {
-        if end > start {
-            if let Ok(a) = serde_json::from_str::<Action>(&trimmed[start..=end]) {
-                return Ok(a);
-            }
-        }
-    }
+    // Extract the first balanced {...} block and retry.
+    if let Some(start) = trimmed.find('{') {
+        let mut depth = 0usize;
+        for (idx, ch) in trimmed[start..].char_indices() {
+            match ch {
+                '{' => depth += 1,
+                '}' => {
+                    depth -= 1;
+                    if depth == 0 {
+                        let end = start + idx;
+                        if let Ok(a) = serde_json::from_str::<Action>(&trimmed[start..=end]) {
+                            return Ok(a);
+                        }
+                        break;
+                    }
+                }
+                _ => {}
+            }
+        }
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/accessibility/automate.rs` around lines 203 - 206, The current
logic slices trimmed from the first '{' to the last '}', which can include extra
trailing brace blocks and fail parsing; change this to locate the first balanced
JSON object by scanning trimmed from the first '{', tracking brace depth
(increment on '{', decrement on '}') until depth returns to zero, taking that
slice as the candidate JSON, then call
serde_json::from_str::<Action>(&trimmed[start..=end]) on that balanced slice
(ensure the scanner handles braces inside strings/escaped quotes or at least
stops at matching braces for typical outputs); update the block around the
trimmed.find/rfind usage to use this balanced-scan approach instead of using
rfind('}').
🧹 Nitpick comments (6)
app/src-tauri/permissions/allow-core-process.toml (1)

65-70: 🏗️ Heavy lift

Split notch window commands into a dedicated permission scope.

Adding notch_window_show / notch_window_hide to allow-core-process further broadens a privileged bundle. Please move these into a dedicated permission TOML and grant it only to the notch-related UI surface(s).

Based on learnings: "When reviewing Tauri permission changes, avoid keeping unrelated permissions bundled into a broad identifier ... split them into dedicated, narrower permission TOML identifiers and update capability grants."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src-tauri/permissions/allow-core-process.toml` around lines 65 - 70, The
allow-core-process TOML currently lists the privileged commands
"notch_window_show" and "notch_window_hide"; remove those two entries from
allow-core-process and create a new, dedicated permission TOML (e.g.,
notch-window-permissions) that contains "notch_window_show" and
"notch_window_hide", then update the capability grants so only the notch-related
UI surfaces/components are given that new permission identifier instead of the
broad allow-core-process. Ensure any references to these commands (invokes)
still resolve by updating the permission identifier in the relevant UI surface
capability configurations.
app/src/features/human/voice/useVoiceSpeak.test.ts (1)

65-69: ⚡ Quick win

Exercise cleanup after a real voice:speak event.

This only verifies socketService.off(...). It never starts playback, so it won't fail if unmount forgets to call the returned stop() handle and leaves audio running.

🧪 Suggested test adjustment
-it('unsubscribes and stops playback on unmount', () => {
+it('unsubscribes and stops playback on unmount', async () => {
   const { unmount } = renderHook(() => useVoiceSpeak());
+  speakHandler()({ text: 'Post to Slack. Say yes to confirm.', source: 'approval' });
+  await vi.waitFor(() => expect(hoisted.playMock).toHaveBeenCalledTimes(1));
   unmount();
   expect(hoisted.offMock).toHaveBeenCalledWith('voice:speak', expect.any(Function));
+  expect(hoisted.stopMock).toHaveBeenCalledTimes(1);
 });

Based on learnings: Prefer testing behavior over implementation details in unit tests. Use existing helpers from app/src/test/ before adding new harness code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/features/human/voice/useVoiceSpeak.test.ts` around lines 65 - 69, The
test only asserts that socketService.off was called but never triggers a real
"voice:speak" playback, so update the test in useVoiceSpeak.test.ts to simulate
a full lifecycle: use the same socket test helper (hoisted.emitMock or the test
helper in app/src/test/) to emit a 'voice:speak' event after rendering
useVoiceSpeak (so playback actually starts), capture or spy the playback stop
handle returned by the playback/start path (e.g., mock the audio/playback
object's stop method or capture the stop function returned by the handler), then
call unmount() and assert both that hoisted.offMock was called with
'voice:speak' and that the captured stop handle was invoked; reference
useVoiceSpeak, hoisted.emitMock (or the project's socket emit helper),
hoisted.offMock, and the playback stop mock when locating where to change the
test.
src/openhuman/voice/audio_capture.rs (1)

105-129: ⚡ Quick win

Route the one-shot finalize path through this helper too.

The new helper says it is shared by the recorder finalize path, but finalize_recording() still has its own WAV encoding block below. That leaves two PCM/WAV implementations to keep in sync and weakens the reuse this change is trying to establish.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/voice/audio_capture.rs` around lines 105 - 129, The
finalize_recording path still contains an inline WAV encoding loop duplicated
from encode_wav_16k; replace that block in finalize_recording() with a call to
encode_wav_16k(samples_16k_slice) (passing the prepared 16k mono f32 slice) and
use the returned Vec<u8> as the WAV bytes, removing the duplicated
WavWriter/loop/finalize code; make sure to propagate and map errors to the same
Result<String> error style as encode_wav_16k, and keep sample-rate/resampling
logic intact so finalize_recording supplies a proper &[f32] to encode_wav_16k.
app/src/notch/NotchApp.tsx (1)

183-207: ⚡ Quick win

Use namespaced debug logging instead of console.* in app/src.

This file currently emits diagnostics via console.debug/console.warn; switch to the repo’s namespaced debug logger pattern with dev-only detail.

As per coding guidelines: "Use namespaced debug function with dev-only detail for diagnostic logging in TypeScript/React".

Also applies to: 234-255

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/notch/NotchApp.tsx` around lines 183 - 207, Replace
console.debug/console.warn calls inside the socket handlers (e.g., the handlers
registered for 'dictation:toggle', 'dictation:transcription', and
'companion:state_changed' in NotchApp/NotchApp.tsx) with the project’s
namespaced debug logger (use the existing debug utility used elsewhere in
app/src, e.g., debug('notch:...') or the repo’s debug import) so logs are
namespaced and can be stripped in production; import the debug function if not
present and convert messages like console.debug(`[notch] ...`) to debug('notch',
...) (or the repo convention) and ensure any verbose/detail-only messages are
guarded as dev-only per the repo pattern (the other console.* occurrences around
lines ~234-255 should be changed the same way).
app/src/pages/Conversations.tsx (1)

793-793: ⚡ Quick win

Preserve optional wire behavior for voice on non-voice turns.

This currently forces voice: false for every send. Passing through opts?.voice keeps the field truly optional and avoids sending it when not explicitly voice-initiated.

Suggested diff
-        voice: opts?.voice ?? false,
+        voice: opts?.voice,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/pages/Conversations.tsx` at line 793, The current payload always sets
the voice field to false by using "voice: opts?.voice ?? false"; change this to
preserve optional behavior by sending "voice: opts?.voice" instead so the field
is omitted when undefined (locate the object construction using opts?.voice ??
false in Conversations.tsx, likely inside the send/submit function that builds
the message payload). Ensure no other code relies on voice being explicitly
boolean false when omitted.
src/openhuman/voice/speak_bus.rs (1)

47-55: 🏗️ Heavy lift

Route speech events through the typed global event bus instead of a module-local broadcast.

This path is now cross-module (voicecore/socketio), but it bypasses the repository’s standard DomainEvent pub/sub surface. Please move this to event_bus::publish_global / subscribe_global with a dedicated DomainEvent variant so registration, tracing, and fanout behavior stay consistent with other domain events.

As per coding guidelines: src/**/*.rs should use the event bus for decoupled cross-module communication via publish_global / subscribe_global.

Also applies to: 62-79

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/voice/speak_bus.rs` around lines 47 - 55, Replace the
module-local broadcast (SPEAK_BUS) and subscribe_speak_events() to use the
repository event bus: add a new DomainEvent variant (e.g.,
DomainEvent::SpeakRequest(SpeakRequest)) and publish speak events via
event_bus::publish_global(DomainEvent::SpeakRequest(...)) where currently
sending to SPEAK_BUS; implement subscribe_speak_events() to call
event_bus::subscribe_global() and filter/unwrap DomainEvent::SpeakRequest into a
broadcast::Receiver/stream of SpeakRequest so consumers (Socket.IO bridge)
receive typed events through the global bus rather than the SPEAK_BUS static.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src-tauri/src/notch_window.rs`:
- Around line 362-366: The string interpolation in the format! that builds js in
notch_window.rs (the js variable where token and base_url are inserted) is
unsafe: escape token and base_url for inclusion in a single-quoted JS string (or
serialize them as JS string literals) before formatting to prevent quotes,
backslashes or newlines from breaking or injecting code; add a small helper
(e.g., js_string_escape or use serde_json::to_string to produce a quoted JS
string and strip surrounding quotes) and use that helper to produce safe values
for token and base_url when constructing the js variable and dispatching the
CustomEvent.

In `@app/src/App.tsx`:
- Around line 205-214: The notch sync one-shot guard is set before the async
work, causing failures to permanently skip retries; modify the IIFE so
notchSyncedRef.current is assigned true only after a successful sync: call
openhumanGetVoiceServerSettings() and await syncNotchVisibility(...) inside the
try block, and on success set notchSyncedRef.current = true (keep the early
return check for isBootstrapping || notchSyncedRef.current at the top) so
transient errors won’t flip the guard; reference functions/vars:
isBootstrapping, notchSyncedRef, openhumanGetVoiceServerSettings,
syncNotchVisibility.

In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 509-525: The toggle handler mixes persistence and notch-sync and
is vulnerable to re-entrant clicks; change it so clicking is ignored while a
pendingToggle flag (or similar) is true, update the UI optimistically and
persist with openhumanUpdateVoiceServerSettings, and only revert the UI if
persistence fails—do not revert when syncNotchVisibility fails; instead handle
sync errors separately (log/show feedback) and optionally retry without changing
settings. Use the existing symbols settings.always_on_enabled, setSettings,
openhumanUpdateVoiceServerSettings, and syncNotchVisibility and add a local
pending toggle guard to prevent concurrent toggles.

In `@app/src/features/human/voice/useVoiceSpeak.ts`:
- Around line 50-60: The synth/ playback race happens because older
synthesizeSpeech calls can resolve later and call handleRef.current?.stop(),
killing newer audio; fix by introducing a monotonic request id (e.g.,
requestIdRef) and capture const myRequestId = ++requestIdRef.current at the
start of the async block, then after synthesizeSpeech resolves verify
myRequestId === requestIdRef.current before stopping the current handle or
assigning handleRef.current = handle (and similarly before calling
handle.ended.catch). This ensures only the latest voice:speak invocation can
stop or replace the shared handle; reference synthesizeSpeech, handleRef,
playBase64Audio, and handle.ended in your change.

In `@app/src/notch/NotchApp.tsx`:
- Around line 164-261: connectSocket can initialize a socket after the component
unmounts because the returned disposable isn't invoked; ensure teardown is
actually used and prevent post-unmount assignment/leaks by (1) keeping the
disposed flag check after await connectCoreSocket(...) and if disposed is true
then immediately disconnect the newly created socket and return, (2) register
socket listeners on the socket instance only when not disposed, and (3) ensure
the caller effect that invokes connectSocket actually captures and calls its
returned cleanup function (i.e., call the function returned by connectSocket
inside the effect's cleanup). Refer to connectSocket, connectCoreSocket,
socketRef, and the local disposed flag when making these changes.

In `@app/src/utils/tauriCommands/window.ts`:
- Around line 122-123: Replace the raw console.warn(err) calls in
app/src/utils/tauriCommands/window.ts with a namespaced debug logger and
redacted error text: import and instantiate a debug logger (e.g. const debug =
createDebug('app:tauri:window') or using the repo's existing debug factory),
then change the console.warn in the function(s) that show the window (and the
other occurrences around lines 136-137) to debug(...) while only logging a
sanitized message and non-sensitive fields (e.g. debug('show failed:',
safeErrorString(err)) where safeErrorString returns err.message or a redacted
summary, not the full err object); ensure you add the debug
import/initialization and replace all similar console.warn/raw err logs in this
module to follow the same pattern.

In `@src/openhuman/accessibility/app_fastpaths/music.rs`:
- Line 237: The current log call logs raw user query via
log::info!("[automate::music] ▶ play query={query:?}"); — replace that to avoid
PII by computing and logging only non-sensitive metadata: compute query_len =
query.len() and query_hash = sha256_hex(query) (or another stable hash) and
update the log to something like log::info!("[automate::music] ▶ play
query_len={} query_hash={}", query_len, query_hash); ensure the original `query`
string is not included anywhere in logs and reference the same variable name
`query` and the logging site (log::info! call) when making the change.
- Around line 150-166: The replace_ci function is using byte-indexing into the
lowercased string (hl[i..]) while i is tracked over the original haystack's
bytes, which is Unicode-unsafe; change replace_ci to iterate using char_indices
(or parallel char_indices over haystack and hl) and track byte offsets for both
original and lowercased forms so you only slice at valid UTF-8 boundaries (e.g.,
use for (orig_byte_idx, ch) in haystack.char_indices() and maintain a
corresponding hl_byte_idx into hl to call hl[hl_byte_idx..].starts_with(&nl) and
advance both indices by the matched
needle.chars().map(|c|c.len_utf8()).sum::<usize>() or by the matched byte
lengths); and in run, stop logging the full user-derived query via
log::info!("[automate::music] ▶ play query={query:?}"), instead log a redacted
or masked version (e.g., omit the query, log only length or a fixed placeholder)
to avoid exposing PII/secrets.

In `@src/openhuman/accessibility/automate.rs`:
- Around line 210-212: The logs currently emit raw user/model content (e.g.,
variables goal, label, filter and trimmed/model output) which may contain PII;
update the logging and error creation to redact or omit sensitive content
instead of printing the full values — replace direct interpolations like
trimmed, goal, label, filter with a sanitized/redacted representation (e.g.,
call a helper redact_sensitive(s: &str) or replace with "[REDACTED]") in the
Err(format!(...)) and all log calls that reference those symbols so logs never
contain full user/model text.

In `@src/openhuman/credentials/ops.rs`:
- Around line 44-47: The always-on voice service started by
crate::openhuman::voice::always_on::start_if_enabled(config) isn't torn down on
logout; add a symmetric shutdown path by implementing a public stop() in
src/openhuman/voice/always_on.rs that clears ENABLED (Ordering::SeqCst) and
performs full stream/thread teardown, then call
crate::openhuman::voice::always_on::stop() from stop_login_gated_services so the
always-on capture is fully disabled on logout/user switch.

In `@src/openhuman/tools/impl/browser/screenshot.rs`:
- Around line 209-225: The helper downscale_to_jpeg currently returns the last
oversized candidate stored in last when no thumbnail fits max_bytes; change this
so that if the loop finishes without finding out.len() <= max_bytes it returns
an Err (e.g., via last.ok_or_else -> Err) instead of returning Ok(last),
preserving the function's contract; update the tail of downscale_to_jpeg
(referencing variables last and max_bytes and the function name) to produce an
error when no size fits rather than returning the oversized JPEG.

In `@src/openhuman/tools/impl/computer/automate.rs`:
- Around line 1-16: The AutomateTool implementation belongs in the domain's
owning module not in the impl tree: move the AutomateTool type, its Tool impl,
and any helper functions/uses (e.g., AutomateOptions, RealBackend,
is_sensitive_app) out of the legacy impl file and into the owning domain's
tools.rs (or a tools/ submodule) for the computer/openhuman domain, update the
module imports there, and then re-export AutomateTool from the central tools mod
(src/openhuman/tools/mod.rs) so external callers use the normal tool surface;
also remove the old file or its module entry from the impl/ folder and fix any
use sites to import the re-exported AutomateTool.
- Line 105: The current log statement uses log::info! and includes the raw
free-form variable goal which may contain sensitive user data; change the log to
avoid printing goal directly and instead record only derived/redacted metadata
(for example keep app as-is and replace goal with goal_len and/or a short hash),
e.g. compute a length (goal.len()) and a stable hash of goal (SHA-256 or
similar) and log those fields instead of the full goal string so
log::info!("[automate] ▶ tool execute app={app:?} goal_len={goal_len}
goal_hash={goal_hash:?}");

In `@src/openhuman/voice/always_on.rs`:
- Around line 229-230: The always-on processor currently captures a static
AppConfig via app_config.clone and builds VadConfig with
VadConfig::from_server_config at startup, causing updates (wake word, STT
provider, VAD tuning) to be ignored while RUNNING is true; change this to read a
shared, mutable runtime settings object instead of cloning at startup: introduce
or use an existing shared RwLock/atomic voice settings struct (e.g.,
get_runtime_voice_settings()) and replace direct uses of app_config /
VadConfig::from_server_config in the long‑lived loop and at each decision point
(locations around where app_config.clone is used and the VadConfig creation) so
you fetch updated settings per utterance/decision and rebuild or reference the
latest VadConfig/STT/wake-word provider on demand.
- Around line 365-375: The logs currently print full spoken content via
log::info! including cmd and text (see the log lines around LOG_PREFIX and the
deliver_command call), which may expose PII/secrets; update those log statements
to redact or mask sensitive content instead of printing cmd or text (e.g.,
replace with a redacted placeholder, truncated summary, or a hashed token) and
only log safe metadata (like whether a wake word matched, confidence flags, or a
masked length). Apply the change to both the wake-word-matched branch (before
calling deliver_command) and the None branch that logs transcript, or factor out
a helper (e.g., redact_transcript or mask_sensitive) to centralize redaction so
future logs use the safe representation. Ensure deliver_command invocation
remains unchanged except for not logging cmd contents directly in the
surrounding info messages.

---

Outside diff comments:
In `@src/openhuman/config/ops_tests.rs`:
- Around line 1031-1061: The test
load_and_apply_voice_server_settings_accepts_valid_modes_and_clamps sets
always_on_enabled and wake_word but never asserts they persisted; update the
assertions after calling load_and_apply_voice_server_settings (using the outcome
variable) to check outcome.value["config"]["voice_server"]["always_on_enabled"]
is true (use as_bool()/unwrap_or(false)) and
outcome.value["config"]["voice_server"]["wake_word"] matches "Hey Tiny" (use
as_str()/unwrap_or("") or equivalent). Ensure these checks follow the existing
min_duration_secs assertion and run before removing the OPENHUMAN_WORKSPACE env
var.

---

Minor comments:
In `@app/src/components/settings/panels/__tests__/VoicePanel.test.tsx`:
- Around line 539-553: The test currently never asserts that the click actually
triggered the RPC; add an assertion that the mocked update RPC
(openhumanUpdateVoiceServerSettings) was invoked when the toggle was clicked —
e.g. after the click and after awaiting the revert, assert
vi.mocked(openhumanUpdateVoiceServerSettings).toHaveBeenCalled() or
.toHaveBeenCalledTimes(1) (and optionally with the expected payload) so the
failing-path truly exercised the RPC call in VoicePanel and you still assert
syncNotchVisibility was not called.

In `@app/src/lib/i18n/fr.ts`:
- Around line 1459-1461: The French string for the key
'voice.debug.alwaysOnDesc' mixes formal and informal pronouns; update the value
to a consistent informal (tu) register by replacing the sentence "Garde le
microphone ouvert et envoie automatiquement ce que vous dites à l’agent, sans
raccourci. Se met en pause lorsque l’écran est verrouillé." with an informal
variant such as "Garde le micro ouvert et envoie automatiquement ce que tu dis à
l’agent, sans raccourci. Se met en pause lorsque l’écran est verrouillé." to
match the rest of the file.

In `@docs/voice-automate-plan.md`:
- Line 20: The file contains an unlabeled opening code fence and a stray closing
fence that breaks markdown rendering; update the opening fence to include a
language label (e.g., change the opening ``` to ```text) and remove the extra
trailing ``` after the Safety paragraph so the fenced block around the
orchestrator / Safety section is properly opened and closed; locate the
unlabeled fence near the start of the orchestration snippet and the stray fence
after the "denylisted app even if the model asks." line and make the
label/removal changes there.

In `@docs/voice-system-actions.md`:
- Line 365: The fenced code block containing the trace lines
(enigo::macos::get_layoutdependent_keycode → TSMGetInputSourceProperty →
dispatch_assert_queue → _dispatch_assert_queue_fail → SIGTRAP) is missing a
language tag; update the fence to include a language identifier (for example
change ``` to ```text) so the block becomes ```text ... ```; locate the block by
the presence of those unique symbols and add the language tag only, leaving the
content unchanged.
- Around line 329-333: The tracker sections for "Phase 1.5 — Reliable, real-time
multi-step automation" contain conflicting rollout/status labels (e.g., "Not
Started" vs "M1+M2+M3 shipped/proven live" and "crash fixed" vs "fix required /
keep disabled"); normalize each affected block (including the referenced spans
~329-333, 352-372, 637-644) to a single authoritative status string and a single
canonical rollout note, update the human-facing lines under the phase header and
any bullet status lines to match that single state, and add a short note
pointing to implementation docs; also add matching inline rustdoc/code comments
where flow is non-obvious in the implementation (reference the "Rust inner loop"
and "voice-automate-plan.md") and update AGENTS.md and relevant
architecture/feature docs so repository rules and user-visible behavior reflect
the chosen state.

In `@src/openhuman/accessibility/automate.rs`:
- Around line 203-206: The current logic slices trimmed from the first '{' to
the last '}', which can include extra trailing brace blocks and fail parsing;
change this to locate the first balanced JSON object by scanning trimmed from
the first '{', tracking brace depth (increment on '{', decrement on '}') until
depth returns to zero, taking that slice as the candidate JSON, then call
serde_json::from_str::<Action>(&trimmed[start..=end]) on that balanced slice
(ensure the scanner handles braces inside strings/escaped quotes or at least
stops at matching braces for typical outputs); update the block around the
trimmed.find/rfind usage to use this balanced-scan approach instead of using
rfind('}').

In `@src/openhuman/config/schemas.rs`:
- Around line 1731-1736: The code calls
config_rpc::load_config_with_timeout().await and silently ignores failures,
which can leave always-on runtime stale; modify the block after
config_rpc::load_and_apply_voice_server_settings(patch).await to handle the Err
case: if load_config_with_timeout() returns Ok call
crate::openhuman::voice::always_on::start_if_enabled(&config).await, otherwise
log a stable, grep-friendly error (including the error details and context like
"voice-settings: reload failed") via the existing logger or return the error so
the caller can see the failure; ensure the log message includes the function
names or context to aid debugging.

In `@src/openhuman/voice/command_router.rs`:
- Around line 59-72: The docstring for strip_filler says it removes "hey" but
the fillers array in strip_filler does not include "hey ", so inputs like "hey
pause" aren't stripped; update the fillers list inside the strip_filler function
to include "hey " (and any other expected filler variants you want handled) so
the function actually strips that prefix when matching commands.

---

Nitpick comments:
In `@app/src-tauri/permissions/allow-core-process.toml`:
- Around line 65-70: The allow-core-process TOML currently lists the privileged
commands "notch_window_show" and "notch_window_hide"; remove those two entries
from allow-core-process and create a new, dedicated permission TOML (e.g.,
notch-window-permissions) that contains "notch_window_show" and
"notch_window_hide", then update the capability grants so only the notch-related
UI surfaces/components are given that new permission identifier instead of the
broad allow-core-process. Ensure any references to these commands (invokes)
still resolve by updating the permission identifier in the relevant UI surface
capability configurations.

In `@app/src/features/human/voice/useVoiceSpeak.test.ts`:
- Around line 65-69: The test only asserts that socketService.off was called but
never triggers a real "voice:speak" playback, so update the test in
useVoiceSpeak.test.ts to simulate a full lifecycle: use the same socket test
helper (hoisted.emitMock or the test helper in app/src/test/) to emit a
'voice:speak' event after rendering useVoiceSpeak (so playback actually starts),
capture or spy the playback stop handle returned by the playback/start path
(e.g., mock the audio/playback object's stop method or capture the stop function
returned by the handler), then call unmount() and assert both that
hoisted.offMock was called with 'voice:speak' and that the captured stop handle
was invoked; reference useVoiceSpeak, hoisted.emitMock (or the project's socket
emit helper), hoisted.offMock, and the playback stop mock when locating where to
change the test.

In `@app/src/notch/NotchApp.tsx`:
- Around line 183-207: Replace console.debug/console.warn calls inside the
socket handlers (e.g., the handlers registered for 'dictation:toggle',
'dictation:transcription', and 'companion:state_changed' in
NotchApp/NotchApp.tsx) with the project’s namespaced debug logger (use the
existing debug utility used elsewhere in app/src, e.g., debug('notch:...') or
the repo’s debug import) so logs are namespaced and can be stripped in
production; import the debug function if not present and convert messages like
console.debug(`[notch] ...`) to debug('notch', ...) (or the repo convention) and
ensure any verbose/detail-only messages are guarded as dev-only per the repo
pattern (the other console.* occurrences around lines ~234-255 should be changed
the same way).

In `@app/src/pages/Conversations.tsx`:
- Line 793: The current payload always sets the voice field to false by using
"voice: opts?.voice ?? false"; change this to preserve optional behavior by
sending "voice: opts?.voice" instead so the field is omitted when undefined
(locate the object construction using opts?.voice ?? false in Conversations.tsx,
likely inside the send/submit function that builds the message payload). Ensure
no other code relies on voice being explicitly boolean false when omitted.

In `@src/openhuman/voice/audio_capture.rs`:
- Around line 105-129: The finalize_recording path still contains an inline WAV
encoding loop duplicated from encode_wav_16k; replace that block in
finalize_recording() with a call to encode_wav_16k(samples_16k_slice) (passing
the prepared 16k mono f32 slice) and use the returned Vec<u8> as the WAV bytes,
removing the duplicated WavWriter/loop/finalize code; make sure to propagate and
map errors to the same Result<String> error style as encode_wav_16k, and keep
sample-rate/resampling logic intact so finalize_recording supplies a proper
&[f32] to encode_wav_16k.

In `@src/openhuman/voice/speak_bus.rs`:
- Around line 47-55: Replace the module-local broadcast (SPEAK_BUS) and
subscribe_speak_events() to use the repository event bus: add a new DomainEvent
variant (e.g., DomainEvent::SpeakRequest(SpeakRequest)) and publish speak events
via event_bus::publish_global(DomainEvent::SpeakRequest(...)) where currently
sending to SPEAK_BUS; implement subscribe_speak_events() to call
event_bus::subscribe_global() and filter/unwrap DomainEvent::SpeakRequest into a
broadcast::Receiver/stream of SpeakRequest so consumers (Socket.IO bridge)
receive typed events through the global bus rather than the SPEAK_BUS static.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6d69e6a4-a381-469d-9aa2-969089cddf36

📥 Commits

Reviewing files that changed from the base of the PR and between 7c08704 and c00ceec.

📒 Files selected for processing (97)
  • app/src-tauri/permissions/allow-core-process.toml
  • app/src-tauri/src/lib.rs
  • app/src-tauri/src/notch_window.rs
  • app/src/App.tsx
  • app/src/components/settings/panels/VoiceDebugPanel.tsx
  • app/src/components/settings/panels/VoicePanel.tsx
  • app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
  • app/src/features/human/voice/useVoiceSpeak.test.ts
  • app/src/features/human/voice/useVoiceSpeak.ts
  • app/src/index.css
  • app/src/lib/i18n/ar.ts
  • app/src/lib/i18n/bn.ts
  • app/src/lib/i18n/de.ts
  • app/src/lib/i18n/en.ts
  • app/src/lib/i18n/es.ts
  • app/src/lib/i18n/fr.ts
  • app/src/lib/i18n/hi.ts
  • app/src/lib/i18n/id.ts
  • app/src/lib/i18n/it.ts
  • app/src/lib/i18n/ko.ts
  • app/src/lib/i18n/pl.ts
  • app/src/lib/i18n/pt.ts
  • app/src/lib/i18n/ru.ts
  • app/src/lib/i18n/zh-CN.ts
  • app/src/main.tsx
  • app/src/notch/NotchApp.tsx
  • app/src/pages/Conversations.tsx
  • app/src/services/chatService.ts
  • app/src/services/coreRpcClient.ts
  • app/src/utils/tauriCommands/voice.ts
  • app/src/utils/tauriCommands/window.test.ts
  • app/src/utils/tauriCommands/window.ts
  • app/src/utils/toolDefinitions.ts
  • docs/voice-automate-plan.md
  • docs/voice-system-actions.md
  • gitbooks/developing/architecture/tauri-shell.md
  • src/core/event_bus/events.rs
  • src/core/event_bus/events_tests.rs
  • src/core/jsonrpc.rs
  • src/core/socketio.rs
  • src/openhuman/accessibility/app_fastpaths/fastpaths_tests.rs
  • src/openhuman/accessibility/app_fastpaths/mod.rs
  • src/openhuman/accessibility/app_fastpaths/music.rs
  • src/openhuman/accessibility/automate.rs
  • src/openhuman/accessibility/automate_tests.rs
  • src/openhuman/accessibility/ax_interact.rs
  • src/openhuman/accessibility/ax_interact_tests.rs
  • src/openhuman/accessibility/helper.rs
  • src/openhuman/accessibility/mod.rs
  • src/openhuman/accessibility/uia_interact.rs
  • src/openhuman/agent/harness/tool_loop_tests.rs
  • src/openhuman/agent_registry/agents/orchestrator/agent.toml
  • src/openhuman/agent_registry/agents/orchestrator/prompt.md
  • src/openhuman/approval/gate.rs
  • src/openhuman/channels/bus.rs
  • src/openhuman/channels/providers/telegram/approval_surface_tests.rs
  • src/openhuman/channels/providers/web.rs
  • src/openhuman/channels/providers/web_tests.rs
  • src/openhuman/channels/runtime/dispatch.rs
  • src/openhuman/channels/runtime/startup.rs
  • src/openhuman/config/ops.rs
  • src/openhuman/config/ops_tests.rs
  • src/openhuman/config/schema/voice_server.rs
  • src/openhuman/config/schemas.rs
  • src/openhuman/credentials/ops.rs
  • src/openhuman/tools/impl/browser/screenshot.rs
  • src/openhuman/tools/impl/computer/automate.rs
  • src/openhuman/tools/impl/computer/ax_interact.rs
  • src/openhuman/tools/impl/computer/keyboard.rs
  • src/openhuman/tools/impl/computer/main_thread.rs
  • src/openhuman/tools/impl/computer/mod.rs
  • src/openhuman/tools/impl/computer/mouse.rs
  • src/openhuman/tools/impl/system/install_tool.rs
  • src/openhuman/tools/impl/system/launch_app.rs
  • src/openhuman/tools/impl/system/mod.rs
  • src/openhuman/tools/ops.rs
  • src/openhuman/tools/user_filter.rs
  • src/openhuman/voice/always_on.rs
  • src/openhuman/voice/approval_surface.rs
  • src/openhuman/voice/approval_surface_tests.rs
  • src/openhuman/voice/audio_capture.rs
  • src/openhuman/voice/command_router.rs
  • src/openhuman/voice/mod.rs
  • src/openhuman/voice/speak_bus.rs
  • src/openhuman/wallet/execution_tests.rs
  • tests/channels_large_round25_raw_coverage_e2e.rs
  • tests/channels_provider_deep_raw_coverage_e2e.rs
  • tests/channels_provider_leftovers_raw_coverage_e2e.rs
  • tests/channels_runtime_raw_coverage_e2e.rs
  • tests/channels_web_startup_raw_coverage_e2e.rs
  • tests/channels_web_telegram_raw_coverage_e2e.rs
  • tests/channels_web_yuanbao_round22_raw_coverage_e2e.rs
  • tests/json_rpc_e2e.rs
  • tests/tool_registry_approval_raw_coverage_e2e.rs
  • tests/tools_approval_channels_raw_coverage_e2e.rs
  • tests/tools_network_channels_raw_coverage_e2e.rs
  • tests/worker_b_raw_coverage_e2e.rs

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

🛑 Comments failed to post (15)
app/src-tauri/src/notch_window.rs (1)

362-366: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Escape injected JS values before writing token/URL into the WebView.

token and base_url are interpolated directly into JS string literals. If either contains quote/backslash/newline characters, injection can fail or execute unintended JS.

Proposed fix
-        let js = format!(
-            "window.__OPENHUMAN_NOTCH_CORE_TOKEN__='{token}';\
-             window.__OPENHUMAN_NOTCH_CORE_URL__='{base_url}';\
-             window.dispatchEvent(new CustomEvent('notch:core-url',{{detail:{{url:'{base_url}'}}}}));"
-        );
+        let token_js = serde_json::to_string(&token).unwrap_or_else(|_| "\"\"".to_string());
+        let base_url_js = serde_json::to_string(&base_url).unwrap_or_else(|_| "\"\"".to_string());
+        let js = format!(
+            "window.__OPENHUMAN_NOTCH_CORE_TOKEN__={token_js};\
+             window.__OPENHUMAN_NOTCH_CORE_URL__={base_url_js};\
+             window.dispatchEvent(new CustomEvent('notch:core-url',{{detail:{{url:{base_url_js}}}}}));"
+        );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src-tauri/src/notch_window.rs` around lines 362 - 366, The string
interpolation in the format! that builds js in notch_window.rs (the js variable
where token and base_url are inserted) is unsafe: escape token and base_url for
inclusion in a single-quoted JS string (or serialize them as JS string literals)
before formatting to prevent quotes, backslashes or newlines from breaking or
injecting code; add a small helper (e.g., js_string_escape or use
serde_json::to_string to produce a quoted JS string and strip surrounding
quotes) and use that helper to produce safe values for token and base_url when
constructing the js variable and dispatching the CustomEvent.
app/src/App.tsx (1)

205-214: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set the one-shot guard only after a successful notch sync.

The current flow flips the guard before async work. A single transient failure permanently skips sync for this boot session, leaving notch visibility stale.

💡 Suggested fix
 const notchSyncedRef = useRef(false);
 useEffect(() => {
   if (isBootstrapping || notchSyncedRef.current) return;
-  notchSyncedRef.current = true;
-  void (async () => {
+  let cancelled = false;
+  const run = async (attempt = 0): Promise<void> => {
     try {
       const res = await openhumanGetVoiceServerSettings();
       await syncNotchVisibility(res.result.always_on_enabled);
+      notchSyncedRef.current = true;
     } catch (err) {
-      console.debug('[notch] boot visibility sync failed', err);
+      if (!cancelled && attempt < 2) {
+        window.setTimeout(() => void run(attempt + 1), 500 * (attempt + 1));
+        return;
+      }
+      console.debug('[notch] boot visibility sync failed', err);
     }
-  })();
+  };
+  void run();
+  return () => {
+    cancelled = true;
+  };
 }, [isBootstrapping]);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    if (isBootstrapping || notchSyncedRef.current) return;
    let cancelled = false;
    const run = async (attempt = 0): Promise<void> => {
      try {
        const res = await openhumanGetVoiceServerSettings();
        await syncNotchVisibility(res.result.always_on_enabled);
        notchSyncedRef.current = true;
      } catch (err) {
        if (!cancelled && attempt < 2) {
          window.setTimeout(() => void run(attempt + 1), 500 * (attempt + 1));
          return;
        }
        console.debug('[notch] boot visibility sync failed', err);
      }
    };
    void run();
    return () => {
      cancelled = true;
    };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/App.tsx` around lines 205 - 214, The notch sync one-shot guard is set
before the async work, causing failures to permanently skip retries; modify the
IIFE so notchSyncedRef.current is assigned true only after a successful sync:
call openhumanGetVoiceServerSettings() and await syncNotchVisibility(...) inside
the try block, and on success set notchSyncedRef.current = true (keep the early
return check for isBootstrapping || notchSyncedRef.current at the top) so
transient errors won’t flip the guard; reference functions/vars:
isBootstrapping, notchSyncedRef, openhumanGetVoiceServerSettings,
syncNotchVisibility.
app/src/components/settings/panels/VoicePanel.tsx (1)

509-525: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Separate persistence from notch-sync failure handling and guard re-entrant toggles.

This handler currently treats openhumanUpdateVoiceServerSettings and syncNotchVisibility as one atomic step. If persistence succeeds but notch sync fails, the UI is reverted to the wrong value. Also, multiple rapid clicks can race and leave inconsistent state.

💡 Suggested fix
+const [isUpdatingAlwaysOn, setIsUpdatingAlwaysOn] = useState(false);
...
 <button
   type="button"
   role="switch"
+  disabled={isUpdatingAlwaysOn}
   aria-checked={settings.always_on_enabled}
   aria-label={t('voice.debug.alwaysOn')}
   data-testid="voice-always-on-toggle"
   onClick={async () => {
+    if (isUpdatingAlwaysOn) return;
+    const prev = settings.always_on_enabled;
     const next = !settings.always_on_enabled;
+    setIsUpdatingAlwaysOn(true);
     setSettings(current =>
       current ? { ...current, always_on_enabled: next } : current
     );
     try {
       await openhumanUpdateVoiceServerSettings({ always_on_enabled: next });
-      // The notch pill is the always-on listening HUD: show it
-      // when listening is enabled, drop it when disabled.
-      await syncNotchVisibility(next);
     } catch (err) {
-      // Revert on failure so the UI reflects the persisted value.
       setSettings(current =>
-        current ? { ...current, always_on_enabled: !next } : current
+        current ? { ...current, always_on_enabled: prev } : current
       );
-      console.error('[VoicePanel] failed to toggle always-on', err);
+      setError(err instanceof Error ? err.message : t('voice.providers.failedToSave'));
+      setIsUpdatingAlwaysOn(false);
+      return;
     }
+    setIsUpdatingAlwaysOn(false);
+    void syncNotchVisibility(next).catch(err => {
+      if (process.env.NODE_ENV !== 'production') {
+        console.debug('[VoicePanel] notch sync failed', err);
+      }
+    });
   }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 509 - 525,
The toggle handler mixes persistence and notch-sync and is vulnerable to
re-entrant clicks; change it so clicking is ignored while a pendingToggle flag
(or similar) is true, update the UI optimistically and persist with
openhumanUpdateVoiceServerSettings, and only revert the UI if persistence
fails—do not revert when syncNotchVisibility fails; instead handle sync errors
separately (log/show feedback) and optionally retry without changing settings.
Use the existing symbols settings.always_on_enabled, setSettings,
openhumanUpdateVoiceServerSettings, and syncNotchVisibility and add a local
pending toggle guard to prevent concurrent toggles.
app/src/features/human/voice/useVoiceSpeak.ts (1)

50-60: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent out-of-order TTS playback when multiple voice:speak events overlap.

Older synth requests can resolve after newer ones and then stop newer playback, causing stale prompts to be spoken.

💡 Suggested fix
 export function useVoiceSpeak(): void {
   const handleRef = useRef<PlaybackHandle | null>(null);
+  const requestSeqRef = useRef(0);

   useEffect(() => {
     const onSpeak = (...args: unknown[]): void => {
       const payload = args[0];
       if (!isSpeakPayload(payload)) return;
       const text = payload.text.trim();
       if (!text) return;
+      const seq = ++requestSeqRef.current;

       void (async () => {
         try {
           const { audio_base64: audioBase64, audio_mime: audioMime } = await synthesizeSpeech(text);
-          if (!audioBase64) return;
+          if (!audioBase64 || seq !== requestSeqRef.current) return;
           handleRef.current?.stop();
           const handle = await playBase64Audio(audioBase64, audioMime || 'audio/mpeg', {
             maxDurationMs: MAX_SPEAK_MS,
           });
+          if (seq !== requestSeqRef.current) {
+            handle.stop();
+            return;
+          }
           handleRef.current = handle;
           handle.ended.catch(swallowAudioStop);
         } catch (err) {
           log('voice:speak playback failed: %o', err);
         }
       })();
     };

     socketService.on('voice:speak', onSpeak);
     return () => {
+      requestSeqRef.current += 1;
       socketService.off('voice:speak', onSpeak);
       handleRef.current?.stop();
       handleRef.current = null;
     };
   }, []);
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

export function useVoiceSpeak(): void {
  const handleRef = useRef<PlaybackHandle | null>(null);
  const requestSeqRef = useRef(0);

  useEffect(() => {
    const onSpeak = (...args: unknown[]): void => {
      const payload = args[0];
      if (!isSpeakPayload(payload)) return;
      const text = payload.text.trim();
      if (!text) return;
      const seq = ++requestSeqRef.current;

      void (async () => {
        try {
          const { audio_base64: audioBase64, audio_mime: audioMime } = await synthesizeSpeech(text);
          if (!audioBase64 || seq !== requestSeqRef.current) return;
          // Stop any in-flight prompt before starting the next one.
          handleRef.current?.stop();
          const handle = await playBase64Audio(audioBase64, audioMime || 'audio/mpeg', {
            maxDurationMs: MAX_SPEAK_MS,
          });
          if (seq !== requestSeqRef.current) {
            handle.stop();
            return;
          }
          handleRef.current = handle;
          handle.ended.catch(swallowAudioStop);
        } catch (err) {
          log('voice:speak playback failed: %o', err);
        }
      })();
    };

    socketService.on('voice:speak', onSpeak);
    return () => {
      requestSeqRef.current += 1;
      socketService.off('voice:speak', onSpeak);
      handleRef.current?.stop();
      handleRef.current = null;
    };
  }, []);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/features/human/voice/useVoiceSpeak.ts` around lines 50 - 60, The
synth/ playback race happens because older synthesizeSpeech calls can resolve
later and call handleRef.current?.stop(), killing newer audio; fix by
introducing a monotonic request id (e.g., requestIdRef) and capture const
myRequestId = ++requestIdRef.current at the start of the async block, then after
synthesizeSpeech resolves verify myRequestId === requestIdRef.current before
stopping the current handle or assigning handleRef.current = handle (and
similarly before calling handle.ended.catch). This ensures only the latest
voice:speak invocation can stop or replace the shared handle; reference
synthesizeSpeech, handleRef, playBase64Audio, and handle.ended in your change.
app/src/notch/NotchApp.tsx (1)

164-261: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent post-unmount socket initialization and listener leaks.

connectSocket uses a local disposed flag and returns a disposer, but that disposer is never called by the effect. If unmount happens while connectCoreSocket() is in-flight, the socket can still be assigned/connected after teardown.

Suggested fix
 export default function NotchApp() {
   const { t } = useT();
   const [state, setState] = useState<NotchState>({ mode: 'ready', text: '' });
   const dismissRef = useRef<number | null>(null);
   const socketRef = useRef<Socket | null>(null);
+  const disposedRef = useRef(false);

   const connectSocket = useCallback(
     (baseUrl: string) => {
+      if (disposedRef.current) return;
       if (socketRef.current?.connected) return;
       if (socketRef.current) {
         socketRef.current.disconnect();
       }

-      let disposed = false;
       void (async () => {
         try {
           const socket = await connectCoreSocket({
             getBaseUrl: async () => baseUrl,
-            isDisposed: () => disposed,
+            isDisposed: () => disposedRef.current,
           });
-          if (!socket || disposed) return;
+          if (!socket || disposedRef.current) {
+            socket?.disconnect();
+            return;
+          }
           socketRef.current = socket;
           // ...
         } catch (err) {
           // ...
         }
       })();

-      return () => {
-        disposed = true;
-      };
     },
     [t, clearDismiss, scheduleDismiss]
   );

   useEffect(() => {
+    disposedRef.current = false;
     // ...
     return () => {
+      disposedRef.current = true;
       window.removeEventListener('notch:core-url', handler as EventListener);
       socketRef.current?.disconnect();
       socketRef.current = null;
       clearDismiss();
     };
   }, [connectSocket, clearDismiss]);

Also applies to: 267-289

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/notch/NotchApp.tsx` around lines 164 - 261, connectSocket can
initialize a socket after the component unmounts because the returned disposable
isn't invoked; ensure teardown is actually used and prevent post-unmount
assignment/leaks by (1) keeping the disposed flag check after await
connectCoreSocket(...) and if disposed is true then immediately disconnect the
newly created socket and return, (2) register socket listeners on the socket
instance only when not disposed, and (3) ensure the caller effect that invokes
connectSocket actually captures and calls its returned cleanup function (i.e.,
call the function returned by connectSocket inside the effect's cleanup). Refer
to connectSocket, connectCoreSocket, socketRef, and the local disposed flag when
making these changes.
app/src/utils/tauriCommands/window.ts (1)

122-123: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace raw console.warn error logging with redacted namespaced debug logs.

Line 122 and Line 136 currently log raw err objects in app/src/**. That can leak sensitive runtime details and bypasses the repo’s logging convention. Use a namespaced debug logger and only log a sanitized message.

Proposed fix
 import { invoke } from '`@tauri-apps/api/core`';
 import { getCurrentWindow } from '`@tauri-apps/api/window`';
+import createDebug from 'debug';

 import { isTauri } from './common';
+const log = createDebug('openhuman:tauri:window');
@@
   try {
     await invoke('notch_window_show');
   } catch (err) {
-    console.warn('[notch] show failed', err);
+    const message = err instanceof Error ? err.message : String(err);
+    log('notch_window_show failed: %s', message);
   }
 }
@@
   try {
     await invoke('notch_window_hide');
   } catch (err) {
-    console.warn('[notch] hide failed', err);
+    const message = err instanceof Error ? err.message : String(err);
+    log('notch_window_hide failed: %s', message);
   }
 }

As per coding guidelines: "Never log secrets or full PII—redact sensitive information in all logs" and "In app/src, use namespaced debug logs (e.g. from the debug npm package) with dev-only detail for development diagnostics".

Also applies to: 136-137

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/utils/tauriCommands/window.ts` around lines 122 - 123, Replace the
raw console.warn(err) calls in app/src/utils/tauriCommands/window.ts with a
namespaced debug logger and redacted error text: import and instantiate a debug
logger (e.g. const debug = createDebug('app:tauri:window') or using the repo's
existing debug factory), then change the console.warn in the function(s) that
show the window (and the other occurrences around lines 136-137) to debug(...)
while only logging a sanitized message and non-sensitive fields (e.g.
debug('show failed:', safeErrorString(err)) where safeErrorString returns
err.message or a redacted summary, not the full err object); ensure you add the
debug import/initialization and replace all similar console.warn/raw err logs in
this module to follow the same pattern.
src/openhuman/accessibility/app_fastpaths/music.rs (2)

150-166: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/openhuman/accessibility/app_fastpaths/music.rs"

# Basic sanity: file exists
test -f "$FILE" && echo "Found $FILE" || { echo "Missing $FILE"; exit 1; }

# Show the relevant sections around the reported line ranges/snippet
echo "---- replace_ci context (lines 120-220) ----"
sed -n '120,220p' "$FILE" | cat -n

echo "---- logging context (lines 220-320) ----"
sed -n '220,320p' "$FILE" | cat -n

# Locate replace_ci definition and all call sites
echo "---- replace_ci definitions ----"
rg -n "fn\s+replace_ci\s*\(" "$FILE"

echo "---- replace_ci call sites ----"
rg -n "replace_ci\s*\(" "$FILE"

# Find likely sensitive logging of query/user input
echo "---- likely query/user-input logging ----"
rg -n "log::|tracing::|info!|debug!|trace!|warn!|error!" "$FILE"

# Specifically search for "query" in logs
echo "---- occurrences of parsed/query in file ----"
rg -n "(query|parsed|url|request|raw)" "$FILE"

Repository: tinyhumansai/openhuman

Length of output: 11772


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "replace_ci\s*\(" -S .

Repository: tinyhumansai/openhuman

Length of output: 298


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/openhuman/accessibility/app_fastpaths/music.rs"
sed -n '1,130p' "$FILE" | cat -n
sed -n '130,210p' "$FILE" | cat -n

Repository: tinyhumansai/openhuman

Length of output: 9202


Fix Unicode-unsafe indexing in replace_ci and redact logged play query.

  • replace_ci lowercases haystack into hl but slices hl using byte offsets (i) from the original haystack; Unicode case mapping can change byte lengths/boundaries, so hl[i..] can panic on valid UTF-8 input.
  • run logs the full user-derived query at info level (log::info!("[automate::music] ▶ play query={query:?}")), violating the “never log secrets/full PII” redaction guidance.
🛠️ Proposed fix
 fn replace_ci(haystack: &str, needle: &str, repl: &str) -> String {
-    let hl = haystack.to_lowercase();
-    let nl = needle.to_lowercase();
     let mut out = String::with_capacity(haystack.len());
     let mut i = 0;
     while i < haystack.len() {
-        if hl[i..].starts_with(&nl) {
+        let is_match = haystack[i..]
+            .get(..needle.len())
+            .map(|s| s.eq_ignore_ascii_case(needle))
+            .unwrap_or(false);
+        if is_match {
             out.push_str(repl);
             i += needle.len();
         } else {
-            let ch = haystack[i..].chars().next().unwrap();
+            let ch = haystack[i..].chars().next().expect("char boundary");
             out.push(ch);
             i += ch.len_utf8();
         }
     }
     out
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/accessibility/app_fastpaths/music.rs` around lines 150 - 166,
The replace_ci function is using byte-indexing into the lowercased string
(hl[i..]) while i is tracked over the original haystack's bytes, which is
Unicode-unsafe; change replace_ci to iterate using char_indices (or parallel
char_indices over haystack and hl) and track byte offsets for both original and
lowercased forms so you only slice at valid UTF-8 boundaries (e.g., use for
(orig_byte_idx, ch) in haystack.char_indices() and maintain a corresponding
hl_byte_idx into hl to call hl[hl_byte_idx..].starts_with(&nl) and advance both
indices by the matched needle.chars().map(|c|c.len_utf8()).sum::<usize>() or by
the matched byte lengths); and in run, stop logging the full user-derived query
via log::info!("[automate::music] ▶ play query={query:?}"), instead log a
redacted or masked version (e.g., omit the query, log only length or a fixed
placeholder) to avoid exposing PII/secrets.

237-237: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Redact query content from logs.

This logs raw user-derived query text. Please log metadata (length/hash) instead of full content.

🛡️ Proposed fix
-    log::info!("[automate::music] ▶ play query={query:?}");
+    log::debug!(
+        "[automate::music] ▶ play query_chars={}",
+        query.chars().count()
+    );

As per coding guidelines: Never log secrets or full PII—redact sensitive information in all logs.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    log::debug!(
        "[automate::music] ▶ play query_chars={}",
        query.chars().count()
    );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/accessibility/app_fastpaths/music.rs` at line 237, The current
log call logs raw user query via log::info!("[automate::music] ▶ play
query={query:?}"); — replace that to avoid PII by computing and logging only
non-sensitive metadata: compute query_len = query.len() and query_hash =
sha256_hex(query) (or another stable hash) and update the log to something like
log::info!("[automate::music] ▶ play query_len={} query_hash={}", query_len,
query_hash); ensure the original `query` string is not included anywhere in logs
and reference the same variable name `query` and the logging site (log::info!
call) when making the change.
src/openhuman/accessibility/automate.rs (1)

210-212: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Redact user/model content before logging automation state.

Line 224 logs the full goal, Line 314 logs label/filter, and Line 300 logs parse errors that include raw model output from Line 211. Those fields can contain sensitive user content and should be redacted/omitted in logs.

🔧 Suggested patch
+fn redact_log_value(input: &str) -> String {
+    let len = input.chars().count();
+    if len == 0 {
+        "<empty>".to_string()
+    } else {
+        format!("<redacted:{len} chars>")
+    }
+}
+
 fn parse_action(raw: &str) -> Result<Action, String> {
@@
-    Err(format!(
-        "could not parse an action from model output: {trimmed:?}"
-    ))
+    Err("could not parse an action from model output".to_string())
 }
@@
     log::info!(
-        "{LOG_PREFIX} ▶ run app={app:?} goal={goal:?} budget={}",
-        opts.step_budget
+        "{LOG_PREFIX} ▶ run app={app:?} goal={} budget={}",
+        redact_log_value(goal),
+        opts.step_budget
     );
@@
-                    log::warn!("{LOG_PREFIX} step={step} unparseable action, retrying: {e}");
+                    log::warn!("{LOG_PREFIX} step={step} unparseable action, retrying");
@@
         log::info!(
-            "{LOG_PREFIX} step={step} action={:?} app={target_app:?} label={:?} filter={:?}",
+            "{LOG_PREFIX} step={step} action={:?} app={target_app:?} label={} filter={}",
             action.action,
-            action.label,
-            action.filter
+            redact_log_value(&action.label),
+            redact_log_value(&action.filter)
         );
As per coding guidelines `**/*.{ts,tsx,js,jsx,rs}`: Never log secrets or full PII—redact sensitive information in all logs.

Also applies to: 223-226, 300-301, 313-318

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/accessibility/automate.rs` around lines 210 - 212, The logs
currently emit raw user/model content (e.g., variables goal, label, filter and
trimmed/model output) which may contain PII; update the logging and error
creation to redact or omit sensitive content instead of printing the full values
— replace direct interpolations like trimmed, goal, label, filter with a
sanitized/redacted representation (e.g., call a helper redact_sensitive(s: &str)
or replace with "[REDACTED]") in the Err(format!(...)) and all log calls that
reference those symbols so logs never contain full user/model text.
src/openhuman/credentials/ops.rs (1)

44-47: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Add symmetric teardown for always-on service.

This starts a new login-gated service, but stop_login_gated_services does not stop/disable it. That leaves always-on capture active after logout/user switch.

Suggested direction
 pub async fn stop_login_gated_services(config: &Config) {
+    // Stop/disable always-on listening before tearing down auth-scoped services.
+    crate::openhuman::voice::always_on::stop();
+
     // 1. Autocomplete — stop engine + Swift overlay helper.
     {
// src/openhuman/voice/always_on.rs
pub fn stop() {
    ENABLED.store(false, Ordering::SeqCst);
    // Prefer full stream/thread teardown if possible, not just logical disable.
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/credentials/ops.rs` around lines 44 - 47, The always-on voice
service started by crate::openhuman::voice::always_on::start_if_enabled(config)
isn't torn down on logout; add a symmetric shutdown path by implementing a
public stop() in src/openhuman/voice/always_on.rs that clears ENABLED
(Ordering::SeqCst) and performs full stream/thread teardown, then call
crate::openhuman::voice::always_on::stop() from stop_login_gated_services so the
always-on capture is fully disabled on logout/user switch.
src/openhuman/tools/impl/browser/screenshot.rs (1)

209-225: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't return an oversized JPEG from the "fit within budget" helper.

If none of the generated thumbnails fit max_bytes, last still holds the final oversized candidate, and the current tail returns it as Ok(...). That violates this helper's contract and can inline a payload larger than the budget on the failure path.

🔧 Suggested fix
 fn downscale_to_jpeg(bytes: &[u8], max_bytes: usize) -> Result<(Vec<u8>, u32, u32), String> {
     let img = image::load_from_memory(bytes).map_err(|e| format!("decode: {e}"))?;
     let mut last: Option<(Vec<u8>, u32, u32)> = None;
     for max_dim in [1568u32, 1280, 1024, 768, 600] {
         let thumb = img.thumbnail(max_dim, max_dim); // fits within max_dim², keeps aspect
         let mut buf = std::io::Cursor::new(Vec::new());
         image::codecs::jpeg::JpegEncoder::new_with_quality(&mut buf, 72)
             .encode_image(&thumb)
             .map_err(|e| format!("jpeg encode: {e}"))?;
         let out = buf.into_inner();
         let (w, h) = (thumb.width(), thumb.height());
         if out.len() <= max_bytes {
             return Ok((out, w, h));
         }
         last = Some((out, w, h));
     }
-    last.ok_or_else(|| "could not produce a fitting JPEG".to_string())
+    Err(last
+        .map(|(out, w, h)| {
+            format!("smallest JPEG was still too large ({} bytes at {}x{})", out.len(), w, h)
+        })
+        .unwrap_or_else(|| "could not produce a fitting JPEG".to_string()))
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

fn downscale_to_jpeg(bytes: &[u8], max_bytes: usize) -> Result<(Vec<u8>, u32, u32), String> {
    let img = image::load_from_memory(bytes).map_err(|e| format!("decode: {e}"))?;
    let mut last: Option<(Vec<u8>, u32, u32)> = None;
    for max_dim in [1568u32, 1280, 1024, 768, 600] {
        let thumb = img.thumbnail(max_dim, max_dim); // fits within max_dim², keeps aspect
        let mut buf = std::io::Cursor::new(Vec::new());
        image::codecs::jpeg::JpegEncoder::new_with_quality(&mut buf, 72)
            .encode_image(&thumb)
            .map_err(|e| format!("jpeg encode: {e}"))?;
        let out = buf.into_inner();
        let (w, h) = (thumb.width(), thumb.height());
        if out.len() <= max_bytes {
            return Ok((out, w, h));
        }
        last = Some((out, w, h));
    }
    Err(last
        .map(|(out, w, h)| {
            format!("smallest JPEG was still too large ({} bytes at {}x{})", out.len(), w, h)
        })
        .unwrap_or_else(|| "could not produce a fitting JPEG".to_string()))
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/browser/screenshot.rs` around lines 209 - 225, The
helper downscale_to_jpeg currently returns the last oversized candidate stored
in last when no thumbnail fits max_bytes; change this so that if the loop
finishes without finding out.len() <= max_bytes it returns an Err (e.g., via
last.ok_or_else -> Err) instead of returning Ok(last), preserving the function's
contract; update the tail of downscale_to_jpeg (referencing variables last and
max_bytes and the function name) to produce an error when no size fits rather
than returning the oversized JPEG.
src/openhuman/tools/impl/computer/automate.rs (2)

1-16: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Put AutomateTool in the owning domain module, not src/openhuman/tools/impl/.

This adds a new domain-owned tool to the legacy impl tree instead of the repo’s domain-owned tools.rs / tools/ structure. Please move it into the owning module and re-export it from the normal tool surface there.

Based on learnings, "Tool implementations belonging to a domain must live in that owning module's tools.rs (and optional tools/ submodules), not in src/openhuman/tools/impl/; re-export those tool types from src/openhuman/tools/mod.rs."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/computer/automate.rs` around lines 1 - 16, The
AutomateTool implementation belongs in the domain's owning module not in the
impl tree: move the AutomateTool type, its Tool impl, and any helper
functions/uses (e.g., AutomateOptions, RealBackend, is_sensitive_app) out of the
legacy impl file and into the owning domain's tools.rs (or a tools/ submodule)
for the computer/openhuman domain, update the module imports there, and then
re-export AutomateTool from the central tools mod (src/openhuman/tools/mod.rs)
so external callers use the normal tool surface; also remove the old file or its
module entry from the impl/ folder and fix any use sites to import the
re-exported AutomateTool.

105-105: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't log the raw goal.

goal is free-form user input, so this can capture message bodies, names, or other sensitive content in persistent logs. Log only redacted/derived metadata here (for example app, goal_len, or a hash).

Proposed fix
-        log::info!("[automate] ▶ tool execute app={app:?} goal={goal:?}");
+        log::info!(
+            "[automate] ▶ tool execute app={app:?} goal_len={}",
+            goal.chars().count()
+        );

As per coding guidelines, **/*.{ts,tsx,js,jsx,rs}: Never log secrets or full PII—redact sensitive information in all logs.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        log::info!(
            "[automate] ▶ tool execute app={app:?} goal_len={}",
            goal.chars().count()
        );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/computer/automate.rs` at line 105, The current log
statement uses log::info! and includes the raw free-form variable goal which may
contain sensitive user data; change the log to avoid printing goal directly and
instead record only derived/redacted metadata (for example keep app as-is and
replace goal with goal_len and/or a short hash), e.g. compute a length
(goal.len()) and a stable hash of goal (SHA-256 or similar) and log those fields
instead of the full goal string so log::info!("[automate] ▶ tool execute
app={app:?} goal_len={goal_len} goal_hash={goal_hash:?}");
src/openhuman/voice/always_on.rs (2)

229-230: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid stale config snapshot in long-lived always-on processor.

The processor captures Config once at startup and never refreshes it; later config updates (wake word, STT provider, VAD tuning) won’t take effect while RUNNING is true.

Suggested direction
-    let config = app_config.clone();
+    // Keep live runtime settings in shared state, updated on every start_if_enabled call.
+    update_runtime_voice_settings(&app_config.voice_server);
// On each utterance / decision point:
let settings = get_runtime_voice_settings();

Use a shared RwLock/atomic settings struct so reconfig applies without restarting capture.

Also applies to: 252-253, 286-289, 331-334, 363-375

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/voice/always_on.rs` around lines 229 - 230, The always-on
processor currently captures a static AppConfig via app_config.clone and builds
VadConfig with VadConfig::from_server_config at startup, causing updates (wake
word, STT provider, VAD tuning) to be ignored while RUNNING is true; change this
to read a shared, mutable runtime settings object instead of cloning at startup:
introduce or use an existing shared RwLock/atomic voice settings struct (e.g.,
get_runtime_voice_settings()) and replace direct uses of app_config /
VadConfig::from_server_config in the long‑lived loop and at each decision point
(locations around where app_config.clone is used and the VadConfig creation) so
you fetch updated settings per utterance/decision and rebuild or reference the
latest VadConfig/STT/wake-word provider on demand.

365-375: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Redact transcript/command content from logs.

This logs full spoken content (command={cmd:?} / transcript={text:?}), which can include PII/secrets from ambient speech.

Proposed redaction change
- log::info!("{LOG_PREFIX} wake word matched → command={cmd:?}");
+ log::info!("{LOG_PREFIX} wake word matched → command_len={}", cmd.len());

- log::info!(
-     "{LOG_PREFIX} no wake word ({:?}) in transcript={text:?}; ignored",
-     config.voice_server.wake_word
- );
+ log::info!(
+     "{LOG_PREFIX} no wake word match (wake_word_configured={}, transcript_len={}); ignored",
+     !config.voice_server.wake_word.trim().is_empty(),
+     text.len()
+ );
As per coding guidelines: `Never log secrets or full PII—redact sensitive information in all logs`.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

                    log::info!("{LOG_PREFIX} wake word matched → command_len={}", cmd.len());
                    notch_status("Processing", 12000); // pill: running the command
                    deliver_command(config, cmd).await;
                }
                None => {
                    // Visible at info so the user can see WHAT was heard when the
                    // wake word didn't match (diagnoses "Hey Tiny not responding").
                    log::info!(
                        "{LOG_PREFIX} no wake word match (wake_word_configured={}, transcript_len={}); ignored",
                        !config.voice_server.wake_word.trim().is_empty(),
                        text.len()
                    );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/voice/always_on.rs` around lines 365 - 375, The logs currently
print full spoken content via log::info! including cmd and text (see the log
lines around LOG_PREFIX and the deliver_command call), which may expose
PII/secrets; update those log statements to redact or mask sensitive content
instead of printing cmd or text (e.g., replace with a redacted placeholder,
truncated summary, or a hashed token) and only log safe metadata (like whether a
wake word matched, confidence flags, or a masked length). Apply the change to
both the wake-word-matched branch (before calling deliver_command) and the None
branch that logs transcript, or factor out a helper (e.g., redact_transcript or
mask_sensitive) to centralize redaction so future logs use the safe
representation. Ensure deliver_command invocation remains unchanged except for
not logging cmd contents directly in the surrounding info messages.

M3gA-Mind added 2 commits June 5, 2026 02:48
Union the voice mod decls (always_on + approval_surface); take main's
fully-updated tracker table and mark the Phase 4 voice-confirmation row Done.
…tion)

The speak bus is a process-global broadcast and all approval prompts share
VOICE_APPROVAL_SOURCE, so the happy-path test's published prompt leaked into
the parallel 'should stay silent' receivers. Guard the bus-touching tests
with a shared lock (acquire → subscribe) so each runs isolated.
@M3gA-Mind M3gA-Mind marked this pull request as ready for review June 4, 2026 21:31
@M3gA-Mind
Copy link
Copy Markdown
Contributor Author

Independent review (beyond the CodeRabbit pass)

Reviewed Phase 4 — the voice-native approval gate: the voice origin flag plumbing, VoiceApprovalSurfaceSubscriber, the speak_bus + socket bridge, and useVoiceSpeak.

Reviewed clean

  • No parallel system — reuses the existing gate classification: ApprovalChatContext.voice (set from the optional voice RPC/socket param) → chat_is_voiceDomainEvent::ApprovalRequested { is_voice }. Typed turns omit it and stay visual-only. Additive field + optional param ⇒ older clients keep the prior behaviour.
  • SurfaceVoiceApprovalSurfaceSubscriber mirrors telegram/approval_surface: speaks only on ApprovalRequested { is_voice: true } with a non-empty summary; silent for typed / non-approval / empty-summary. spoken_prompt strips a trailing period before appending "Say yes to confirm, or no to cancel."
  • Speak primitivespeak_bus (publish/subscribe broadcast) mirrors overlay/bus; core/socketio.rs bridges it to a voice:speak event; useVoiceSpeak (mounted app-wide) plays it through the existing TTS pipeline. Fire-and-forget (drops cleanly when no subscriber).
  • Spoken answer — rides the existing transcription → auto-send → web.rs yes/no ingress → approval_decide; no new code, no new approval semantics.
  • Follow-ups inherit no voice flag (never re-prompt) — correct.

One fix applied during babysit: the approval_surface unit tests were flaky under cargo's default parallelism — the speak bus is a process-global broadcast and every prompt shares VOICE_APPROVAL_SOURCE, so the happy-path test's published prompt leaked into the parallel "should stay silent" receivers. Serialized the bus-touching tests with a shared lock (acquire → subscribe); 7/7 green in parallel now.

The cross-process glue (web.rs param, socket bridge, startup registration) is integration-shaped (exercised via the approval E2E); the pure formatter/gate/bus logic is fully unit-tested.

No correctness issues. LGTM once CI is green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant