
feat(voice): global push-to-talk hotkey (#3090) #3349

Open
CodeGhost21 wants to merge 31 commits into tinyhumansai:main from CodeGhost21:feat/global-ptt-3090

Conversation

@CodeGhost21
Contributor

@CodeGhost21 CodeGhost21 commented Jun 4, 2026

Summary

  • Hold-to-talk global hotkey: mic opens on press, closes on release, transcript sent to the active thread, agent reply optionally spoken via TTS — no focus stealing.
  • Cross-platform via tauri-plugin-global-shortcut uniform code path (deliberately different from dictation's OS-forked rdev/Tauri-plugin path).
  • Borderless always-on-top overlay window (lazy-created on first register, destroyed on unregister) at /ptt-overlay.
  • Audible open/close/error CC0 chimes; 10s renderer-side watchdog finalises sessions if the OS swallows the release event.
  • speak_reply / source / session_id additive optional fields on channel.web_chat; backwards-compatible — older clients are unaffected.
  • New ptt.* and pttOverlay.* i18n keys with real translations across all 13 locales; new voice.ptt entry in the about-app capability catalog.

Problem

OpenHuman had voice input on iOS (mobile.push_to_talk) and dictation on macOS but nothing global on the desktop — users had to focus the chat window before talking. Issue #3090 asks for a Clicky-style hold-to-talk shortcut that works from any window without stealing focus.

The dictation path is OS-forked (rdev on macOS, tauri plugin on Windows/Linux) and inserts text into the focused field. Reusing it for PTT would have forced a focus switch and inherited two code paths to maintain. PTT instead routes through a single tauri-plugin-global-shortcut registration and dispatches the transcript directly into the active chat thread via channel.web_chat, so behaviour is uniform across macOS, Windows, and Linux/X11.

Solution

  • Shortcut layer (app/src-tauri/src/ptt_hotkeys.rs) — parses the configured shortcut string (Ctrl+Shift+Space, Cmd+Alt+M, …), validates modifier-only and empty tokens, fails registration when the binding collides with the dictation hotkey, and CAS-guards press/release so a stuck OS modifier can't drive the state machine into both "held" and "released" at once.
  • Overlay window (app/src-tauri/src/ptt_overlay.rs) — lazy borderless always-on-top window; created on first register, destroyed on unregister so we don't keep a hidden window alive when PTT is off. A macOS-gated accept_first_mouse and tagged overlay-error debug logging address reviewer findings.
  • Voice domain (src/openhuman/voice/bus.rs) — publishes DomainEvent::Voice::PttTranscriptCommitted on successful commit; re-exported as voice::publish so the channel layer doesn't reach into the event bus directly.
  • Channel layer (src/openhuman/channels/providers/web.rs) — ChatRequestMetadata carries the new optional fields; speak_reply=true triggers reply_speech::invoke and publishes the PTT-committed event for downstream subscribers (analytics, mascot).
  • Renderer service (app/src/services/pttService.ts) — state machine (idle → arming → held → committing), 10s watchdog timer for swallowed release events, preempt-race CAS guard so a fresh start can't orphan an in-flight stop, and a Tauri-only test seam.
  • Redux + settings UI — pttSlice stores the binding + enabled toggle; PttSettingsPanel lets the user pick a shortcut, surfaces conflict-with-dictation errors with localized messages, and explains overlay/exclusive-fullscreen behaviour inline.
  • Capability catalog — new voice.ptt entry in src/openhuman/about_app/catalog_data.rs (Conversation category, DERIVED_TO_BACKEND privacy because audio routes through the configured STT provider, same shape as conversation.send_voice).
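
The shortcut-string validation described for the shortcut layer can be sketched roughly as follows. This is an illustration in TypeScript, not the Rust code in ptt_hotkeys.rs: the names `parseShortcut`, `ParsedShortcut`, and `sameBinding`, the modifier list, and the "multiple-keys" rule are all assumptions made for the example.

```typescript
// Illustrative sketch only; the real parser lives in Rust (ptt_hotkeys.rs).
// Every name below is hypothetical.
const MODIFIERS = ["ctrl", "shift", "alt", "cmd"];

interface ParsedShortcut {
  modifiers: string[]; // lowercased, de-duplicated, sorted for stable comparison
  key: string; // the single non-modifier key, lowercased
}

type ParseResult =
  | { ok: true; value: ParsedShortcut }
  | { ok: false; error: "empty-token" | "modifier-only" | "multiple-keys" };

function parseShortcut(input: string): ParseResult {
  const tokens = input.split("+").map((t) => t.trim().toLowerCase());
  // "Ctrl++Space", "Ctrl+" and "" all produce an empty token.
  if (tokens.some((t) => t.length === 0)) return { ok: false, error: "empty-token" };

  const modifiers = tokens.filter((t) => MODIFIERS.indexOf(t) !== -1);
  const keys = tokens.filter((t) => MODIFIERS.indexOf(t) === -1);
  if (keys.length === 0) return { ok: false, error: "modifier-only" };
  // Assumption: exactly one non-modifier key is allowed per binding.
  if (keys.length > 1) return { ok: false, error: "multiple-keys" };

  const unique = modifiers.filter((m, i, all) => all.indexOf(m) === i).sort();
  return { ok: true, value: { modifiers: unique, key: keys[0] } };
}

// Two bindings collide when they resolve to the same key and modifier set,
// regardless of the order or casing the user typed them in.
function sameBinding(a: ParsedShortcut, b: ParsedShortcut): boolean {
  return a.key === b.key && a.modifiers.join("+") === b.modifiers.join("+");
}
```

Under these assumptions, a registration routine would parse both the PTT and dictation bindings and reject the PTT binding when `sameBinding` returns true, which mirrors the fail-fast conflict check described above.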

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy — Rust unit + JSON-RPC E2E tests for schema, voice/bus, web channel; Vitest tests for pttSlice, pttService, chatService.speak_reply forwarding, PttOverlayPage, PttSettingsPanel; Tauri shell tests for ptt_hotkeys parser/conflict/state; new capability_list_includes_voice_ptt test pinning the catalog entry.
  • Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/pr-ci.yml. New code lives in dedicated PTT modules with co-located tests; pnpm test:coverage was not re-run locally on this branch tip (cold full-suite cost) — the CI gate will compute and enforce the actual percentage, and visual review of the new modules shows the touched lines are reached by the listed Vitest + cargo tests.
  • Coverage matrix updated — N/A: matrix tracks higher-level feature rows and the new PTT feature sits inside the existing Voice / Conversation rows that already exist.
  • All affected feature IDs from the matrix are listed in the PR description under ## Related — see the Closes line below.
  • No new external network dependencies introduced (mock backend used per Testing Strategy) — PTT reuses the existing STT and chat paths; no new outbound hosts.
  • Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md) — N/A: PTT is opt-in via Settings → Voice and does not affect the existing release-cut smoke flows; reviewers are asked to bind a hotkey and confirm overlay + transcript dispatch.
  • Linked issue closed via Closes #NNN in the ## Related section — see Closes #3090 below.

Impact

  • Desktop only. macOS, Windows, and Linux/X11 via tauri-plugin-global-shortcut. The iOS client and headless deployments are unaffected (the hotkey manager is not mounted on those targets).
  • Backwards-compatible RPC. channel.web_chat accepts three new optional metadata fields; clients that don't send them get the existing behaviour. Tests pin the schema so a future tightening can't silently break older callers.
  • Security / privacy. PTT audio routes through whichever STT provider the user has configured — same path as conversation.send_voice. Capability catalog entry uses DERIVED_TO_BACKEND privacy so the Privacy surface reflects this. Microphone permission is requested on first use.
  • No new external network dependencies. No new outbound hosts, no new third-party SDKs.
  • Out of scope: the background screen-capture half of [FEATURE REQUEST] Global push-to-talk keybind + screen share while tabbed out / in background #3090 is tracked as a separate follow-up PR; this PR ships only the PTT half.

Related

  • Closes #3090


AI Authored PR Metadata (required for Codex/Linear PRs)

Keep this section for AI-authored PRs. For human-only PRs, mark each field N/A.

Linear Issue

Commit & Branch

  • Branch: feat/global-ptt-3090
  • Commit SHA: 8e5ce8d (tip after feat(about_app) + style(ptt) final-sweep commits)

Validation Run

  • pnpm --filter openhuman-app format:check — clean (cargo fmt + prettier both pass after the final-sweep commit)
  • pnpm typecheck — clean (no TS errors)
  • Focused tests: pnpm debug rust capability_list_includes_voice_ptt ✓ ; Vitest unit suite — all 4688 tests green on the final re-run (one earlier run flaked on a pre-existing Conversations.render test unrelated to PTT, did not reproduce); pnpm i18n:check and pnpm i18n:english:check both no-worse-than-main (main: 1312 unexpected English; this branch: 1285 — branch improves the i18n posture).
  • Rust fmt/check (if changed): cargo fmt --check clean, cargo check --manifest-path Cargo.toml clean.
  • Tauri fmt/check (if changed): cargo fmt --manifest-path app/src-tauri/Cargo.toml --check clean, pnpm rust:check clean.

Validation Blocked

  • command: pnpm test:coverage (full coverage matrix)
  • error: not run on the final commit — cold-build cost exceeded the local time budget for this PR; the CI pr-ci.yml job will compute and enforce the actual diff-coverage gate.
  • impact: low — the new files are dedicated PTT modules with co-located unit tests (pttSlice, pttService, chatService.speak_reply, PttOverlayPage, PttSettingsPanel, ptt_hotkeys, voice/bus, channel-web schema, capability catalog), and the CI gate will block merge if diff-coverage falls below 80%.

Behavior Changes

  • Intended behavior change: a configured global hotkey now opens a borderless always-on-top overlay, records audio while held, and dispatches the transcribed text to the active chat thread on release. Optional speak_reply plays the agent's TTS reply.
  • User-visible effect: a new Settings → Voice → Push-to-Talk panel; a hold-to-talk shortcut that works from any window without stealing focus.
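
The hold/release flow and the watchdog for swallowed release events can be pictured with the sketch below. It is a simplified model under stated assumptions, not the code in pttService.ts: the `PttSession` class, the `PttDeps` interface, and the injectable `schedule` function are hypothetical, and the real service performs asynchronous recording and transcription that this example omits.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not pttService.ts.
type PttState = "idle" | "arming" | "held" | "committing";
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

interface PttDeps {
  openMic(): void;
  closeMicAndSend(sessionId: string): void;
  schedule: Schedule; // injected so tests can trigger the watchdog by hand
}

// Default scheduler backed by real timers.
const realSchedule: Schedule = (fn, ms) => {
  const id = setTimeout(fn, ms);
  return () => clearTimeout(id);
};

class PttSession {
  state: PttState = "idle";
  private sessionId: string | null = null;
  private cancelWatchdog: Cancel | null = null;
  private deps: PttDeps;
  private watchdogMs: number;

  constructor(deps: PttDeps, watchdogMs = 10000) {
    this.deps = deps;
    this.watchdogMs = watchdogMs;
  }

  onPress(sessionId: string): void {
    if (this.state !== "idle") return; // ignore key repeat or duplicate press
    this.state = "arming";
    this.sessionId = sessionId;
    this.deps.openMic();
    this.state = "held";
    // If the OS never delivers the release, the watchdog finalises for us.
    this.cancelWatchdog = this.deps.schedule(() => this.finalise(sessionId), this.watchdogMs);
  }

  onRelease(sessionId: string): void {
    this.finalise(sessionId);
  }

  private finalise(sessionId: string): void {
    // Only the session that is currently held may finalise; late or
    // mismatched releases are dropped.
    if (this.state !== "held" || this.sessionId !== sessionId) return;
    if (this.cancelWatchdog) this.cancelWatchdog();
    this.cancelWatchdog = null;
    this.state = "committing";
    this.deps.closeMicAndSend(sessionId);
    this.sessionId = null;
    this.state = "idle";
  }
}
```

In production the session would be constructed with `schedule: realSchedule`; injecting the scheduler is a design choice for this sketch that lets a test fire the watchdog without waiting ten seconds.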

Parity Contract

  • Legacy behavior preserved: the existing iOS mobile.push_to_talk and the macOS/Windows dictation paths are untouched. The new desktop PTT path is uniform across platforms and does not collide with dictation (registration fails fast with a localized error when the bindings overlap).
  • Guard/fallback/dispatch parity checks: ChatRequestMetadata defaults preserve the pre-PTT request shape; older callers that do not send speak_reply / source / session_id see the exact same channel behaviour as before.
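
The backwards-compatibility claim for the optional metadata fields can be illustrated with a small sketch. The field names speak_reply, source, and session_id come from this PR; the field types, the `normalizeMetadata` helper, and the chosen defaults are assumptions for the example, and the actual struct is defined in Rust in web.rs.

```typescript
// Sketch under assumptions: field types and defaults are illustrative only.
interface ChatRequestMetadata {
  speak_reply?: boolean;
  source?: string;
  session_id?: string;
}

interface NormalizedMetadata {
  speakReply: boolean;
  source: string | null;
  sessionId: string | null;
}

// Older callers send no metadata at all (or an object without the new keys);
// both cases collapse to the same "pre-PTT" defaults.
function normalizeMetadata(raw?: ChatRequestMetadata): NormalizedMetadata {
  return {
    speakReply: raw !== undefined && raw.speak_reply === true,
    source: raw !== undefined && raw.source !== undefined ? raw.source : null,
    sessionId: raw !== undefined && raw.session_id !== undefined ? raw.session_id : null,
  };
}
```

With defaults like these, a request from an older client and a request that explicitly sets speak_reply to false take the same path, which is the parity property the schema tests are meant to pin.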

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: this PR
  • Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

  • New Features

    • Global push-to-talk (PTT) hotkey with recording, chimes, overlay and automatic transcription.
    • PTT settings panel (hotkey, speak-replies, show-overlay) with persisted preferences.
    • Overlay page for visual “listening/idle” feedback; optional synthesized voice replies for PTT.
  • Bug Fixes / UX

    • Hotkey registration reports conflicts (e.g., dictation/unsupported platforms) with localized errors.
  • Localization

    • Added PTT translations across 13 non-English locales.
  • Documentation

    • Added comprehensive PTT design/spec.

…3090)

Design doc for the PTT half of issue tinyhumansai#3090 — a hold-to-talk global hotkey
that lets the user speak to OpenHuman while it's in the background, with
agent replies routed through TTS. Background screen capture (the other
half of tinyhumansai#3090) is scoped to a follow-up PR.
…umansai#3090)

15-task plan covering the Rust schema delta (`channel.web_chat`),
voice/bus event, Tauri-shell hotkey + overlay window, frontend redux
slice + service state machine + UI + settings panel, 13-locale i18n,
and a WDIO E2E spec with mocked STT.
…inyhumansai#3090)

- Drop MaybePttRoot optional-key type; selectors now use { ptt: PttState }
  mirroring the mascotSlice selector convention
- Remove duplicate `export default pttSlice.reducer`; keep only the named
  pttReducer export that index.ts already imports
- Add resetUserScopedState test asserting dirty state returns to initialPttState
…es (tinyhumansai#3090)

Store the most recent Tauri hotkey registration error in pttSlice
(registrationError, transient/non-persisted) and dispatch it from
usePttHotkey on failure, clearing it on success. PttSettingsPanel
maps well-known error strings (dictation conflict, Wayland, accessibility,
shortcut-in-use) to their existing i18n keys and renders them inline below
the capture input so the user sees the real failure reason instead of a
silent "saved" state.

Surfaces the global push-to-talk feature in the user-facing capability
catalog so the /about and settings search surfaces describe it. The id,
domain, and Conversation category sit next to conversation.send_voice
and the iOS mobile.push_to_talk entry — pinned by a new
`capability_list_includes_voice_ptt` test that also asserts the how_to
mentions Push-to-Talk and the description mentions hold + hotkey, so a
future copy refactor can't silently drop the hook.

Privacy is `DERIVED_TO_BACKEND` because PTT routes audio through the
configured STT provider (matching `conversation.send_voice`'s shape).
…mansai#3090)

Final quality-sweep pass for the PTT branch:

- prettier --write across the new PTT TypeScript / overlay / settings files
- cargo fmt --all across the new Tauri shell PTT modules, the core voice/web channel touch points, and the tests files that grew assertions for the optional speak_reply/source/session_id metadata
- fix the only ESLint error on the branch: empty `catch (_) {}` in
  pttService onStart's preempt-race cancel — annotated with a comment
  explaining the orphan-session cleanup is best-effort

Behaviour unchanged; this commit is purely whitespace + the catch comment.
@CodeGhost21 CodeGhost21 requested a review from a team June 4, 2026 09:40
@coderabbitai
Contributor

coderabbitai Bot commented Jun 4, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ffdb91d4-2d77-48a5-be43-2a4b57fcf316

📥 Commits

Reviewing files that changed from the base of the PR and between 8e5ce8d and b4796d0.

📒 Files selected for processing (25)
  • app/src-tauri/src/lib.rs
  • app/src/lib/i18n/ar.ts
  • app/src/lib/i18n/bn.ts
  • app/src/lib/i18n/de.ts
  • app/src/lib/i18n/en.ts
  • app/src/lib/i18n/es.ts
  • app/src/lib/i18n/fr.ts
  • app/src/lib/i18n/hi.ts
  • app/src/lib/i18n/id.ts
  • app/src/lib/i18n/it.ts
  • app/src/lib/i18n/ko.ts
  • app/src/lib/i18n/pl.ts
  • app/src/lib/i18n/pt.ts
  • app/src/lib/i18n/ru.ts
  • app/src/lib/i18n/zh-CN.ts
  • app/src/services/__tests__/chatService.test.ts
  • app/src/services/chatService.ts
  • app/src/store/index.ts
  • src/core/event_bus/events.rs
  • src/core/event_bus/mod.rs
  • src/core/socketio.rs
  • src/openhuman/about_app/catalog_data.rs
  • src/openhuman/channels/bus.rs
  • src/openhuman/channels/providers/web.rs
  • src/openhuman/channels/providers/web_tests.rs
💤 Files with no reviewable changes (7)
  • src/core/socketio.rs
  • src/openhuman/about_app/catalog_data.rs
  • src/core/event_bus/mod.rs
  • src/openhuman/channels/providers/web_tests.rs
  • src/core/event_bus/events.rs
  • src/openhuman/channels/bus.rs
  • src/openhuman/channels/providers/web.rs
✅ Files skipped from review due to trivial changes (11)
  • app/src/lib/i18n/de.ts
  • app/src/lib/i18n/hi.ts
  • app/src/lib/i18n/fr.ts
  • app/src/lib/i18n/ar.ts
  • app/src/lib/i18n/id.ts
  • app/src/lib/i18n/ko.ts
  • app/src/lib/i18n/ru.ts
  • app/src/lib/i18n/en.ts
  • app/src/lib/i18n/bn.ts
  • app/src/lib/i18n/pl.ts
  • app/src/lib/i18n/it.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • app/src/store/index.ts
  • app/src/services/chatService.ts
  • app/src/lib/i18n/es.ts
  • app/src/lib/i18n/zh-CN.ts
  • app/src-tauri/src/lib.rs

📝 Walkthrough


Adds a complete global Push‑to‑Talk feature: Tauri hotkey expansion/registration and overlay, renderer PTT service and audio adapters, persisted PTT Redux slice and settings UI, chat metadata forwarding, core voice events and publishers, tests (unit, Vitest, E2E), i18n, and design/docs.

Changes

Global Push-to-Talk

| Layer / File(s) | Summary |
| --- | --- |
| Tauri hotkey & overlay: app/src-tauri/src/ptt_hotkeys.rs, app/src-tauri/src/ptt_overlay.rs, app/src-tauri/src/lib.rs | Adds PTT hotkey state, variant expansion/validation, dictation conflict checks, overlay window lifecycle, and Tauri commands register_ptt_hotkey/unregister_ptt_hotkey/show_ptt_overlay. |
| Renderer PTT state & settings: app/src/store/pttSlice.ts, app/src/store/index.ts, app/src/pages/settings/voice/PttSettingsPanel.tsx, app/src/hooks/usePttHotkey.ts, app/src/test/test-utils.tsx, app/src/lib/i18n/*.ts | New persisted ptt slice, settings panel with keyboard capture/localized errors, hook to register/unregister hotkeys, test wiring, and i18n entries (14 locales). |
| PTT runtime, audio & adapters: app/src/services/pttService.ts, app/src/features/voice/pttAudio.ts, app/src/features/voice/pttTranscribe.ts, app/src/features/voice/pttChimes.ts, app/src/features/voice/pttThread.ts, tests app/src/services/__tests__/* | Adds pttService state machine (start/stop/cancel/preempt/watchdog), MediaRecorder adapter, transcription RPC adapter, chime playback, thread resolution, and comprehensive unit tests. |
| App wiring & overlay page: app/src/components/PttHotkeyManager.tsx, app/src/hooks/usePttHotkey.ts, app/src/App.tsx, app/src/AppRoutes.tsx, app/src/pages/PttOverlayPage.tsx, app/src/utils/tauriCommands/ptt.ts | Mounts PttHotkeyManager, adds overlay route/page, listens to ptt://start/ptt://stop events, and provides renderer wrappers for Tauri commands. |
| Chat metadata & core voice plumbing: app/src/services/chatService.ts, src/openhuman/channels/providers/web.rs, src/core/event_bus/events.rs, src/openhuman/voice/bus.rs, src/openhuman/voice/reply_speech.rs | Extends chat send params and web channel metadata (speak_reply, source, session_id), synthesizes reply when requested, publishes PTT transcript committed events, and adds a reply‑speech test seam. |
| E2E, JSON-RPC tests, docs, assets, capability: app/test/e2e/specs/ptt-flow.spec.ts, tests/json_rpc_e2e.rs, docs/superpowers/specs/2026-06-02-global-ptt-design.md, app/src/assets/audio/README.md, src/openhuman/about_app/catalog_data.rs | Adds Playwright-style end‑to‑end PTT flow, JSON‑RPC test for speak‑reply TTS trigger, design spec, audio asset README, and capability catalog entry/test. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

agent, working

Suggested reviewers

  • oxoxDev
  • graycyrus
  • sanil-23

Poem

A rabbit hears the quiet key,
Pressed through games and reverie,
With chime and light the session starts,
It hops with code and caring hearts—
Speak softly, human, I'm on thee. 🐇

coderabbitai Bot added the feature, docs, and rust-core labels Jun 4, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (6)
src/openhuman/voice/bus.rs (1)

64-66: ⚡ Quick win

Replace fixed sleep with a bounded receive wait.

The fixed 50ms delay is timing-sensitive and can flake on slower CI. Prefer a timeout-backed receive/poll for deterministic delivery checks.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/voice/bus.rs` around lines 64 - 66, Replace the fixed
tokio::time::sleep with a timeout-backed receive: wrap the receiver.recv() call
in tokio::time::timeout(Duration::from_millis(50), receiver.recv()).await,
handle the timeout (Err) path and the recv result (Ok(Some(_)), Ok(_), or
Err(_), depending on the receiver type), and proceed only once the receive has
completed or timed out deterministically; update the code around the
broadcaster tick to use tokio::time::timeout and the appropriate
Receiver::recv() (e.g., tokio::sync::broadcast::Receiver::recv or mpsc
Receiver::recv) instead of sleeping.
app/src/pages/settings/voice/PttSettingsPanel.tsx (1)

146-146: ⚡ Quick win

Use namespaced debug logging instead of console.debug.

Please switch this to the app’s namespaced debug logger pattern for dev diagnostics.

As per coding guidelines, "In app/src, use namespaced debug logs (e.g. from the debug npm package) with dev-only detail for development diagnostics".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/pages/settings/voice/PttSettingsPanel.tsx` at line 146, Replace the
console.debug call in PttSettingsPanel that logs the captured shortcut with the
app’s namespaced debug logger: add or reuse a debug instance (e.g.
debug('app:pttSettings') or the project's existing namespace) and call that
logger with the same message/args instead of console.debug; ensure the
import/initializer for the debug logger is present in the PttSettingsPanel
module and that the log invocation uses the shortcutString variable.
app/src/components/settings/panels/VoicePanel.tsx (1)

1233-1240: 🏗️ Heavy lift

Split this panel instead of extending a 1.2k-line component.

Line 1233 adds more feature surface to an already oversized module; please extract sections (e.g., provider/routing/PTT composition) into smaller components and keep VoicePanel as orchestration only.

As per coding guidelines: “Keep TypeScript/React files at or below ~500 lines; split growing modules into smaller files.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 1233 - 1240,
VoicePanel is getting too large; extract the big sub-sections (e.g.,
provider/routing, PTT composition, and any internal stateful blocks such as the
PttSettingsPanel logic) into their own components/files so VoicePanel becomes an
orchestration-only component that just composes and passes props. Concretely:
create new components (e.g., PttSettingsPanel, PttProviderControls,
PttRoutingControls) that encapsulate local state, handlers, and UI currently
embedded in VoicePanel; move related helper hooks/utilities alongside them and
export them; update VoicePanel to import and render these new components and
forward any required props or callbacks (keep shared slice interactions with
usePttHotkey and ptt slice usage centralized). Ensure tests/imports are updated
and default exports/named exports match the new files so the behavior is
unchanged while VoicePanel stays under ~500 lines.
app/src/features/voice/pttAudio.ts (1)

54-55: ⚡ Quick win

Use namespaced debug logger instead of console.* in app runtime modules.

Please switch these diagnostics to a namespaced debug(...) logger to match the app logging standard.

As per coding guidelines, "In app/src, use namespaced debug logs (e.g. from the debug npm package) with dev-only detail for development diagnostics".

Also applies to: 70-71, 90-90, 98-99, 112-112

app/src/features/voice/pttChimes.ts (1)

40-40: ⚡ Quick win

Replace console.debug with namespaced debug logging.

Use the shared namespaced debug pattern here as well for consistency with app diagnostics policy.

As per coding guidelines, "Use namespaced debug function with dev-only detail for diagnostic logging in TypeScript/React".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/features/voice/pttChimes.ts` at line 40, Replace the plain
console.debug call in pttChimes.ts with the shared namespaced debug logger used
across the app: import or get the module's debug factory, create a logger for
this module (e.g., const dbg = debug('voice:ptt-chime') or similar existing
naming convention), and replace console.debug('[ptt-chime] play failed', { kind,
err: String(err) }) with dbg('play failed', { kind, err: String(err) }) so
diagnostics follow the app's namespaced/dev-only logging pattern.
app/src/components/PttHotkeyManager.tsx (1)

87-90: ⚡ Quick win

Swap console.* diagnostics to namespaced debug logging.

This component currently mixes runtime diagnostics through console.*; align with the app logging convention.

As per coding guidelines, "In app/src, use namespaced debug logs (e.g. from the debug npm package) with dev-only detail for development diagnostics".

Also applies to: 117-119, 131-131

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/PttHotkeyManager.tsx` around lines 87 - 90, Replace the
raw console calls in the logger object with namespaced debug loggers: import
debug from 'debug' and create namespace(s) like const log =
debug('app:PttHotkeyManager') and optionally const info =
debug('app:PttHotkeyManager:info'), const warn =
debug('app:PttHotkeyManager:warn'); then change the object entries (the
debug/info/warn arrow functions shown in the snippet) to call those debug
instances (e.g. debug(msg, meta) -> log(format message and metadata) or
info(JSON.stringify(meta) when present)) so logs follow the app's dev-only
namespaced convention; keep the same parameter shape (msg, meta) and ensure meta
is omitted or stringified when undefined.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src/components/PttHotkeyManager.tsx`:
- Around line 103-110: Listener callbacks in PttHotkeyManager.tsx call
service.onStart and service.onStop without handling rejections, which can create
unhandled promise rejections; update the 'ptt://start' and 'ptt://stop' listen
callbacks to handle promise failures from service.onStart(session_id) and
service.onStop(session_id) (e.g., append .catch(...) or use an async wrapper
with try/catch) and in the catch log the error and/or dispatch an error action
so failures are observed and do not cause unhandled rejections.
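
One way to apply this finding is a small wrapper around the listener callbacks, sketched below. The helper `guardListener` and its `report` callback are hypothetical names for this example; how PttHotkeyManager actually wires `service.onStart` and `service.onStop` is not shown in this PR page.

```typescript
// Hypothetical helper: wraps an async handler so its failures are reported
// instead of surfacing as unhandled promise rejections.
type SessionHandler = (sessionId: string) => Promise<void> | void;
type Report = (label: string, err: unknown) => void;

function guardListener(
  label: string,
  handler: SessionHandler,
  report: Report,
): (sessionId: string) => Promise<void> {
  return (sessionId) =>
    // Promise.resolve().then(...) also captures synchronous throws
    // from handlers that are not declared async.
    Promise.resolve()
      .then(() => handler(sessionId))
      .catch((err: unknown) => {
        report(label, err);
      });
}
```

A listener callback would then call something like `guardListener("ptt://start", (id) => service.onStart(id), logError)(session_id)` and could safely ignore the returned promise, because it always resolves.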

In `@app/src/lib/i18n/es.ts`:
- Around line 1558-1559: Update the i18n entry for the 'pttSettings.description'
key so the copy no longer asserts that OpenHuman will always speak replies;
instead indicate that replies are spoken only when text-to-speech is enabled
(e.g., add "si la síntesis de voz está activada" or similar). Locate the
'pttSettings.description' string in the Spanish translations and modify the
sentence to reference the optional TTS behavior so it aligns with the setting
controlled elsewhere.

In `@app/src/lib/i18n/ru.ts`:
- Around line 1536-1537: Update the translation for the key
pttSettings.description so it no longer states that OpenHuman will always speak
replies; make it neutral/conditional and reference the optional behavior
controlled by pttSettings.speakRepliesLabel (e.g., indicate replies may be
spoken if the "speak replies" setting is enabled) so the text accurately
reflects that speaking is optional.

In `@app/src/pages/settings/voice/PttSettingsPanel.tsx`:
- Around line 71-76: The serialization of key labels in PttSettingsPanel uses
e.key which is a single space for the Space key, producing visually blank
bindings; update the logic around label (the variable set from e.key) so if
label === ' ' (or other browser variants if desired) you normalize it to "Space"
before parts.push and returning parts.join('+'), ensuring the Space key is
serialized as "Space" rather than an empty-looking string.
- Around line 118-125: The keydown handler handleShortcutKeyDown currently calls
e.preventDefault()/e.stopPropagation() for every key which blocks Tab/Shift+Tab
focus navigation; update handleShortcutKeyDown in PttSettingsPanel to
early-return (do not call preventDefault/stopPropagation) when the pressed key
is Tab (e.key === 'Tab' or keyCode 9) or when Shift+Tab is detected so native
focus movement still occurs, and keep the existing
preventDefault/stopPropagation behavior for all other keys so they remain
captured as shortcut bindings.
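
Both PttSettingsPanel findings can be illustrated with one sketch of the key-serialization step. The `KeyLike` interface stands in for the browser's KeyboardEvent so the example runs outside a DOM, and `serializeShortcut`, the modifier labels, and the decision to return null for modifier-only presses are assumptions rather than the panel's actual code.

```typescript
// Minimal stand-in for the KeyboardEvent fields the serializer needs.
interface KeyLike {
  key: string;
  ctrlKey: boolean;
  shiftKey: boolean;
  altKey: boolean;
  metaKey: boolean;
}

const MODIFIER_KEYS = ["Control", "Shift", "Alt", "Meta"];

// Returns the binding string, or null when the event should NOT be captured
// (the caller must then skip preventDefault/stopPropagation).
function serializeShortcut(e: KeyLike): string | null {
  // Tab and Shift+Tab keep their native focus-navigation behaviour.
  if (e.key === "Tab") return null;
  // Assumption: a bare modifier press is not a binding yet; wait for a key.
  if (MODIFIER_KEYS.indexOf(e.key) !== -1) return null;

  const parts: string[] = [];
  if (e.ctrlKey) parts.push("Ctrl");
  if (e.shiftKey) parts.push("Shift");
  if (e.altKey) parts.push("Alt");
  if (e.metaKey) parts.push("Cmd");

  // KeyboardEvent.key is a single space for the Space key, which would
  // render as a blank-looking binding unless it is named explicitly.
  parts.push(e.key === " " ? "Space" : e.key);
  return parts.join("+");
}
```

A keydown handler built on this would call `preventDefault()` and `stopPropagation()` only when the function returns a string, so focus can still leave the capture input with the keyboard.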

In `@app/src/services/pttService.ts`:
- Around line 135-145: finaliseSession currently invokes
deps.resolveActiveThreadId, deps.createNewVoiceThread and deps.sendMessage
directly, allowing any thrown/rejected promise to escape; wrap the
thread-resolution/creation and sendMessage sequence in a try/catch inside
finaliseSession (around calls to resolveActiveThreadId, createNewVoiceThread and
sendMessage), log or record the caught error and ensure finaliseSession
completes (e.g., still performs session cleanup) rather than rethrowing so no
rejected promise leaks to caller event paths; reference the functions
resolveActiveThreadId, createNewVoiceThread and sendMessage when updating the
implementation.
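
The containment this finding asks for could look roughly like the sketch below. The function names resolveActiveThreadId, createNewVoiceThread, and sendMessage come from the review comment; their signatures, the `FinaliseDeps` interface, and the `logError` and `cleanup` hooks are assumptions for the example.

```typescript
// Sketch under assumptions: signatures are illustrative, not pttService.ts.
interface FinaliseDeps {
  resolveActiveThreadId(): Promise<string | null>;
  createNewVoiceThread(): Promise<string>;
  sendMessage(threadId: string, text: string): Promise<void>;
  logError(err: unknown): void;
  cleanup(): void;
}

// Resolves to true when the transcript was sent, false when any step failed.
// It never rejects, so event-driven callers cannot leak a rejected promise.
async function finaliseSession(deps: FinaliseDeps, transcript: string): Promise<boolean> {
  try {
    const existing = await deps.resolveActiveThreadId();
    const threadId = existing !== null ? existing : await deps.createNewVoiceThread();
    await deps.sendMessage(threadId, transcript);
    return true;
  } catch (err) {
    deps.logError(err);
    return false;
  } finally {
    // Session cleanup runs on both the success and the failure path.
    deps.cleanup();
  }
}
```

Returning a boolean instead of rethrowing is one possible design; it keeps the failure observable to the caller while guaranteeing that cleanup has already happened.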

In `@docs/superpowers/specs/2026-06-02-global-ptt-design.md`:
- Line 296: The locale count in the doc is incorrect: it says "12 non-English
locale files" but the list under the pttSettings / pttOverlay i18n change
includes 13 locales (ar, bn, de, es, fr, hi, id, it, ko, pl, pt, ru, zh-CN);
update the sentence to the correct number (13) or remove the hardcoded number
and say "the following non-English locales" so the text and the pttSettings /
pttOverlay list in app/src/lib/i18n/en.ts remain consistent.
- Around line 102-144: Add documentation for the three new Tauri IPC commands
(register_ptt_hotkey, unregister_ptt_hotkey, show_ptt_overlay) to the Tauri
shell docs: update gitbooks/developing/architecture/tauri-shell.md to list each
command signature, brief behavior summary (including error/return cases like
ConflictsWithDictation for register_ptt_hotkey), and note any lifecycle/cleanup
expectations (PttHotkeyState and ptt overlay creation/destruction); also add a
checklist/task in the spec testing section reminding implementers to update this
documentation when adding IPC commands.

In `@src/openhuman/channels/providers/web.rs`:
- Around line 929-954: The synthesize_reply(...) ReplySpeechResult is being
awaited and then discarded in the Ok branch inside the if let Ok(ref
task_result) = result block (conditioned on metadata.speak_reply); instead
capture the successful result (from
crate::openhuman::voice::reply_speech::synthesize_reply) and propagate it to the
client—either attach it to the outgoing response payload for this request or
emit the appropriate web channel event so the renderer can play TTS. Update the
branch handling the Ok(_) case to store the ReplySpeechResult and include it in
the same response/event that carries task_result.full_response (use the existing
client_id/thread_id/request_id context), and keep the log but no longer drop the
synthesized result; ensure error handling remains in the Err path.

---

Nitpick comments:
In `@app/src/components/PttHotkeyManager.tsx`:
- Around line 87-90: Replace the raw console calls in the logger object with
namespaced debug loggers: import debug from 'debug' and create namespace(s) like
const log = debug('app:PttHotkeyManager') and optionally const info =
debug('app:PttHotkeyManager:info'), const warn =
debug('app:PttHotkeyManager:warn'); then change the object entries (the
debug/info/warn arrow functions shown in the snippet) to call those debug
instances (e.g. debug(msg, meta) -> log(format message and metadata) or
info(JSON.stringify(meta) when present)) so logs follow the app's dev-only
namespaced convention; keep the same parameter shape (msg, meta) and ensure meta
is omitted or stringified when undefined.

In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 1233-1240: VoicePanel is getting too large; extract the big
sub-sections (e.g., provider/routing, PTT composition, and any internal stateful
blocks such as the PttSettingsPanel logic) into their own components/files so
VoicePanel becomes an orchestration-only component that just composes and passes
props. Concretely: create new components (e.g., PttSettingsPanel,
PttProviderControls, PttRoutingControls) that encapsulate local state, handlers,
and UI currently embedded in VoicePanel; move related helper hooks/utilities
alongside them and export them; update VoicePanel to import and render these new
components and forward any required props or callbacks (keep shared slice
interactions with usePttHotkey and ptt slice usage centralized). Ensure
tests/imports are updated and default exports/named exports match the new files
so the behavior is unchanged while VoicePanel stays under ~500 lines.

In `@app/src/features/voice/pttChimes.ts`:
- Line 40: Replace the plain console.debug call in pttChimes.ts with the shared
namespaced debug logger used across the app: import or get the module's debug
factory, create a logger for this module (e.g., const dbg =
debug('voice:ptt-chime') or similar existing naming convention), and replace
console.debug('[ptt-chime] play failed', { kind, err: String(err) }) with
dbg('play failed', { kind, err: String(err) }) so diagnostics follow the app's
namespaced/dev-only logging pattern.

In `@app/src/pages/settings/voice/PttSettingsPanel.tsx`:
- Line 146: Replace the console.debug call in PttSettingsPanel that logs the
captured shortcut with the app’s namespaced debug logger: add or reuse a debug
instance (e.g. debug('app:pttSettings') or the project's existing namespace) and
call that logger with the same message/args instead of console.debug; ensure the
import/initializer for the debug logger is present in the PttSettingsPanel
module and that the log invocation uses the shortcutString variable.

In `@src/openhuman/voice/bus.rs`:
- Around line 64-66: Replace the fixed tokio::time::sleep with a timeout-backed
receive: wrap the receiver.recv() call in
tokio::time::timeout(Duration::from_millis(50), receiver.recv()).await, handle
both the timeout path (outer Err) and the inner recv result (Ok/Err for a
broadcast receiver, Some/None for an mpsc receiver), and proceed only once the
receive has completed or the timeout has elapsed, so the wait is deterministic;
update the code around the broadcaster tick to use tokio::time::timeout with the
appropriate Receiver::recv() (e.g. tokio::sync::broadcast::Receiver::recv or the
mpsc Receiver::recv) instead of sleeping.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 084ac58f-7897-4bd8-afe0-95263f2d401d

📥 Commits

Reviewing files that changed from the base of the PR and between 87a91ae and 8e5ce8d.

⛔ Files ignored due to path filters (3)
  • app/src/assets/audio/ptt-close.wav is excluded by !**/*.wav
  • app/src/assets/audio/ptt-error.wav is excluded by !**/*.wav
  • app/src/assets/audio/ptt-open.wav is excluded by !**/*.wav
📒 Files selected for processing (65)
  • app/src-tauri/src/lib.rs
  • app/src-tauri/src/ptt_hotkeys.rs
  • app/src-tauri/src/ptt_overlay.rs
  • app/src/App.tsx
  • app/src/AppRoutes.tsx
  • app/src/__tests__/App.boot.test.tsx
  • app/src/assets/audio/README.md
  • app/src/components/PttHotkeyManager.tsx
  • app/src/components/settings/panels/VoicePanel.tsx
  • app/src/features/voice/pttAudio.ts
  • app/src/features/voice/pttChimes.ts
  • app/src/features/voice/pttThread.ts
  • app/src/features/voice/pttTranscribe.ts
  • app/src/hooks/usePttHotkey.ts
  • app/src/lib/i18n/ar.ts
  • app/src/lib/i18n/bn.ts
  • app/src/lib/i18n/de.ts
  • app/src/lib/i18n/en.ts
  • app/src/lib/i18n/es.ts
  • app/src/lib/i18n/fr.ts
  • app/src/lib/i18n/hi.ts
  • app/src/lib/i18n/id.ts
  • app/src/lib/i18n/it.ts
  • app/src/lib/i18n/ko.ts
  • app/src/lib/i18n/pl.ts
  • app/src/lib/i18n/pt.ts
  • app/src/lib/i18n/ru.ts
  • app/src/lib/i18n/zh-CN.ts
  • app/src/pages/PttOverlayPage.test.tsx
  • app/src/pages/PttOverlayPage.tsx
  • app/src/pages/settings/voice/PttSettingsPanel.tsx
  • app/src/pages/settings/voice/__tests__/PttSettingsPanel.test.tsx
  • app/src/services/__tests__/chatService.test.ts
  • app/src/services/__tests__/pttService.test.ts
  • app/src/services/chatService.ts
  • app/src/services/pttService.ts
  • app/src/store/__tests__/pttSlice.test.ts
  • app/src/store/index.ts
  • app/src/store/pttSlice.ts
  • app/src/test/test-utils.tsx
  • app/src/utils/tauriCommands/ptt.ts
  • app/test/e2e/specs/ptt-flow.spec.ts
  • docs/superpowers/plans/2026-06-02-global-ptt.md
  • docs/superpowers/specs/2026-06-02-global-ptt-design.md
  • src/core/event_bus/events.rs
  • src/core/event_bus/mod.rs
  • src/core/socketio.rs
  • src/openhuman/about_app/catalog_data.rs
  • src/openhuman/about_app/catalog_tests.rs
  • src/openhuman/channels/bus.rs
  • src/openhuman/channels/providers/web.rs
  • src/openhuman/channels/providers/web_tests.rs
  • src/openhuman/voice/bus.rs
  • src/openhuman/voice/mod.rs
  • src/openhuman/voice/reply_speech.rs
  • tests/channels_large_round25_raw_coverage_e2e.rs
  • tests/channels_provider_deep_raw_coverage_e2e.rs
  • tests/channels_provider_leftovers_raw_coverage_e2e.rs
  • tests/channels_runtime_raw_coverage_e2e.rs
  • tests/channels_web_startup_raw_coverage_e2e.rs
  • tests/channels_web_telegram_raw_coverage_e2e.rs
  • tests/channels_web_yuanbao_round22_raw_coverage_e2e.rs
  • tests/json_rpc_e2e.rs
  • tests/tools_approval_channels_raw_coverage_e2e.rs
  • tests/tools_network_channels_raw_coverage_e2e.rs
👮 Files not reviewed due to content moderation or server errors (7)
  • app/src-tauri/src/ptt_hotkeys.rs
  • app/src-tauri/src/ptt_overlay.rs
  • app/src-tauri/src/lib.rs
  • app/src/hooks/usePttHotkey.ts
  • app/src/features/voice/pttThread.ts
  • app/src/utils/tauriCommands/ptt.ts
  • app/test/e2e/specs/ptt-flow.spec.ts

Comment on lines +103 to +110
const offStart = await listen<PttEventPayload>('ptt://start', e => {
dispatch(setIsHeld(true));
void service.onStart(e.payload.session_id);
});
const offStop = await listen<PttEventPayload>('ptt://stop', e => {
dispatch(setIsHeld(false));
void service.onStop(e.payload.session_id);
});

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle rejected promises from event-edge service calls.

The listener callbacks fire onStart/onStop without a .catch, so failures can become unhandled promise rejections on runtime event edges.

Suggested fix
         const offStart = await listen<PttEventPayload>('ptt://start', e => {
           dispatch(setIsHeld(true));
-          void service.onStart(e.payload.session_id);
+          void service.onStart(e.payload.session_id).catch(err => {
+            console.warn('[ptt] onStart failed', err);
+          });
         });
         const offStop = await listen<PttEventPayload>('ptt://stop', e => {
           dispatch(setIsHeld(false));
-          void service.onStop(e.payload.session_id);
+          void service.onStop(e.payload.session_id).catch(err => {
+            console.warn('[ptt] onStop failed', err);
+          });
         });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const offStart = await listen<PttEventPayload>('ptt://start', e => {
dispatch(setIsHeld(true));
void service.onStart(e.payload.session_id);
});
const offStop = await listen<PttEventPayload>('ptt://stop', e => {
dispatch(setIsHeld(false));
void service.onStop(e.payload.session_id);
});
const offStart = await listen<PttEventPayload>('ptt://start', e => {
dispatch(setIsHeld(true));
void service.onStart(e.payload.session_id).catch(err => {
console.warn('[ptt] onStart failed', err);
});
});
const offStop = await listen<PttEventPayload>('ptt://stop', e => {
dispatch(setIsHeld(false));
void service.onStop(e.payload.session_id).catch(err => {
console.warn('[ptt] onStop failed', err);
});
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/PttHotkeyManager.tsx` around lines 103 - 110, Listener
callbacks in PttHotkeyManager.tsx call service.onStart and service.onStop
without handling rejections, which can create unhandled promise rejections;
update the 'ptt://start' and 'ptt://stop' listen callbacks to handle promise
failures from service.onStart(session_id) and service.onStop(session_id) (e.g.,
append .catch(...) or use an async wrapper with try/catch) and in the catch log
the error and/or dispatch an error action so failures are observed and do not
cause unhandled rejections.
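
The "async wrapper" option mentioned above could be factored into a small helper. This is a sketch only: `fireAndForget` is a hypothetical name, and the error callback is whatever the component chooses (a logger call, a dispatched error action, or both).

```typescript
// Hypothetical helper: start an async task without awaiting it, and route both
// promise rejections and synchronous throws to `onError` so neither escapes
// as an unhandled rejection or an uncaught exception in the event callback.
function fireAndForget(
  task: () => Promise<void>,
  onError: (err: unknown) => void,
): void {
  try {
    void task().catch(onError);
  } catch (err) {
    // `task` threw before it could return a promise.
    onError(err);
  }
}
```

Inside the listener this would read roughly as `fireAndForget(() => service.onStart(e.payload.session_id), err => log.warn('onStart failed', err))`, where `log` is assumed to be the component's logger.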

Comment thread app/src/lib/i18n/es.ts
Comment on lines +1558 to +1559
'pttSettings.description':
'Mantén pulsada una tecla para hablar con OpenHuman mientras estás en otra aplicación. Al soltarla se envía la grabación; OpenHuman dice la respuesta en voz alta.',

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align PTT description with optional TTS behavior.

The current copy says OpenHuman will speak replies after release, but Line 1564 makes that behavior optional. This can mislead users about default behavior.

✏️ Suggested copy adjustment
-  'pttSettings.description':
-    'Mantén pulsada una tecla para hablar con OpenHuman mientras estás en otra aplicación. Al soltarla se envía la grabación; OpenHuman dice la respuesta en voz alta.',
+  'pttSettings.description':
+    'Mantén pulsada una tecla para hablar con OpenHuman mientras estás en otra aplicación. Al soltarla se envía la grabación; puedes elegir si OpenHuman responde en voz alta.',
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
'pttSettings.description':
'Mantén pulsada una tecla para hablar con OpenHuman mientras estás en otra aplicación. Al soltarla se envía la grabación; OpenHuman dice la respuesta en voz alta.',
'pttSettings.description':
'Mantén pulsada una tecla para hablar con OpenHuman mientras estás en otra aplicación. Al soltarla se envía la grabación; puedes elegir si OpenHuman responde en voz alta.',
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/lib/i18n/es.ts` around lines 1558 - 1559, Update the i18n entry for
the 'pttSettings.description' key so the copy no longer asserts that OpenHuman
will always speak replies; instead indicate that replies are spoken only when
text-to-speech is enabled (e.g., add "si la síntesis de voz está activada" or
similar). Locate the 'pttSettings.description' string in the Spanish
translations and modify the sentence to reference the optional TTS behavior so
it aligns with the setting controlled elsewhere.

Comment thread app/src/lib/i18n/ru.ts
Comment on lines +1536 to +1537
'pttSettings.description':
'Удерживайте клавишу, чтобы говорить с OpenHuman, пока вы находитесь в другом приложении. При отпускании запись отправляется; OpenHuman озвучит ответ.',

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make PTT description consistent with optional TTS behavior

Line 1537 says OpenHuman will speak the reply unconditionally, but pttSettings.speakRepliesLabel makes that behavior optional. Please make the description conditional/neutral to avoid misleading users.

Suggested copy tweak
-  'pttSettings.description':
-    'Удерживайте клавишу, чтобы говорить с OpenHuman, пока вы находитесь в другом приложении. При отпускании запись отправляется; OpenHuman озвучит ответ.',
+  'pttSettings.description':
+    'Удерживайте клавишу, чтобы говорить с OpenHuman, пока вы находитесь в другом приложении. При отпускании запись отправляется; при включенной озвучке ответ будет воспроизведён.',
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
'pttSettings.description':
'Удерживайте клавишу, чтобы говорить с OpenHuman, пока вы находитесь в другом приложении. При отпускании запись отправляется; OpenHuman озвучит ответ.',
'pttSettings.description':
'Удерживайте клавишу, чтобы говорить с OpenHuman, пока вы находитесь в другом приложении. При отпускании запись отправляется; при включенной озвучке ответ будет воспроизведён.',
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/lib/i18n/ru.ts` around lines 1536 - 1537, Update the translation for
the key pttSettings.description so it no longer states that OpenHuman will
always speak replies; make it neutral/conditional and reference the optional
behavior controlled by pttSettings.speakRepliesLabel (e.g., indicate replies may
be spoken if the "speak replies" setting is enabled) so the text accurately
reflects that speaking is optional.

Comment on lines +71 to +76
let label = e.key;
if (label.length === 1 && /[a-z]/.test(label)) {
label = label.toUpperCase();
}
parts.push(label);
return parts.join('+');

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize Space key label before serialization.

When users press Space, e.key is " ", so the saved binding can become visually blank (e.g. Ctrl+ ). Normalize it to Space before joining.

Suggested fix
   let label = e.key;
+  if (label === ' ') {
+    label = 'Space';
+  }

   if (label.length === 1 && /[a-z]/.test(label)) {
     label = label.toUpperCase();
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let label = e.key;
if (label.length === 1 && /[a-z]/.test(label)) {
label = label.toUpperCase();
}
parts.push(label);
return parts.join('+');
let label = e.key;
if (label === ' ') {
label = 'Space';
}
if (label.length === 1 && /[a-z]/.test(label)) {
label = label.toUpperCase();
}
parts.push(label);
return parts.join('+');
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/pages/settings/voice/PttSettingsPanel.tsx` around lines 71 - 76, The
serialization of key labels in PttSettingsPanel uses e.key which is a single
space for the Space key, producing visually blank bindings; update the logic
around label (the variable set from e.key) so if label === ' ' (or other browser
variants if desired) you normalize it to "Space" before parts.push and returning
parts.join('+'), ensuring the Space key is serialized as "Space" rather than an
empty-looking string.
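
To make the normalisation unit-testable without a DOM, the serialisation can be written against a structural event type. The sketch below is illustrative: `KeyLike` and `serializeShortcut` are hypothetical names, and the modifier labels and their ordering are assumptions, since the hunk does not show how `parts` is built.

```typescript
// Minimal structural type so the helper can be tested without a real KeyboardEvent.
interface KeyLike {
  key: string;
  ctrlKey: boolean;
  altKey: boolean;
  shiftKey: boolean;
  metaKey: boolean;
}

// Assumed modifier labels and order; the PR's actual serialisation may differ.
function serializeShortcut(e: KeyLike): string {
  const parts: string[] = [];
  if (e.ctrlKey) parts.push('Ctrl');
  if (e.altKey) parts.push('Alt');
  if (e.shiftKey) parts.push('Shift');
  if (e.metaKey) parts.push('Meta');

  let label = e.key;
  if (label === ' ') {
    // KeyboardEvent.key for the space bar is a literal space character.
    label = 'Space';
  } else if (label.length === 1 && /[a-z]/.test(label)) {
    label = label.toUpperCase();
  }
  parts.push(label);
  return parts.join('+');
}
```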

Comment on lines +118 to +125
const handleShortcutKeyDown = useCallback(
(e: React.KeyboardEvent<HTMLInputElement>) => {
// Always preventDefault so the input doesn't try to insert text
// for the captured character — we treat it as a binding press,
// not editable content.
e.preventDefault();
e.stopPropagation();


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t swallow Tab in hotkey capture.

Line 123 prevents default on every keydown, so Tab/Shift+Tab can’t move focus and gets captured as a shortcut instead. That’s a keyboard-navigation blocker.

Suggested fix
   const handleShortcutKeyDown = useCallback(
     (e: React.KeyboardEvent<HTMLInputElement>) => {
+      if (e.key === 'Tab') {
+        // Preserve keyboard navigation.
+        return;
+      }
       // Always preventDefault so the input doesn't try to insert text
       // for the captured character — we treat it as a binding press,
       // not editable content.
       e.preventDefault();
       e.stopPropagation();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const handleShortcutKeyDown = useCallback(
(e: React.KeyboardEvent<HTMLInputElement>) => {
// Always preventDefault so the input doesn't try to insert text
// for the captured character — we treat it as a binding press,
// not editable content.
e.preventDefault();
e.stopPropagation();
const handleShortcutKeyDown = useCallback(
(e: React.KeyboardEvent<HTMLInputElement>) => {
if (e.key === 'Tab') {
// Preserve keyboard navigation.
return;
}
// Always preventDefault so the input doesn't try to insert text
// for the captured character — we treat it as a binding press,
// not editable content.
e.preventDefault();
e.stopPropagation();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/pages/settings/voice/PttSettingsPanel.tsx` around lines 118 - 125,
The keydown handler handleShortcutKeyDown currently calls
e.preventDefault()/e.stopPropagation() for every key which blocks Tab/Shift+Tab
focus navigation; update handleShortcutKeyDown in PttSettingsPanel to
early-return (do not call preventDefault/stopPropagation) when the pressed key
is Tab (e.key === 'Tab' or keyCode 9) or when Shift+Tab is detected so native
focus movement still occurs, and keep the existing
preventDefault/stopPropagation behavior for all other keys so they remain
captured as shortcut bindings.
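
The guard can also be expressed as a handler over a structural event type, which makes the Tab pass-through easy to cover in a unit test. This is a sketch with hypothetical names (`CapturableKeyEvent`, `captureShortcutKey`); the real handler is a React `useCallback` and does more than this.

```typescript
// Only the parts of the keyboard event that the guard needs.
interface CapturableKeyEvent {
  key: string;
  preventDefault(): void;
  stopPropagation(): void;
}

// Returns true when the key was captured as a binding, false when it was left
// to the browser. Tab and Shift+Tab both report key === 'Tab', so one check
// preserves focus traversal in both directions.
function captureShortcutKey(
  e: CapturableKeyEvent,
  onCapture: (key: string) => void,
): boolean {
  if (e.key === 'Tab') {
    return false;
  }
  e.preventDefault();
  e.stopPropagation();
  onCapture(e.key);
  return true;
}
```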

Comment on lines +135 to +145
let threadId = await deps.resolveActiveThreadId();
if (!threadId) {
threadId = await deps.createNewVoiceThread();
}

await deps.sendMessage({
threadId,
body: trimmed,
metadata: { source: 'ptt', session_id: sessionId },
speakReply: settings.speakReplies,
});

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Catch commit-path failures inside session finalization.

resolveActiveThreadId / createNewVoiceThread / sendMessage can throw and currently escape finaliseSession, which can leak rejected promises into caller event paths.

Suggested fix
-    let threadId = await deps.resolveActiveThreadId();
-    if (!threadId) {
-      threadId = await deps.createNewVoiceThread();
-    }
-
-    await deps.sendMessage({
-      threadId,
-      body: trimmed,
-      metadata: { source: 'ptt', session_id: sessionId },
-      speakReply: settings.speakReplies,
-    });
+    try {
+      let threadId = await deps.resolveActiveThreadId();
+      if (!threadId) {
+        threadId = await deps.createNewVoiceThread();
+      }
+
+      await deps.sendMessage({
+        threadId,
+        body: trimmed,
+        metadata: { source: 'ptt', session_id: sessionId },
+        speakReply: settings.speakReplies,
+      });
+    } catch (err) {
+      deps.logger.warn('[ptt] commit failed', { sessionId, err: String(err) });
+      await deps.playChime('error');
+      return;
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let threadId = await deps.resolveActiveThreadId();
if (!threadId) {
threadId = await deps.createNewVoiceThread();
}
await deps.sendMessage({
threadId,
body: trimmed,
metadata: { source: 'ptt', session_id: sessionId },
speakReply: settings.speakReplies,
});
try {
let threadId = await deps.resolveActiveThreadId();
if (!threadId) {
threadId = await deps.createNewVoiceThread();
}
await deps.sendMessage({
threadId,
body: trimmed,
metadata: { source: 'ptt', session_id: sessionId },
speakReply: settings.speakReplies,
});
} catch (err) {
deps.logger.warn('[ptt] commit failed', { sessionId, err: String(err) });
await deps.playChime('error');
return;
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/services/pttService.ts` around lines 135 - 145, finaliseSession
currently invokes deps.resolveActiveThreadId, deps.createNewVoiceThread and
deps.sendMessage directly, allowing any thrown/rejected promise to escape; wrap
the thread-resolution/creation and sendMessage sequence in a try/catch inside
finaliseSession (around calls to resolveActiveThreadId, createNewVoiceThread and
sendMessage), log or record the caught error and ensure finaliseSession
completes (e.g., still performs session cleanup) rather than rethrowing so no
rejected promise leaks to caller event paths; reference the functions
resolveActiveThreadId, createNewVoiceThread and sendMessage when updating the
implementation.
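
For the "still performs session cleanup" part of the prompt, a `finally` block is one option. The sketch below assumes a reduced dependency shape: `CommitDeps`, `commitTranscript`, `onError` and `cleanup` are hypothetical names, not the PR's actual `deps` interface.

```typescript
// Hypothetical, reduced dependency shape for illustration only.
interface CommitDeps {
  resolveActiveThreadId(): Promise<string | null>;
  createNewVoiceThread(): Promise<string>;
  sendMessage(msg: { threadId: string; body: string }): Promise<void>;
  onError(err: unknown): void; // e.g. log a warning and play the error chime
  cleanup(): void;             // assumed session cleanup hook
}

// Resolves to true when the transcript was sent and false when any step failed.
// It never rejects because of a commit-path failure, and cleanup always runs.
async function commitTranscript(deps: CommitDeps, body: string): Promise<boolean> {
  try {
    let threadId = await deps.resolveActiveThreadId();
    if (!threadId) {
      threadId = await deps.createNewVoiceThread();
    }
    await deps.sendMessage({ threadId, body });
    return true;
  } catch (err) {
    deps.onError(err);
    return false;
  } finally {
    deps.cleanup();
  }
}
```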

Comment on lines +102 to +144
#### `lib.rs` — two new IPC commands

```rust
#[tauri::command]
async fn register_ptt_hotkey(app: AppHandle<AppRuntime>, shortcut: String) -> Result<(), String>;

#[tauri::command]
async fn unregister_ptt_hotkey(app: AppHandle<AppRuntime>) -> Result<(), String>;
```

Behavior on `register_ptt_hotkey`:

1. Expand & validate via `expand_ptt_shortcuts`.
2. Check overlap with the currently-registered dictation shortcut(s); on overlap return `ConflictsWithDictation`.
3. Unregister any previously-registered PTT shortcut (rollback-safe — same pattern as the dictation registration).
4. Register each expanded variant with a closure that:
- On `Pressed`: CAS `is_held: false → true`; on success, increment `session_counter` and emit `ptt://start { session_id }`. On failure (CAS lost — auto-repeat or stuck state), drop.
- On `Released`: CAS `is_held: true → false`; on success, emit `ptt://stop { session_id }` with the *current* counter value. On failure, drop.
5. Persist the registered variants in `PttHotkeyState`.

`unregister_ptt_hotkey` unregisters all currently-registered variants and clears state. Also called on shutdown (`unregister_all` already covered by the plugin's drop).

#### `ptt_overlay.rs` *(new)* — dedicated overlay window

Lazy-create-on-first-register, destroyed on `unregister`. Window config:

| Field | Value |
| --- | --- |
| `label` | `"ptt-overlay"` |
| `url` | `/#/ptt-overlay` (HashRouter route, mounted only in this window) |
| `decorations` | `false` |
| `transparent` | `true` |
| `always_on_top` | `true` |
| `skip_taskbar` | `true` |
| `focus` | `false` (never accepts focus) |
| `resizable` | `false` |
| `shadow` | `false` |
| `visible_on_all_workspaces` | `true` |
| `accept_first_mouse` | `false` |
| `size` | `160 × 56` |
| `position` | bottom-right of primary display, 24px inset (hard-coded in v1) |

IPC command: `show_ptt_overlay({ active: bool, session_id: u64 })` — hides/shows the window with a 250ms fade on close. Window-local React state in `/#/ptt-overlay` toggles a pulsing red dot when `active: true`.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Document new Tauri IPC commands in the Tauri shell architecture docs.

The spec introduces three new Tauri IPC commands (register_ptt_hotkey, unregister_ptt_hotkey, show_ptt_overlay), but does not mention updating gitbooks/developing/architecture/tauri-shell.md. As per coding guidelines, registered IPC commands in the Tauri shell should be documented there.

Consider adding a note in the spec (or an explicit task in the testing/checklist section) to update the Tauri shell command documentation alongside the implementation.

As per coding guidelines: Registered IPC commands in the Tauri shell should be documented in gitbooks/developing/architecture/tauri-shell.md.

🧰 Tools
🪛 LanguageTool

[style] ~119-~119: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...id }` with the current counter value. On failure, drop. 5. Persist the registere...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/specs/2026-06-02-global-ptt-design.md` around lines 102 -
144, Add documentation for the three new Tauri IPC commands
(register_ptt_hotkey, unregister_ptt_hotkey, show_ptt_overlay) to the Tauri
shell docs: update gitbooks/developing/architecture/tauri-shell.md to list each
command signature, brief behavior summary (including error/return cases like
ConflictsWithDictation for register_ptt_hotkey), and note any lifecycle/cleanup
expectations (PttHotkeyState and ptt overlay creation/destruction); also add a
checklist/task in the spec testing section reminding implementers to update this
documentation when adding IPC commands.
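
When documenting `register_ptt_hotkey`, the press/release behaviour in step 4 of the quoted spec may be easier to convey with a small model. The sketch below is TypeScript and single-threaded, so a plain boolean stands in for the atomic compare-and-swap on `is_held` that the Rust code is described as using; `PttEdgeDetector` and its method names are hypothetical.

```typescript
type PttEvent = { type: 'ptt://start' | 'ptt://stop'; session_id: number };

// Models the edge detection from the spec: only a false -> true transition
// starts a session, and only a true -> false transition stops one.
class PttEdgeDetector {
  private isHeld = false;
  private sessionCounter = 0;
  private readonly emit: (e: PttEvent) => void;

  constructor(emit: (e: PttEvent) => void) {
    this.emit = emit;
  }

  onPressed(): void {
    if (this.isHeld) return; // auto-repeat or stuck state: drop
    this.isHeld = true;
    this.sessionCounter += 1;
    this.emit({ type: 'ptt://start', session_id: this.sessionCounter });
  }

  onReleased(): void {
    if (!this.isHeld) return; // release without a matching press: drop
    this.isHeld = false;
    // The stop event carries the current counter value, i.e. the same
    // session_id as the matching start.
    this.emit({ type: 'ptt://stop', session_id: this.sessionCounter });
  }
}
```

Feeding it press, press, release, release, press yields start(1), stop(1), start(2): the repeated press and the extra release are dropped.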


#### i18n

New keys under a `pttSettings` / `pttOverlay` namespace in `app/src/lib/i18n/en.ts`, real translations added to all 12 non-English locale files (`ar`, `bn`, `de`, `es`, `fr`, `hi`, `id`, `it`, `ko`, `pl`, `pt`, `ru`, `zh-CN`). `pnpm i18n:check` and `pnpm i18n:english:check` gate this.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Locale count mismatch.

The text states "12 non-English locale files" but the list contains 13 locales: ar, bn, de, es, fr, hi, id, it, ko, pl, pt, ru, zh-CN.

📝 Suggested fix
-New keys under a `pttSettings` / `pttOverlay` namespace in `app/src/lib/i18n/en.ts`, real translations added to all 12 non-English locale files (`ar`, `bn`, `de`, `es`, `fr`, `hi`, `id`, `it`, `ko`, `pl`, `pt`, `ru`, `zh-CN`). `pnpm i18n:check` and `pnpm i18n:english:check` gate this.
+New keys under a `pttSettings` / `pttOverlay` namespace in `app/src/lib/i18n/en.ts`, real translations added to all 13 non-English locale files (`ar`, `bn`, `de`, `es`, `fr`, `hi`, `id`, `it`, `ko`, `pl`, `pt`, `ru`, `zh-CN`). `pnpm i18n:check` and `pnpm i18n:english:check` gate this.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
New keys under a `pttSettings` / `pttOverlay` namespace in `app/src/lib/i18n/en.ts`, real translations added to all 12 non-English locale files (`ar`, `bn`, `de`, `es`, `fr`, `hi`, `id`, `it`, `ko`, `pl`, `pt`, `ru`, `zh-CN`). `pnpm i18n:check` and `pnpm i18n:english:check` gate this.
New keys under a `pttSettings` / `pttOverlay` namespace in `app/src/lib/i18n/en.ts`, real translations added to all 13 non-English locale files (`ar`, `bn`, `de`, `es`, `fr`, `hi`, `id`, `it`, `ko`, `pl`, `pt`, `ru`, `zh-CN`). `pnpm i18n:check` and `pnpm i18n:english:check` gate this.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/specs/2026-06-02-global-ptt-design.md` at line 296, The
locale count in the doc is incorrect: it says "12 non-English locale files" but
the list under the pttSettings / pttOverlay i18n change includes 13 locales (ar,
bn, de, es, fr, hi, id, it, ko, pl, pt, ru, zh-CN); update the sentence to the
correct number (13) or remove the hardcoded number and say "the following
non-English locales" so the text and the pttSettings / pttOverlay list in
app/src/lib/i18n/en.ts remain consistent.
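
A quick way to confirm the count, and to avoid hard-coding it in future, is to derive it from the list itself. This is an illustrative sketch; `NON_ENGLISH_LOCALES` and `describeLocaleCoverage` are hypothetical names, and the codes are copied from the spec sentence.

```typescript
// Locale codes exactly as listed in the spec; `en` is the source locale.
const NON_ENGLISH_LOCALES = [
  'ar', 'bn', 'de', 'es', 'fr', 'hi', 'id', 'it', 'ko', 'pl', 'pt', 'ru', 'zh-CN',
] as const;

// Builds the spec sentence fragment with the count taken from the list length.
function describeLocaleCoverage(locales: readonly string[]): string {
  return `real translations added to all ${locales.length} non-English locale files (${locales.join(', ')})`;
}
```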

Comment on lines +929 to +954
if let Ok(ref task_result) = result {
let speak_reply = matches!(metadata.speak_reply, Some(true));
let trimmed_response = task_result.full_response.trim();
if speak_reply && !trimmed_response.is_empty() {
let opts = crate::openhuman::voice::reply_speech::ReplySpeechOptions::default();
match crate::openhuman::voice::reply_speech::synthesize_reply(
&config,
&task_result.full_response,
&opts,
)
.await
{
Ok(_) => log::debug!(
"[web_channel] reply_speech dispatched chars={} client_id={} thread_id={} request_id={}",
task_result.full_response.len(),
client_id,
thread_id,
request_id,
),
Err(err) => log::warn!(
"[web_channel] reply_speech failed: {err} client_id={} thread_id={} request_id={}",
client_id,
thread_id,
request_id,
),
}

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

speak_reply synthesis result is dropped before it can be consumed.

On Line 915 the comment says this path lets the renderer play TTS, but the synthesize_reply(...) result is only logged and discarded. No event/response path carries ReplySpeechResult to the client, so playback cannot occur from this branch.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/channels/providers/web.rs` around lines 929 - 954, The
synthesize_reply(...) ReplySpeechResult is being awaited and then discarded in
the Ok branch inside the if let Ok(ref task_result) = result block (conditioned
on metadata.speak_reply); instead capture the successful result (from
crate::openhuman::voice::reply_speech::synthesize_reply) and propagate it to the
client—either attach it to the outgoing response payload for this request or
emit the appropriate web channel event so the renderer can play TTS. Update the
branch handling the Ok(_) case to store the ReplySpeechResult and include it in
the same response/event that carries task_result.full_response (use the existing
client_id/thread_id/request_id context), and keep the log but no longer drop the
synthesized result; ensure error handling remains in the Err path.

Resolves 18 conflicts from upstream commits c3b9b2d (active-run steering),
db0307d (ArtifactPending), 3a7180d (explicit turn origin + fail-closed
approval gate), ed3c6d7 (backend_meet), 71e04ea + 307c8e2 (artifact
panel + ArtifactCard).

PTT-side changes preserved: ChatRequestMetadata threading through
channel_web_chat -> start_chat -> spawn_progress_bridge, VoiceEvent +
DomainEvent::Voice, reply_speech invocation on speak_reply, test seam
short-circuit + E2E test, frontend chatService forwarding,
pttSlice + persistence.

start_chat signature: queue_mode comes BEFORE metadata in the merged
trailing args. dispatch_followups call site also updated to pass
ChatRequestMetadata::default(). channel_web_chat schema gains all four
new fields (speak_reply, source, session_id, queue_mode). WebChatParams
gains queue_mode alongside the PTT trio.

Verified: cargo check (lib + tests + tauri shell) clean, pnpm compile
clean, json_rpc_channel_web_chat_with_speak_reply_invokes_reply_speech
passes, voice::bus::tests passes, full json_rpc_e2e: 78 pass + 1
pre-existing port-conflict flake unrelated to merge.
@coderabbitai (bot) added the agent and working labels on Jun 4, 2026.

Labels

  • agent: Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/.
  • docs: Docs-only change; used by PR automation.
  • feature: Net-new user-facing capability or product behavior.
  • rust-core: Core Rust runtime in src/: CLI, core_server, shared infrastructure.
  • working: A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE REQUEST] Global push-to-talk keybind + screen share while tabbed out / in background

1 participant