Context
Companion to #357. Mobile app v0.4.0 ships with full-duplex voice support including barge-in — if the user starts speaking while the agent is mid-response, the phone stops the TTS locally and sends an interrupt signal upstream so the backend halts any in-progress generation for the current turn.
Requested change
companion-ws should accept upstream messages of the form:
{ "type": "voice:interrupt" }
…and halt whatever is currently being generated for that turn, so the mobile app doesn't keep receiving downstream speech chunks the user no longer wants to hear. The follow-up voice:transcript (the user's new utterance) will arrive shortly afterwards and should be treated as a fresh turn.
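The requested behavior can be sketched as a per-turn cancellation flag. This is a minimal sketch that assumes companion-ws is TypeScript, like the mobile hook, and that generation hands speech chunks to the socket one at a time. TurnController, TurnHandle, runTurn and the `text` field on voice:transcript are hypothetical names and do not come from the companion-ws code. voice:interrupt sets the flag on the active turn. emit refuses chunks from a cancelled turn, and the generation loop stops at the first refused chunk. An interrupt with no active turn is logged and dropped.

```typescript
// Hypothetical sketch: only the "voice:interrupt" / "voice:transcript" types
// and the downstream "speech" messages come from the issue.
type SpeechChunk = { type: "speech"; turnId: number; text: string };

interface TurnHandle {
  id: number;
  transcript: string;
  cancelled: boolean; // set once on interrupt, never cleared
}

class TurnController {
  readonly log: string[] = [];
  private current: TurnHandle | null = null;
  private nextId = 1;
  private readonly send: (msg: SpeechChunk) => void;

  constructor(send: (msg: SpeechChunk) => void) {
    this.send = send;
  }

  get activeTurn(): TurnHandle | null {
    return this.current;
  }

  // Entry point for each raw upstream frame. Returns the new turn for a
  // voice:transcript message, otherwise null.
  handleRaw(raw: string): TurnHandle | null {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      this.log.push("dropped: invalid JSON");
      return null;
    }
    if (typeof parsed !== "object" || parsed === null) {
      this.log.push("dropped: not an object");
      return null;
    }
    const msg = parsed as { type?: unknown; text?: unknown };
    if (msg.type === "voice:interrupt") {
      this.interrupt();
      return null;
    }
    if (msg.type === "voice:transcript" && typeof msg.text === "string") {
      return this.startTurn(msg.text);
    }
    this.log.push("dropped: unhandled message");
    return null;
  }

  // Cancels the in-progress turn. No-op (log + drop) when nothing is running.
  private interrupt(): void {
    const turn = this.current;
    if (turn === null) {
      this.log.push("voice:interrupt with no active turn; dropped");
      return;
    }
    turn.cancelled = true;
    this.log.push("turn " + turn.id + " cancelled");
    this.current = null;
  }

  // Each transcript gets a brand-new handle, so no state from a cancelled turn
  // carries over. Any turn still running is cancelled first.
  private startTurn(transcript: string): TurnHandle {
    if (this.current !== null) this.current.cancelled = true;
    const turn: TurnHandle = { id: this.nextId++, transcript, cancelled: false };
    this.current = turn;
    return turn;
  }

  // Forwards a speech chunk downstream unless its turn was cancelled.
  emit(turn: TurnHandle, text: string): boolean {
    if (turn.cancelled) return false;
    this.send({ type: "speech", turnId: turn.id, text });
    return true;
  }
}

// Stand-in for the generation loop: it stops at the first refused chunk
// instead of producing the rest of the response.
function runTurn(ctl: TurnController, turn: TurnHandle, chunks: readonly string[]): number {
  let delivered = 0;
  for (const chunk of chunks) {
    if (!ctl.emit(turn, chunk)) break;
    delivered++;
  }
  return delivered;
}
```

This sketch also cancels the previous turn whenever a new voice:transcript arrives. That is a design choice made here and is not specified by the issue. It covers a transcript that shows up without a preceding interrupt. A real generation loop is asynchronous. It would also abort the upstream model request, for example via an AbortController, instead of only refusing chunks.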
Acceptance criteria
- companion-ws parses incoming JSON with type === 'voice:interrupt'
- The in-progress generation for the current turn is halted (no further speech downstream messages)
- No speech messages for that turn flow downstream after the interrupt arrives
- Follow-up voice:transcript messages are processed as new turns (no leftover state from the cancelled one)
- If no generation is in progress when voice:interrupt arrives, the message is a no-op (log + drop)
Why it matters
Without server-side interrupt handling, the agent keeps generating speech chunks after the user barges in. The phone silences them locally (we already stop TTS playback), but the backend wastes tokens, the user's next utterance gets queued behind the obsolete response, and the conversation drifts out of sync.
Related
- Mobile-side spec: backlog/F007-talk-to-agent-via-voice.md in 23blocks/ai-maestro-app
- Companion: #357 (voice:transcript upstream)
- Mobile sends both messages through hooks/useCompanionWS.ts (sendVoiceInterrupt + sendVoiceTranscript)