fix(tts): instant pause via cancel+re-speak and resume-from-position via onboundary by YizukiAme · Pull Request #254 · THU-MAIC/OpenMAIC

YizukiAme · 2026-03-24T17:16:29Z

Summary

Fixes #249 — Browser TTS pause has ~300ms delay due to speechSynthesis.pause() buffering.
Fixes #250 — Lecture mode TTS resume restarts from sentence beginning instead of continuing from pause point.

Note

This PR depends on PR #253 (fix/discussion-tts-pause) which adds browser TTS pause/resume delegation in useDiscussionTTS. Merge #253 first.

Approach: `cancel()` + `onboundary` Word Position Tracking

Both issues stem from Web Speech API limitations. The unified fix:

1. Instant pause: use `speechSynthesis.cancel()` instead of `.pause()`. Cancel takes effect immediately, with no buffering delay.
2. Resume from position: track word boundaries via `utterance.onboundary` events (`charIndex + charLength`). On resume, `text.slice(lastBoundaryIndex)` re-speaks from where the user paused.
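The position tracking described above can be sketched without any browser API. This is a minimal illustration, not the code in this PR: `ResumeTracker`, `BoundaryLike`, and `pauseAndGetRemaining` are hypothetical names, and the sketch assumes that boundary events on a re-spoken utterance report `charIndex` relative to the sliced text, so an absolute base offset has to be carried across pauses.

```typescript
// Hypothetical helper (not from the PR): tracks playback progress from
// boundary-event data so a later resume can re-speak only the remainder.
interface BoundaryLike {
  charIndex: number;
  charLength?: number; // assumed optional, since some engines may not report it
}

class ResumeTracker {
  private readonly fullText: string;
  private base = 0; // offset of the current utterance within fullText
  private lastBoundary = 0; // absolute offset of the end of the last reported word

  constructor(fullText: string) {
    this.fullText = fullText;
  }

  // Intended to be called from an utterance's boundary handler.
  onBoundary(e: BoundaryLike): void {
    this.lastBoundary = this.base + e.charIndex + (e.charLength ?? 0);
  }

  // Intended to be called on pause, right before cancelling speech.
  // Returns the text a later resume should speak, and rebases so that
  // boundary indices from the re-spoken utterance stay absolute.
  pauseAndGetRemaining(): string {
    this.base = this.lastBoundary;
    return this.fullText.slice(this.lastBoundary);
  }
}
```

In a browser, `onBoundary` would be wired to `utterance.onboundary` and `pauseAndGetRemaining()` called just before `speechSynthesis.cancel()`. Because the index only advances at word boundaries, the resume point is approximate by construction.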

Changes

`lib/hooks/use-browser-tts.ts` (discussion mode, #249)

- `speakInternal()`: shared by `speak()` and `resume()`; attaches the `onboundary` handler
- `pause()`: sets `cancellingForPauseRef = true`, calls `cancel()`, and keeps the flag true for the entire pause period (Chrome fires `onEnd` asynchronously)
- `resume()`: slices the text from the last boundary and re-speaks it with the same voice via `speakInternal()`
- The `cancellingForPauseRef` reset moved to `speakInternal()` to prevent a race condition with the async `onEnd`
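The flag handling above can be illustrated with a small sketch. It is not the hook from this PR: `SynthLike` is a hypothetical stand-in for the part of `speechSynthesis` involved, `PausableSpeaker` is an invented name, and position slicing is left out so that only the pause flag is shown.

```typescript
// Hypothetical stand-in for the speech engine; onEnd may fire asynchronously
// after cancel(), which is the behaviour the PR describes for Chrome.
interface SynthLike {
  speak(text: string, onEnd: () => void): void;
  cancel(): void;
}

class PausableSpeaker {
  private readonly synth: SynthLike;
  private cancellingForPause = false;
  private pendingText = ""; // a real implementation would slice this from the last boundary
  finished = false;

  constructor(synth: SynthLike) {
    this.synth = synth;
  }

  speak(text: string): void {
    this.finished = false;
    this.speakInternal(text);
  }

  private speakInternal(text: string): void {
    // The flag is cleared here rather than in pause(), so an onEnd that
    // arrives late during the paused period still sees it set.
    this.cancellingForPause = false;
    this.pendingText = text;
    this.synth.speak(text, () => {
      if (this.cancellingForPause) return; // ended because of pause, not completion
      this.finished = true;
    });
  }

  pause(): void {
    this.cancellingForPause = true;
    this.synth.cancel();
  }

  resume(): void {
    this.speakInternal(this.pendingText);
  }
}
```

This sketch does not handle a stale `onEnd` that arrives only after `resume()` has already cleared the flag; guarding against that would need an extra check, such as comparing utterance identity.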

`lib/playback/engine.ts` (lecture mode, #250)

- `browserTTSBoundaryIndex`: tracks `charIndex + charLength` within the current chunk via `onboundary`
- `pause()`: slices the current chunk from the boundary position (not `chunk[0]`)
- `browserTTSCurrentLang`: saves the resolved language to prevent voice switching at language boundaries on resume
- Boundary index and language are reset on new playback, cleanup, and stop
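Why the resolved language is saved can be shown with a short sketch. Everything here is illustrative: `detectLang` is a hypothetical, deliberately naive resolver (the engine's actual language resolution is not shown in this PR), and `pauseChunk` and `PausedChunk` are invented names.

```typescript
// Hypothetical resolver: treats any text containing CJK ideographs as Chinese.
function detectLang(text: string): string {
  return /[\u4e00-\u9fff]/.test(text) ? "zh-CN" : "en-US";
}

interface PausedChunk {
  text: string; // remainder of the chunk to re-speak on resume
  lang: string; // language resolved for the whole chunk before the pause
}

// Captures what is needed to resume: the unspoken tail of the chunk and
// the language that was already in use when playback was paused.
function pauseChunk(chunk: string, boundaryIndex: number, resolvedLang: string): PausedChunk {
  return { text: chunk.slice(boundaryIndex), lang: resolvedLang };
}

const chunk = "你好，welcome to the lecture";
const paused = pauseChunk(chunk, 3, detectLang(chunk));
// paused.lang stays "zh-CN", while re-detecting on paused.text alone would
// give "en-US" and could switch voices mid-chunk.
```

Under these assumptions, re-resolving the language from the sliced remainder would change the voice, which is the situation the saved value avoids.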

Testing

| Scenario | Before | After |
| --- | --- | --- |
| Discussion: pause | ~300ms audio tail | Instant stop |
| Discussion: resume | From sentence start | From last word boundary |
| Lecture: pause | Instant (already used cancel) | Same |
| Lecture: resume | From chunk start | From last word boundary |
| Pause at language boundary | N/A | Same voice/language preserved |

🤖 AI-assisted and reviewed by Claude Opus 4.6

- Add currentProviderRef to track active TTS provider
- Destructure pause/resume from useBrowserTTS
- Delegate pause() to browserPause or audioRef.pause() based on provider
- Delegate resume() to browserResume, audioRef.play(), or processQueue
- Guard processQueue and onEnd against queue advancement while paused
- Reset currentProviderRef in cleanup()

Closes THU-MAIC#245

…via onboundary

useBrowserTTS (discussion mode, THU-MAIC#249):
- Replace speechSynthesis.pause() with cancel() for instant silence
- Track word positions via utterance.onboundary events (charIndex + charLength)
- Resume re-speaks text.slice(lastBoundaryIndex) with same voice
- cancellingForPauseRef stays true during pause, reset in speakInternal()

PlaybackEngine (lecture mode, THU-MAIC#250):
- Add browserTTSBoundaryIndex field tracking onboundary charIndex
- On pause, slice current chunk from boundary position (end of last word)
- Save resolved language to prevent voice switching at language boundaries
- Resume plays remaining text from pause point, not chunk start

Closes THU-MAIC#249, closes THU-MAIC#250

wyuc · 2026-03-25T04:31:01Z

Thanks for the PR! I tested this locally and the cancel+re-speak approach does eliminate the ~300ms pause delay, but I ran into a couple of issues:

1. Resume position is imprecise on rapid pause/resume: `onboundary` only fires at word boundaries, so the resume point can jump back noticeably.
2. Punctuation gets spoken aloud: after `text.slice(boundaryIndex)`, the remaining text can start with punctuation, causing TTS to read out "句号" ("period"), "逗号" ("comma"), etc.
3. Duplicated pattern: the cancel+onboundary logic is implemented independently in both `use-browser-tts.ts` and `engine.ts`.

The root cause is that Web Speech API doesn't expose precise audio position, so cancel+re-speak is inherently approximate. The original pause()/resume() has the ~300ms tail but gives exact resume position and handles rapid toggling correctly.

These are limitations of the Web Speech API itself, so no easy fix — let's leave this for now and revisit later. Appreciate the contribution!
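The punctuation issue noted above can be reproduced with plain string slicing. The sketch below also shows one possible mitigation for that case only; `stripLeadingPunctuation` is a hypothetical helper, its character list is illustrative rather than exhaustive, and it does nothing for the resume-position imprecision.

```typescript
// Hypothetical helper: drops whitespace and a small, non-exhaustive set of
// ASCII and CJK punctuation marks from the start of the text to re-speak.
function stripLeadingPunctuation(text: string): string {
  return text.replace(/^[\s,.;:!?，。；：！？、]+/, "");
}

const sentence = "今天天气很好。我们开始上课";
const boundaryIndex = 6; // end of the last word reported before the pause
const sliced = sentence.slice(boundaryIndex); // starts with "。"
const cleaned = stripLeadingPunctuation(sliced); // starts with the next word
```

Whether a given engine reads a leading punctuation mark aloud likely depends on the voice, so a filter like this is a workaround rather than a guarantee.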

YizukiAme added 2 commits March 25, 2026 11:56

YizukiAme force-pushed the fix/tts-instant-pause-resume branch from 255de0c to 6e78b47 Compare March 25, 2026 03:56

YizukiAme wants to merge 2 commits into THU-MAIC:main from YizukiAme:fix/tts-instant-pause-resume
