fix(tts): instant pause via cancel+re-speak and resume-from-position via onboundary #254

Open
YizukiAme wants to merge 2 commits into THU-MAIC:main from YizukiAme:fix/tts-instant-pause-resume

Conversation

YizukiAme commented Mar 24, 2026

Summary

Fixes #249 — Browser TTS pause has ~300ms delay due to speechSynthesis.pause() buffering.
Fixes #250 — Lecture mode TTS resume restarts from sentence beginning instead of continuing from pause point.

Note

This PR depends on PR #253 (fix/discussion-tts-pause) which adds browser TTS pause/resume delegation in useDiscussionTTS. Merge #253 first.

Approach: cancel() + onboundary Word Position Tracking

Both issues stem from Web Speech API limitations. The unified fix:

  • Instant pause: Use speechSynthesis.cancel() instead of .pause() — cancel is immediate, no buffering delay.
  • Resume from position: Track word boundaries via utterance.onboundary events (charIndex + charLength). On resume, text.slice(lastBoundaryIndex) re-speaks from where the user paused.
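
The position tracking described above can be sketched without a browser. This is a minimal illustration, not the PR's code: `BoundaryLike` and `ResumeTracker` are hypothetical names, and the only assumption about the Web Speech API is that boundary events carry `charIndex` and `charLength`, as the description states.

```typescript
// Hypothetical stand-in for the two SpeechSynthesisEvent fields used here.
interface BoundaryLike {
  charIndex: number;
  charLength?: number; // treated as 0 when the engine does not report it
}

// Remembers where the last reported word ended so that a later resume
// can re-speak only the text that follows it.
class ResumeTracker {
  private lastBoundaryIndex = 0;

  constructor(private readonly text: string) {}

  // Call from utterance.onboundary: store the end of the reported word.
  onBoundary(event: BoundaryLike): void {
    this.lastBoundaryIndex = event.charIndex + (event.charLength ?? 0);
  }

  // Text to hand to a fresh utterance after cancel().
  remaining(): string {
    return this.text.slice(this.lastBoundaryIndex);
  }
}
```

In a browser, `onBoundary` would be wired to `utterance.onboundary`, pause would call `speechSynthesis.cancel()`, and resume would speak `remaining()` in a new utterance. Because the stored index is the end of the last reported word, the resume point is only as precise as the engine's boundary events.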

Changes

lib/hooks/use-browser-tts.ts (discussion mode, #249)

  • speakInternal() — shared by speak() and resume(), attaches onboundary handler
  • pause() — sets cancellingForPauseRef = true, calls cancel(), keeps flag true during entire pause period (Chrome fires onEnd asynchronously)
  • resume() — slices text from last boundary, re-speaks with same voice via speakInternal()
  • cancellingForPauseRef reset moved to speakInternal() to prevent race condition with async onEnd
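
One way to picture the flag handling above is a small controller over an abstract speech backend. This is a sketch under assumptions: `SpeechBackend`, `UtteranceHandlers`, and `PausableSpeaker` are hypothetical names, and the real hook uses React refs and `SpeechSynthesisUtterance` rather than this interface.

```typescript
// Hypothetical abstraction over speechSynthesis so the logic runs anywhere.
interface UtteranceHandlers {
  onBoundary(charIndex: number, charLength: number): void;
  onEnd(): void;
}

interface SpeechBackend {
  speak(text: string, handlers: UtteranceHandlers): void;
  cancel(): void; // the cancelled utterance's onEnd may fire later
}

class PausableSpeaker {
  private text = "";
  private boundaryIndex = 0;
  private cancellingForPause = false;
  private finished = false;

  constructor(private readonly backend: SpeechBackend) {}

  speak(text: string): void {
    this.text = text;
    this.finished = false;
    this.speakInternal();
  }

  pause(): void {
    // The flag stays true for the whole pause, so an onEnd that arrives
    // asynchronously after cancel() is not mistaken for natural completion.
    this.cancellingForPause = true;
    this.backend.cancel();
  }

  resume(): void {
    // Keep only the text after the last word boundary, then re-speak it.
    this.text = this.text.slice(this.boundaryIndex);
    this.speakInternal();
  }

  isFinished(): boolean {
    return this.finished;
  }

  // Shared by speak() and resume(); the flag is reset only here.
  private speakInternal(): void {
    this.cancellingForPause = false;
    this.boundaryIndex = 0;
    this.backend.speak(this.text, {
      onBoundary: (charIndex, charLength) => {
        this.boundaryIndex = charIndex + charLength;
      },
      onEnd: () => {
        if (!this.cancellingForPause) this.finished = true;
      },
    });
  }
}
```

The sketch does not guard against a stale `onEnd` that arrives only after `resume()` has already reset the flag; whether that can happen depends on the browser's event timing.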

lib/playback/engine.ts (lecture mode, #250)

  • browserTTSBoundaryIndex — tracks charIndex + charLength within current chunk via onboundary
  • pause() — slices current chunk from boundary position (not chunk[0])
  • browserTTSCurrentLang — saves resolved language to prevent voice switching at language boundaries on resume
  • Reset boundary index and lang on new playback, cleanup, and stop
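
The lecture-mode bookkeeping above might look roughly like the following. The state shape and function names (`LectureTTSState`, `recordBoundary`, `pauseLecture`, `resetLecture`) are illustrative assumptions; only the ideas of a per-chunk boundary index and a saved language come from the description.

```typescript
// Hypothetical state mirroring browserTTSBoundaryIndex / browserTTSCurrentLang.
interface LectureTTSState {
  chunks: string[];      // text chunks queued for browser TTS
  chunkIndex: number;    // chunk currently being spoken
  boundaryIndex: number; // end of the last reported word within that chunk
  lang: string | null;   // language resolved when playback started
}

// Called from onboundary: remember the end of the word just reported.
function recordBoundary(state: LectureTTSState, charIndex: number, charLength: number): void {
  state.boundaryIndex = charIndex + charLength;
}

// On pause, keep only the unspoken tail of the current chunk so that
// resume continues mid-chunk. The saved language is left untouched so
// the remainder is spoken with the same voice.
function pauseLecture(state: LectureTTSState): void {
  const current = state.chunks[state.chunkIndex] ?? "";
  state.chunks[state.chunkIndex] = current.slice(state.boundaryIndex);
  state.boundaryIndex = 0;
}

// On new playback, cleanup, or stop: forget position and language.
function resetLecture(state: LectureTTSState): void {
  state.boundaryIndex = 0;
  state.lang = null;
}
```

Keeping `lang` across a pause matters when the remaining tail is short or mixed-language: re-detecting the language from the tail alone could pick a different voice than the one that started the chunk.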

Testing

| Scenario | Before | After |
| --- | --- | --- |
| Discussion: pause | ~300ms audio tail | Instant stop |
| Discussion: resume | From sentence start | From last word boundary |
| Lecture: pause | Instant (already used cancel) | Same |
| Lecture: resume | From chunk start | From last word boundary |
| Pause at language boundary | N/A | Same voice/language preserved |

🤖 AI-assisted and reviewed by Claude Opus 4.6

- Add currentProviderRef to track active TTS provider
- Destructure pause/resume from useBrowserTTS
- Delegate pause() to browserPause or audioRef.pause() based on provider
- Delegate resume() to browserResume, audioRef.play(), or processQueue
- Guard processQueue and onEnd against queue advancement while paused
- Reset currentProviderRef in cleanup()

Closes THU-MAIC#245
…via onboundary

useBrowserTTS (discussion mode, THU-MAIC#249):
- Replace speechSynthesis.pause() with cancel() for instant silence
- Track word positions via utterance.onboundary events (charIndex + charLength)
- Resume re-speaks text.slice(lastBoundaryIndex) with same voice
- cancellingForPauseRef stays true during pause, reset in speakInternal()

PlaybackEngine (lecture mode, THU-MAIC#250):
- Add browserTTSBoundaryIndex field tracking onboundary charIndex
- On pause, slice current chunk from boundary position (end of last word)
- Save resolved language to prevent voice switching at language boundaries
- Resume plays remaining text from pause point, not chunk start

Closes THU-MAIC#249, closes THU-MAIC#250
YizukiAme force-pushed the fix/tts-instant-pause-resume branch from 255de0c to 6e78b47 on March 25, 2026 03:56
wyuc commented Mar 25, 2026

Thanks for the PR! I tested this locally and the cancel+re-speak approach does eliminate the ~300ms pause delay, but I ran into a few issues:

  1. Resume position is imprecise on rapid pause/resume: onboundary only fires at word boundaries, so the resume point can jump back noticeably.
  2. Punctuation gets spoken aloud: after text.slice(boundaryIndex), the remaining text can start with punctuation, causing TTS to read out "句号" ("period"), "逗号" ("comma"), etc.
  3. Duplicated pattern: the cancel+onboundary logic is implemented independently in both use-browser-tts.ts and engine.ts.
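
The punctuation issue in item 2 is easy to reproduce with plain string slicing. The sketch below shows the slice landing on a comma and one possible mitigation; `stripLeadingPunctuation` is a hypothetical helper that is not part of the PR, and its punctuation set is an illustrative assumption, not an exhaustive list.

```typescript
// Hypothetical, non-exhaustive set of leading characters to drop before re-speaking.
const LEADING_PUNCTUATION = /^[\s,.!?;:，。！？；：、]+/;

// What the resume path described in the PR would hand to the speech engine.
function remainingAfterBoundary(text: string, boundaryIndex: number): string {
  return text.slice(boundaryIndex);
}

// Possible mitigation: trim leading punctuation and whitespace from the tail.
function stripLeadingPunctuation(text: string): string {
  return text.replace(LEADING_PUNCTUATION, "");
}

const sentence = "你好，世界。再见";
// A boundary event for "你好" (charIndex 0, charLength 2) gives index 2.
const rawTail = remainingAfterBoundary(sentence, 2); // begins with "，"
const cleanedTail = stripLeadingPunctuation(rawTail); // begins with "世界"
```

Trimming of this kind would only address the spoken punctuation; it does nothing for the imprecise resume position in item 1, which comes from the granularity of boundary events.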

The root cause is that Web Speech API doesn't expose precise audio position, so cancel+re-speak is inherently approximate. The original pause()/resume() has the ~300ms tail but gives exact resume position and handles rapid toggling correctly.

These are limitations of the Web Speech API itself, so no easy fix — let's leave this for now and revisit later. Appreciate the contribution!
