Feat. Text Streaming #50
Summary
This implementation plan for real-time partial transcription streaming is well-structured but contains 7 critical defects that would cause crashes, race conditions, and resource leaks if implemented as written.
Critical Issues Found:
- Concurrency safety violation in partial callback with `startTime` capture
- Resource leak: callback not explicitly cleaned up on termination
- Infinite recursion in default protocol extension design
- Race condition in `partialTranscriptionText` updates across actors
- Missing nil-safety for `finalResult` after loop exit
- Inconsistent cleanup: partial text cleared in multiple locations without guarantees
- Incomplete mitigation: callback cleanup left as "verify" rather than required
All identified issues are blocking defects that must be fixed before implementation. The design is sound but the pseudocode examples contain logic errors that would prevent correct functionality.
Recommendation:
Address all 7 comments before proceeding with implementation. Focus particularly on the concurrency model (issues #1, #4) and resource cleanup (issues #2, #6, #7) as these will cause the most severe runtime failures.
| await manager.setPartialCallback { partialText in | ||
| let trimmed = partialText.trimmingCharacters(in: .whitespacesAndNewlines) | ||
| guard !trimmed.isEmpty else { return } | ||
| continuation.yield(TranscriptionResult( | ||
| text: trimmed, | ||
| detectedLanguage: nil, | ||
| duration: Date().timeIntervalSince(startTime), | ||
| isPartial: true | ||
| )) | ||
| } |
🛑 Concurrency Safety Issue: The partial callback captures startTime and creates Date() on potentially different threads. The callback fires on a background thread while accessing startTime from the outer scope. This creates a data race if startTime is modified between callback registration and execution. Capture startTime explicitly in the callback closure or ensure thread-safe access.
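One way to make the capture explicit, as the comment above suggests, is to build the callback in a helper that takes `startTime` as an immutable parameter and lists it in the closure's capture list. This is a minimal sketch, not the plan's code: `PartialResult`, `makePartialHandler`, and the injectable `now` clock are hypothetical names introduced for illustration.

```swift
import Foundation

// Hypothetical stand-in for the plan's TranscriptionResult (illustration only).
struct PartialResult: Sendable, Equatable {
    let text: String
    let duration: TimeInterval
    let isPartial: Bool
}

/// Builds a partial-text handler. `startTime` is an immutable parameter that is
/// copied into the closure through the capture list, so later changes made by
/// the caller to its own variable cannot be observed from the callback thread.
func makePartialHandler(
    startTime: Date,
    now: @escaping @Sendable () -> Date = { Date() },   // injectable clock (assumption, for testing)
    yield: @escaping @Sendable (PartialResult) -> Void
) -> @Sendable (String) -> Void {
    return { [startTime] partialText in
        let trimmed = partialText.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty else { return }   // skip whitespace-only partials
        yield(PartialResult(
            text: trimmed,
            duration: now().timeIntervalSince(startTime),
            isPartial: true
        ))
    }
}
```

If `startTime` is already declared with `let` in the enclosing scope, the closure captures an immutable value either way; the capture list mainly documents that intent.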
| continuation.onTermination = { _ in task.cancel() } | ||
| return stream |
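🛑 Resource Leak Risk: The partial callback is registered but there's no explicit cleanup mechanism. If the task is cancelled or fails before the callback completes, the callback may continue firing and attempting to yield results to a finished continuation, causing crashes. Add explicit callback cleanup in the termination handler. Because `onTermination` is a synchronous closure, the cleanup has to run in its own task, for example: `continuation.onTermination = { _ in task.cancel(); Task { await manager.setPartialCallback { _ in } } }`. A no-op closure is used here because the `setPartialCallback` signature quoted in the plan takes a non-optional callback.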
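The cleanup described in this comment could look roughly like the sketch below. `ManagerStub` is a hypothetical actor that models only the callback slot of `StreamingEouAsrManager`; its `process(_:)` and `installCount` members are assumptions made so the example is self-contained, and `makePartialStream` is an illustrative name. It requires Swift 5.9 for `makeStream()`.

```swift
// Hypothetical stand-in for StreamingEouAsrManager; only the callback slot is modeled.
actor ManagerStub {
    typealias PartialCallback = @Sendable (String) -> Void
    private var callback: PartialCallback = { _ in }
    private(set) var installCount = 0   // how many times a callback was installed
    private var samples = 0

    func setPartialCallback(_ callback: @escaping PartialCallback) {
        self.callback = callback
        installCount += 1
    }

    // Pretend to transcribe: report the running sample count as "partial text".
    func process(_ chunk: [Float]) {
        samples += chunk.count
        callback("samples: \(samples)")
    }
}

func makePartialStream(
    from audio: AsyncStream<[Float]>,
    manager: ManagerStub
) async -> AsyncThrowingStream<String, Error> {
    let (stream, continuation) = AsyncThrowingStream<String, Error>.makeStream()

    await manager.setPartialCallback { text in
        _ = continuation.yield(text)
    }

    let task = Task {
        for await chunk in audio {
            await manager.process(chunk)
        }
        continuation.finish()
    }

    // onTermination is synchronous, so the actor call is made from a new Task.
    // The callback is replaced with a no-op so it no longer references the continuation.
    continuation.onTermination = { _ in
        task.cancel()
        Task { await manager.setPartialCallback { _ in } }
    }
    return stream
}
```

The replacement callback is installed asynchronously, so a chunk that is already being processed may still invoke the old closure once after termination.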
| extension TranscriptionEngine { | ||
| func transcribeStream( | ||
| _ audioStream: AsyncStream<[Float]>, | ||
| language: TranscriptionLanguage | ||
| ) async -> AsyncThrowingStream<TranscriptionResult, Error> { | ||
| await transcribeStream(audioStream, language: language, emitPartialResults: false) | ||
| } | ||
| } |
🛑 Logic Error: The default extension creates infinite recursion. When calling transcribeStream(audioStream, language: language, emitPartialResults: false), if no conforming type implements the 3-parameter version, it will call itself indefinitely. The protocol should require the 3-parameter version, or the extension must call a differently-named internal method to avoid recursion.
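The first remedy named in this comment (the protocol requires the three-parameter method and the extension only forwards to it) can be illustrated with a reduced, synchronous sketch. `Engine`, `WordEngine`, and `transcribe` are hypothetical simplifications of the plan's `TranscriptionEngine` and `transcribeStream`, chosen so the dispatch behaviour is easy to check.

```swift
// Reduced model of the plan's protocol: the variant with `emitPartialResults`
// is the protocol requirement, so every conforming type must implement it.
protocol Engine {
    func transcribe(_ input: String, emitPartialResults: Bool) -> [String]
}

extension Engine {
    // Convenience overload for existing call sites. The extra argument label
    // makes this call resolve to the requirement above, not back to this method.
    func transcribe(_ input: String) -> [String] {
        transcribe(input, emitPartialResults: false)
    }
}

// Hypothetical conforming type used only for illustration.
struct WordEngine: Engine {
    func transcribe(_ input: String, emitPartialResults: Bool) -> [String] {
        let words = input.split(separator: " ")
        guard emitPartialResults else {
            return [words.joined(separator: " ")]   // final text only
        }
        // Cumulative prefixes; the last element is the full text.
        return words.indices.map { words[...$0].joined(separator: " ") }
    }
}
```

With this shape, a type that forgets to implement the three-parameter method fails to compile with a conformance error, so the forwarding call always has a concrete target.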
| if result.isPartial { | ||
| // Update overlay text — this is the "ghost text" | ||
| self.partialTranscriptionText = result.text | ||
| continue |
🛑 Race Condition: partialTranscriptionText is updated without synchronization while the same property may be cleared by resetToIdle(), endRecording(), or cancelEouMonitoring() on different execution contexts. Since StateManager is @MainActor, verify all property mutations occur on MainActor. The continue statement after setting partial text skips the MainActor check that might exist in subsequent code, potentially causing cross-actor access violations.
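One way to make the isolation this comment asks about verifiable by the compiler is to keep the property and every method that mutates it inside a single `@MainActor` type. The sketch below assumes hypothetical names (`OverlayResult`, `OverlayState`, `consume`) and does not reproduce the real `StateManager`.

```swift
// Hypothetical reduced result type (illustration only).
struct OverlayResult: Sendable {
    let text: String
    let isPartial: Bool
}

@MainActor
final class OverlayState {
    // Only code isolated to the main actor can mutate this property.
    private(set) var partialTranscriptionText: String?

    /// Consumes results on the main actor. Partial results update the ghost
    /// text; the first non-partial result clears it and is returned.
    func consume(_ results: AsyncStream<OverlayResult>) async -> String? {
        for await result in results {
            if result.isPartial {
                partialTranscriptionText = result.text
                continue
            }
            partialTranscriptionText = nil
            return result.text
        }
        partialTranscriptionText = nil   // stream ended without a final result
        return nil
    }

    /// Also main-actor isolated, so it cannot run concurrently with `consume`.
    func resetToIdle() {
        partialTranscriptionText = nil
    }
}
```

Because both methods are isolated to the same actor, their mutations are serialized; they can still interleave at the `for await` suspension points, so a reset may be followed by a later partial update unless the monitoring task is cancelled first.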
| self.partialTranscriptionText = nil | ||
| // ... rest of EOU handling unchanged ... |
Missing nil-safety check after break. If the loop exits due to EOU detection at line 233, finalResult is set. However, if the loop exits for other reasons (stream ends without EOU), finalResult remains nil and the subsequent "rest of EOU handling" at line 238 may attempt to unwrap or use a nil value, causing crashes. Add explicit handling for the nil case.
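The nil case can be made explicit by turning the loop's exit state into a value before any EOU handling runs. This is a sketch under assumptions: `EouResult`, `MonitoringOutcome`, and `monitor` are hypothetical names, and a plain `Sequence` stands in for the async stream.

```swift
// Hypothetical reduced result type (illustration only).
struct EouResult: Equatable {
    let text: String
    let isPartial: Bool
    let isEndOfUtterance: Bool
}

enum MonitoringOutcome: Equatable {
    case endOfUtterance(EouResult)
    case streamEndedWithoutEou
}

/// Scans results until an end-of-utterance result appears. If the sequence
/// ends first, `finalResult` is still nil and the caller gets an explicit case
/// instead of an optional it might force-unwrap.
func monitor<S: Sequence>(_ results: S) -> MonitoringOutcome where S.Element == EouResult {
    var finalResult: EouResult?
    for result in results {
        if result.isPartial { continue }
        if result.isEndOfUtterance {
            finalResult = result
            break
        }
    }
    guard let finalResult else { return .streamEndedWithoutEou }
    return .endOfUtterance(finalResult)
}
```

The caller can then `switch` over the outcome and run the EOU handling only in the `.endOfUtterance` case.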
| self.partialTranscriptionText = nil | ||
| // ... rest of EOU handling unchanged ... | ||
| } catch { | ||
| self.partialTranscriptionText = nil | ||
| guard !Task.isCancelled else { return } | ||
| Log.stateManager.warning("EOU monitoring failed: \(error.localizedDescription)") | ||
| } |
Inconsistent cleanup order may cause issues. Line 237 clears partialTranscriptionText before the EOU handling at line 238, but line 240 clears it again in the catch block. If the EOU handling throws, the first clear at line 237 never executes, leaving stale partial text. The catch block at line 240 should be the single source of cleanup, or use defer to guarantee cleanup regardless of the exit path.
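The `defer` option mentioned in this comment might be sketched as follows. `PartialTextHolder`, `MonitorError`, and `monitor` are hypothetical, and the `handleEou` closure stands in for the elided "rest of EOU handling" in the plan.

```swift
struct MonitorError: Error {}

// Hypothetical holder for the overlay text (illustration only).
final class PartialTextHolder {
    var partialTranscriptionText: String?
}

/// Runs a monitoring pass. The `defer` block is the single cleanup point: it
/// runs on normal completion, on early return, and after the catch block.
func monitor(
    _ holder: PartialTextHolder,
    partials: [String],
    handleEou: () throws -> Void
) {
    defer { holder.partialTranscriptionText = nil }
    do {
        for text in partials {
            holder.partialTranscriptionText = text
        }
        try handleEou()
    } catch {
        print("EOU monitoring failed: \(error)")
    }
}
```

With `defer`, the partial text stays visible while `handleEou` runs. If the overlay should be cleared before EOU handling starts, as in the plan's current ordering, the explicit clear can stay and the `defer` acts as a backstop.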
| | Partial text updates too frequently, causing UI jank | `setPartialCallback` fires per-chunk (160ms). At ~6 updates/sec this is fine for SwiftUI text updates. If needed, throttle with a `Date` comparison. | | ||
| | Overlay height changes cause visual jitter when partial text appears/disappears | Use `.animation(.easeInOut(duration: 0.2))` on the text transition. Consider a fixed reserved height for the text area when in partial-results mode. | | ||
| | User enables setting but switches to a non-EOU model | Toggle shows disabled state with explanatory caption. Setting value persists but has no effect — the `supportsPartialResults()` check in StateManager gates the behavior. | | ||
| | `setPartialCallback` not cleared between sessions | Call `manager.reset()` (already done) which should clear internal state. Verify in FluidAudio source that reset clears the callback, or explicitly set a nil callback on cleanup. | |
The mitigation doesn't fully address the risk. Line 393 states "Verify in FluidAudio source that reset clears the callback, or explicitly set a nil callback on cleanup," but the implementation at line 163 only calls task.cancel() without clearing the callback. If manager.reset() doesn't clear the callback internally, the callback will continue firing across sessions, yielding to closed continuations and causing crashes. The plan must mandate explicit callback cleanup, not leave it as optional verification.
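The first row of the risk table above mentions throttling partial updates with a `Date` comparison if the per-chunk rate proves too high. A minimal sketch of that idea is shown here; `PartialThrottle` and its `minimumInterval` parameter are hypothetical names, and the 0.25 s value in the usage note is only an example.

```swift
import Foundation

/// Drops updates that arrive sooner than `minimumInterval` after the last
/// update that was let through.
struct PartialThrottle {
    let minimumInterval: TimeInterval
    private var lastEmission: Date?

    init(minimumInterval: TimeInterval) {
        self.minimumInterval = minimumInterval
    }

    /// Returns true if an update at `now` should be forwarded to the UI.
    mutating func shouldEmit(at now: Date = Date()) -> Bool {
        if let last = lastEmission, now.timeIntervalSince(last) < minimumInterval {
            return false
        }
        lastEmission = now
        return true
    }
}
```

With 160 ms chunks and a 0.25 s interval, roughly every second partial would reach the overlay. The final result should bypass the throttle so the last text shown is never stale.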
Pull request overview
Adds an implementation plan for supporting real-time “ghost text” partial transcription during recording (intended for Parakeet EOU models), documenting proposed model capability flags, settings, engine protocol changes, and UI wiring.
Changes:
- Introduces a detailed design/implementation plan for partial transcription streaming and overlay display
- Outlines proposed protocol/model changes (`supportsPartialResults`, `isPartial`, `emitPartialResults`) and UI/settings integration
- Lists phased execution steps, risks, and test ideas for the planned feature
| ## Goal | ||
|
|
||
| Add real-time "ghost text" during recording: as the user speaks, partial transcription text appears in the recording overlay, giving immediate visual feedback before the final transcription is produced. This is opt-in (off by default), only available for models that support it, and builds on top of the hands-free / EOU infrastructure from PR #22. | ||
| ## Background — What the SDK Supports | ||
|
|
||
| ### Parakeet EOU 120M (`StreamingEouAsrManager`) | ||
| - **`setPartialCallback(_ callback: @escaping PartialCallback)`** — fires after every audio chunk with the accumulated partial transcript as a `String`. This is the lowest-effort integration: one line to register, callback fires automatically during `process(audioBuffer:)`. | ||
| - Already used for streaming + EOU detection. Adding partial results is additive. |
| // TranscriptionResult.swift — add isPartial flag | ||
| struct TranscriptionResult: Sendable, Equatable { | ||
| let text: String | ||
| let detectedLanguage: String? | ||
| let duration: TimeInterval | ||
| let isEndOfUtterance: Bool // from PR #22 | ||
| let isPartial: Bool // NEW | ||
|
|
||
| init(text: String, detectedLanguage: String? = nil, duration: TimeInterval, | ||
| isEndOfUtterance: Bool = false, isPartial: Bool = false) { ... } | ||
| } |
| The `emitPartialResults` parameter is threaded through `transcribeStream()`. The protocol method signature stays the same — the engine reads the setting internally or we add it as a parameter. | ||
|
|
||
| **Decision:** Add an optional `emitPartialResults: Bool = false` parameter to `transcribeStream` on the protocol, defaulting to `false` so all existing call sites are unaffected. | ||
|
|
||
| ```swift | ||
| // TranscriptionEngine.swift — updated signature | ||
| func transcribeStream( | ||
| _ audioStream: AsyncStream<[Float]>, | ||
| language: TranscriptionLanguage, | ||
| emitPartialResults: Bool | ||
| ) async -> AsyncThrowingStream<TranscriptionResult, Error> | ||
| ``` | ||
|
|
||
| With a default extension: | ||
| ```swift | ||
| extension TranscriptionEngine { | ||
| func transcribeStream( | ||
| _ audioStream: AsyncStream<[Float]>, | ||
| language: TranscriptionLanguage | ||
| ) async -> AsyncThrowingStream<TranscriptionResult, Error> { | ||
| await transcribeStream(audioStream, language: language, emitPartialResults: false) | ||
| } | ||
| } |
Add support for real-time text insertion streaming for models that support it.
Recreated from #27 after accidental branch deletion.