Skip to content

Feat. Text Streaming#50

Open
sebsto wants to merge 1 commit into
mainfrom
sebsto/text_streaming
Open

Feat. Text Streaming#50
sebsto wants to merge 1 commit into
mainfrom
sebsto/text_streaming

Conversation

@sebsto
Copy link
Copy Markdown
Owner

@sebsto sebsto commented Apr 10, 2026

Add support for real time text insertion streaming for models that support it.

Recreated from #27 after accidental branch deletion.

Copilot AI review requested due to automatic review settings April 10, 2026 08:11
Copy link
Copy Markdown
Contributor

@amazon-q-developer amazon-q-developer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This implementation plan for real-time partial transcription streaming is well-structured but contains 7 critical defects that would cause crashes, race conditions, and resource leaks if implemented as written.

Critical Issues Found:

  1. Concurrency safety violation in partial callback with startTime capture
  2. Resource leak - callback not explicitly cleaned up on termination
  3. Infinite recursion in default protocol extension design
  4. Race condition in partialTranscriptionText updates across actors
  5. Missing nil-safety for finalResult after loop exit
  6. Inconsistent cleanup - partial text cleared in multiple locations without guarantees
  7. Incomplete mitigation - callback cleanup left as "verify" rather than required

All identified issues are blocking defects that must be fixed before implementation. The design is sound but the pseudocode examples contain logic errors that would prevent correct functionality.

Recommendation:

Address all 7 comments before proceeding with implementation. Focus particularly on the concurrency model (issues #1, #4) and resource cleanup (issues #2, #6, #7) as these will cause the most severe runtime failures.


You can now have the agent implement changes and create commits directly on your pull request's source branch. Simply comment with /q followed by your request in natural language to ask the agent to make changes.

Comment on lines +124 to +133
await manager.setPartialCallback { partialText in
let trimmed = partialText.trimmingCharacters(in: .whitespacesAndNewlines)
guard !trimmed.isEmpty else { return }
continuation.yield(TranscriptionResult(
text: trimmed,
detectedLanguage: nil,
duration: Date().timeIntervalSince(startTime),
isPartial: true
))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛑 Concurrency Safety Issue: The partial callback captures startTime and creates Date() on potentially different threads. The callback fires on a background thread while accessing startTime from the outer scope. This creates a data race if startTime is modified between callback registration and execution. Capture startTime explicitly in the callback closure or ensure thread-safe access.

Comment on lines +163 to +164
continuation.onTermination = { _ in task.cancel() }
return stream
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛑 Resource Leak Risk: The partial callback is registered but there's no explicit cleanup mechanism. If the task is cancelled or fails before the callback completes, the callback may continue firing and attempting to yield results to a finished continuation, causing crashes. Add explicit callback cleanup in the termination handler: continuation.onTermination = { _ in task.cancel(); await? manager.setPartialCallback(nil) }

Comment on lines +183 to +190
extension TranscriptionEngine {
func transcribeStream(
_ audioStream: AsyncStream<[Float]>,
language: TranscriptionLanguage
) async -> AsyncThrowingStream<TranscriptionResult, Error> {
await transcribeStream(audioStream, language: language, emitPartialResults: false)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛑 Logic Error: The default extension creates infinite recursion. When calling transcribeStream(audioStream, language: language, emitPartialResults: false), if no conforming type implements the 3-parameter version, it will call itself indefinitely. The protocol should require the 3-parameter version, or the extension must call a differently-named internal method to avoid recursion.

Comment on lines +226 to +229
if result.isPartial {
// Update overlay text — this is the "ghost text"
self.partialTranscriptionText = result.text
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛑 Race Condition: partialTranscriptionText is updated without synchronization while the same property may be cleared by resetToIdle(), endRecording(), or cancelEouMonitoring() on different execution contexts. Since StateManager is @MainActor, verify all property mutations occur on MainActor. The continue statement after setting partial text skips the MainActor check that might exist in subsequent code, potentially causing cross-actor access violations.

Comment on lines +237 to +238
self.partialTranscriptionText = nil
// ... rest of EOU handling unchanged ...
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing nil-safety check after break. If the loop exits due to EOU detection at line 233, finalResult is set. However, if the loop exits for other reasons (stream ends without EOU), finalResult remains nil and the subsequent "rest of EOU handling" at line 238 may attempt to unwrap or use a nil value, causing crashes. Add explicit handling for the nil case.

Comment on lines +237 to +243
self.partialTranscriptionText = nil
// ... rest of EOU handling unchanged ...
} catch {
self.partialTranscriptionText = nil
guard !Task.isCancelled else { return }
Log.stateManager.warning("EOU monitoring failed: \(error.localizedDescription)")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent cleanup order may cause issues. Line 237 clears partialTranscriptionText before the EOU handling at line 238, but line 240 clears it again in the catch block. If the EOU handling throws, the first clear at line 237 never executes, leaving stale partial text. The catch block at line 240 should be the single source of cleanup, or use defer to guarantee cleanup regardless of the exit path.

| Partial text updates too frequently, causing UI jank | `setPartialCallback` fires per-chunk (160ms). At ~6 updates/sec this is fine for SwiftUI text updates. If needed, throttle with a `Date` comparison. |
| Overlay height changes cause visual jitter when partial text appears/disappears | Use `.animation(.easeInOut(duration: 0.2))` on the text transition. Consider a fixed reserved height for the text area when in partial-results mode. |
| User enables setting but switches to a non-EOU model | Toggle shows disabled state with explanatory caption. Setting value persists but has no effect — the `supportsPartialResults()` check in StateManager gates the behavior. |
| `setPartialCallback` not cleared between sessions | Call `manager.reset()` (already done) which should clear internal state. Verify in FluidAudio source that reset clears the callback, or explicitly set a nil callback on cleanup. |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mitigation doesn't fully address the risk. Line 393 states "Verify in FluidAudio source that reset clears the callback, or explicitly set a nil callback on cleanup," but the implementation at line 163 only calls task.cancel() without clearing the callback. If manager.reset() doesn't clear the callback internally, the callback will continue firing across sessions, yielding to closed continuations and causing crashes. The plan must mandate explicit callback cleanup, not leave it as optional verification.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an implementation plan for supporting real-time “ghost text” partial transcription during recording (intended for Parakeet EOU models), documenting proposed model capability flags, settings, engine protocol changes, and UI wiring.

Changes:

  • Introduces a detailed design/implementation plan for partial transcription streaming and overlay display
  • Outlines proposed protocol/model changes (supportsPartialResults, isPartial, emitPartialResults) and UI/settings integration
  • Lists phased execution steps, risks, and test ideas for the planned feature

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +3 to +6
## Goal

Add real-time "ghost text" during recording: as the user speaks, partial transcription text appears in the recording overlay, giving immediate visual feedback before the final transcription is produced. This is opt-in (off by default), only available for models that support it, and builds on top of the hands-free / EOU infrastructure from PR #22.

Comment on lines +5 to +11
Add real-time "ghost text" during recording: as the user speaks, partial transcription text appears in the recording overlay, giving immediate visual feedback before the final transcription is produced. This is opt-in (off by default), only available for models that support it, and builds on top of the hands-free / EOU infrastructure from PR #22.

## Background — What the SDK Supports

### Parakeet EOU 120M (`StreamingEouAsrManager`)
- **`setPartialCallback(_ callback: @escaping PartialCallback)`** — fires after every audio chunk with the accumulated partial transcript as a `String`. This is the lowest-effort integration: one line to register, callback fires automatically during `process(audioBuffer:)`.
- Already used for streaming + EOU detection. Adding partial results is additive.
Comment on lines +89 to +99
// TranscriptionResult.swift — add isPartial flag
struct TranscriptionResult: Sendable, Equatable {
let text: String
let detectedLanguage: String?
let duration: TimeInterval
let isEndOfUtterance: Bool // from PR #22
let isPartial: Bool // NEW

init(text: String, detectedLanguage: String? = nil, duration: TimeInterval,
isEndOfUtterance: Bool = false, isPartial: Bool = false) { ... }
}
Comment on lines +168 to +190
The `emitPartialResults` parameter is threaded through `transcribeStream()`. The protocol method signature stays the same — the engine reads the setting internally or we add it as a parameter.

**Decision:** Add an optional `emitPartialResults: Bool = false` parameter to `transcribeStream` on the protocol, defaulting to `false` so all existing call sites are unaffected.

```swift
// TranscriptionEngine.swift — updated signature
func transcribeStream(
_ audioStream: AsyncStream<[Float]>,
language: TranscriptionLanguage,
emitPartialResults: Bool
) async -> AsyncThrowingStream<TranscriptionResult, Error>
```

With a default extension:
```swift
extension TranscriptionEngine {
func transcribeStream(
_ audioStream: AsyncStream<[Float]>,
language: TranscriptionLanguage
) async -> AsyncThrowingStream<TranscriptionResult, Error> {
await transcribeStream(audioStream, language: language, emitPartialResults: false)
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants