
feat: Parakeet transcription made instant with live streaming #24

Open
artk42 wants to merge 6 commits into mazdak:master from artk42:parakeet-speedup


artk42 commented Feb 24, 2026

What changed

  • Added true Parakeet live streaming pipeline during recording.
  • Finalization now happens on stop without full reprocessing from scratch.
  • Improved the partial transcript UI: partials now appear in the recording view, and partial updates that arrive after recording stops are ignored.

Result

The Parakeet experience now feels instant (after the first warm-up run).
For short and medium dictation, responsiveness is close to real time.

Product direction

With near-instant Parakeet results, there is much less need to offer several speech-to-text (STT) models side by side.
It may be better to simplify the model matrix and focus on instant experiences that are more useful.
I'm already exploring next improvements in that direction.


Note

Medium Risk
Touches the core recording/transcription path by adding a new live audio capture pipeline, new ML daemon RPC methods, and new cleanup/polling logic; regressions could impact recording stability, CPU usage, and transcription correctness.

Overview
Parakeet now supports true live transcription while recording. AudioRecorder taps the mic via AVAudioEngine, converts to 16kHz float32 PCM, streams chunks to a new ParakeetLiveTranscriber actor, and ParakeetService preferentially finalizes the active stream on stop (skipping the prior “reprocess after recording” flow).
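The overview says AudioRecorder converts the tapped microphone audio to 16kHz float32 PCM but not how. As a minimal sketch, assuming the tap delivers interleaved Float32 samples at the hardware rate, the conversion could average channels down to mono and resample with linear interpolation. The function name toMono16k is hypothetical, and this is not the PR's implementation: naive linear interpolation can alias when downsampling, so a dedicated resampler such as AVAudioConverter is the more robust choice in production.

```swift
/// Hypothetical helper (not from the PR): downmix interleaved Float32 samples
/// to mono and resample to 16 kHz using linear interpolation.
/// Assumes `interleaved` holds whole frames of `channels` samples each.
func toMono16k(interleaved: [Float], channels: Int, sourceRate: Double) -> [Float] {
    precondition(channels > 0 && sourceRate > 0)
    let frameCount = interleaved.count / channels
    guard frameCount > 0 else { return [] }

    // Average the channels of each frame into a single mono sample.
    var mono = [Float](repeating: 0, count: frameCount)
    for frame in 0..<frameCount {
        var sum: Float = 0
        for ch in 0..<channels {
            sum += interleaved[frame * channels + ch]
        }
        mono[frame] = sum / Float(channels)
    }

    let targetRate = 16_000.0
    if sourceRate == targetRate { return mono }

    // Step through the source at `ratio` samples per output sample and
    // blend the two neighbouring source samples (no anti-alias filtering).
    let ratio = sourceRate / targetRate
    let outCount = Int((Double(frameCount) / ratio).rounded(.down))
    var out = [Float](repeating: 0, count: outCount)
    for i in 0..<outCount {
        let pos = Double(i) * ratio
        let idx = Int(pos)
        let frac = Float(pos - Double(idx))
        let next = min(idx + 1, frameCount - 1)
        out[i] = mono[idx] * (1 - frac) + mono[next] * frac
    }
    return out
}
```

For a 48 kHz stereo tap, 4,800 frames (0.1 s) come out as 1,600 mono samples at 16 kHz, which is the kind of fixed-rate chunk that can then be streamed onward.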

ML daemon + Python layer gains streaming + better long-audio handling. MLDaemonManager adds parakeet_stream_* JSON-RPC methods, ml/rpc.py routes them, and ml/parakeet.py implements stream sessions plus chunked transcription/overlap merging for long inputs.
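The note says ml/parakeet.py transcribes long inputs in chunks and merges the overlapping parts, without showing the algorithm. The Swift sketch below (Swift only to match the other examples here; the real code is Python) illustrates one common approach under stated assumptions: fixed-size sample windows that overlap by a fixed amount, and a word-level merge that drops the longest prefix of the next chunk's text that repeats the end of the previous chunk's text. Both function names and the merge rule are hypothetical.

```swift
/// Hypothetical: split `sampleCount` samples into windows of `window` samples,
/// each starting `window - overlap` samples after the previous one.
func overlappingWindows(sampleCount: Int, window: Int, overlap: Int) -> [Range<Int>] {
    precondition(window > 0 && overlap >= 0 && overlap < window)
    var ranges: [Range<Int>] = []
    var start = 0
    let step = window - overlap
    while start < sampleCount {
        let end = min(start + window, sampleCount)
        ranges.append(start..<end)
        if end == sampleCount { break }  // last window reached the end
        start += step
    }
    return ranges
}

/// Hypothetical merge: remove the longest run of words (up to `maxOverlapWords`)
/// at the start of `next` that exactly repeats the end of `previous`.
func mergeOverlapping(_ previous: String, _ next: String, maxOverlapWords: Int = 8) -> String {
    let a = previous.split(separator: " ")
    let b = next.split(separator: " ")
    var k = min(maxOverlapWords, a.count, b.count)
    while k > 0 {
        if Array(a.suffix(k)) == Array(b.prefix(k)) { break }
        k -= 1
    }
    return (a + b.dropFirst(k)).joined(separator: " ")
}
```

Exact word matching is deliberately simple; a decoder that exposes word timestamps could instead cut each chunk at the midpoint of the overlap region.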

UI/UX + reliability tweaks. AppStatus.recording now carries optional partial text; recording views show partials; partial progress notifications are ignored when not recording; press-and-hold hotkey monitoring adds a local flagsChanged monitor to better detect modifier releases; recording window size increases.
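AppStatus.recording now carries optional partial text, and partial progress notifications are ignored when not recording. A minimal sketch of that rule follows. Only the recording case comes from the PR; the idle and transcribing cases, the applyPartial name, and the choice to keep the previous partial when an empty one arrives are assumptions.

```swift
import Foundation

/// Hypothetical shape: only `recording(partialText:)` is described in the PR.
enum AppStatus: Equatable {
    case idle
    case recording(partialText: String?)
    case transcribing
}

/// Apply a partial-transcript notification to the current status.
/// Partials arriving outside recording are dropped; an empty partial keeps
/// whatever text is already shown (an assumption of this sketch).
func applyPartial(_ text: String, to status: AppStatus) -> AppStatus {
    guard case .recording(let previous) = status else { return status }
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    return .recording(partialText: trimmed.isEmpty ? previous : trimmed)
}
```

Keeping the check next to the state transition means a late daemon callback cannot flip a finished session back into showing partial text.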

Build/versioning updates. Adds BUILD_NUMBER + build script changes (incrementing build number, timestamped build date, arm64-only build, improved uv bundling), updates VersionInfo formatting, and shows build number in preferences; adds/updates tests for Parakeet PCM prep, key monitoring, and Parakeet chunking.
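The build changes add a BUILD_NUMBER file, increment it on each build, and show the build number in preferences. A hypothetical Swift sketch of the increment and the display string; the file format (a single integer), the VersionInfo fields, and the "(build N)" display format are all assumptions, and the PR's build script is not necessarily written in Swift.

```swift
import Foundation

/// Hypothetical: parse the current BUILD_NUMBER contents and return the next value.
/// Missing or malformed contents start the count at 1.
func nextBuildNumber(from contents: String) -> Int {
    let current = Int(contents.trimmingCharacters(in: .whitespacesAndNewlines)) ?? 0
    return current + 1
}

/// Hypothetical version model for display in preferences.
struct VersionInfo {
    let marketingVersion: String
    let buildNumber: Int

    var displayString: String { "\(marketingVersion) (build \(buildNumber))" }
}
```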

Written by Cursor Bugbot for commit 3ee4ba1. This will update automatically on new commits.

  • Stream PCM chunks to the daemon during recording and finalize on stop
  • Add live partial transcription pipeline for Parakeet
  • Keep push-to-talk behavior intact while integrating streaming
  • Optimize PCM preprocessing and add timing logs
  • Simplify build/version numbering and arm64 release build
  • Add/update tests for status and Parakeet audio prep/chunking
Comment thread: Sources/Services/UvBootstrap.swift
Comment thread: Sources/Views/Components/WaveformView.swift (outdated)
Comment thread: Sources/Managers/PressAndHoldKeyMonitor.swift (outdated)
artk42 changed the title from "v2.1.1: extreme speed up of Parakeet transcription with streaming" to "feat: Parakeet transcription made instant with live streaming" on Feb 26, 2026
artk42 marked this pull request as draft on February 26, 2026, 19:43

cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

```swift
if let liveText = await ParakeetLiveTranscriber.shared.finalizeIfAvailable(expectedRepo: selectedRepo) {
    logger.info("Parakeet live stream finalize successful")
    return liveText
}
```

Empty live finalization bypasses regular transcription fallback

Medium Severity

When finalizeIfAvailable returns a non-nil but empty string (e.g., silence, very short audio, or live PCM capture that wrote no data), if let liveText succeeds and the empty string is returned immediately. This bypasses the regular transcription fallback path that processes the m4a recording via processAudioToRawPCM + transcribeWithRawPCM, which could produce valid text from the AVAudioRecorder file. An emptiness check on liveText is needed to allow fallthrough.
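A minimal sketch of the fix Bugbot suggests, with the live-finalize call and the file-based fallback passed in as closures so the control flow can be checked in isolation. The function name and closure parameters are hypothetical stand-ins for ParakeetLiveTranscriber.shared.finalizeIfAvailable and the processAudioToRawPCM + transcribeWithRawPCM path; the sketch is synchronous for brevity, whereas the PR awaits the live call. Treating whitespace-only text as empty goes slightly beyond Bugbot's suggestion.

```swift
import Foundation

/// Hypothetical control flow: prefer the live-stream result, but only if it
/// contains actual text; otherwise fall through to the recorded-file path.
func transcribe(
    finalizeLive: () -> String?,
    fallback: () throws -> String
) rethrows -> String {
    if let liveText = finalizeLive(),
       !liveText.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
        return liveText
    }
    // nil, empty, or whitespace-only live result: reprocess the recording.
    return try fallback()
}
```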


artk42 marked this pull request as ready for review on February 26, 2026, 20:07