feat: Parakeet transcription made instant with live streaming #24
Open
artk42 wants to merge 6 commits into
- Stream PCM chunks to daemon during recording and finalize on stop
- Add live partial transcription pipeline for Parakeet
- Keep push-to-talk behavior intact while integrating streaming
- Optimize PCM preprocessing and add timing logs
- Simplify build/version numbering and arm64 release build
- Add/update tests for status and Parakeet audio prep/chunking
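The first bullet above describes streaming PCM chunks during recording. As a rough illustration only, the chunking step could look like the sketch below. `chunkSamples` and the chunk size are hypothetical and not taken from this PR; the actual implementation may buffer and send audio differently.

```swift
// Hypothetical helper (not from the PR): split mono float32 samples
// into fixed-size chunks, with a shorter final chunk if needed.
func chunkSamples(_ samples: [Float], chunkSize: Int) -> [[Float]] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    // Walk the buffer in steps of chunkSize and copy each slice out.
    return stride(from: 0, to: samples.count, by: chunkSize).map { start in
        let end = min(start + chunkSize, samples.count)
        return Array(samples[start..<end])
    }
}
```

At a 16 kHz sample rate, a chunk size of 8000 would correspond to half a second of audio; that duration is an assumed value chosen for illustration.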
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    if let liveText = await ParakeetLiveTranscriber.shared.finalizeIfAvailable(expectedRepo: selectedRepo) {
        logger.info("Parakeet live stream finalize successful")
        return liveText
    }
Empty live finalization bypasses regular transcription fallback
Medium Severity
When finalizeIfAvailable returns a non-nil but empty string (e.g., silence, very short audio, or live PCM capture that wrote no data), if let liveText succeeds and the empty string is returned immediately. This bypasses the regular transcription fallback path that processes the m4a recording via processAudioToRawPCM + transcribeWithRawPCM, which could produce valid text from the AVAudioRecorder file. An emptiness check on liveText is needed to allow fallthrough.
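One possible shape for the emptiness check is sketched below. It is simplified to synchronous code, and `usableLiveText` and `transcribe(liveResult:fallback:)` are hypothetical names introduced for illustration; in the PR the live result comes from `finalizeIfAvailable` and the fallback is the existing m4a path.

```swift
import Foundation

// Hypothetical helper (not from the PR): accept the live result only
// when it contains something other than whitespace.
func usableLiveText(_ liveText: String?) -> String? {
    guard let text = liveText,
          !text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else {
        return nil
    }
    return text
}

// Hypothetical stand-in for the transcription entry point: prefer the
// live result, otherwise fall through to the regular file-based path.
func transcribe(liveResult: String?, fallback: () -> String) -> String {
    if let liveText = usableLiveText(liveResult) {
        return liveText
    }
    return fallback()
}
```

With this guard, a nil, empty, or whitespace-only live result falls through to the fallback closure instead of being returned as the final transcription.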


What changed
Result
The Parakeet experience is now instant (after the first warm-up run).
For short and medium dictation, responsiveness is now close to real-time.
Product direction
This significantly reduces the need for multiple parallel STT models.
It may be better to simplify the model matrix and focus on more useful instant experiences.
Already exploring next improvements in that direction.
Note
Medium Risk
Touches the core recording/transcription path by adding a new live audio capture pipeline, new ML daemon RPC methods, and new cleanup/polling logic; regressions could impact recording stability, CPU usage, and transcription correctness.
Overview
Parakeet now supports true live transcription while recording. AudioRecorder taps the mic via AVAudioEngine, converts to 16 kHz float32 PCM, streams chunks to a new ParakeetLiveTranscriber actor, and ParakeetService preferentially finalizes the active stream on stop (skipping the prior “reprocess after recording” flow).

ML daemon + Python layer gains streaming + better long-audio handling. MLDaemonManager adds parakeet_stream_* JSON-RPC methods, ml/rpc.py routes them, and ml/parakeet.py implements stream sessions plus chunked transcription/overlap merging for long inputs.

UI/UX + reliability tweaks. AppStatus.recording now carries optional partial text; recording views show partials; partial progress notifications are ignored when not recording; press-and-hold hotkey monitoring adds a local flagsChanged monitor to better detect modifier releases; recording window size increases.

Build/versioning updates. Adds BUILD_NUMBER + build script changes (incrementing build number, timestamped build date, arm64-only build, improved uv bundling), updates VersionInfo formatting, and shows build number in preferences; adds/updates tests for Parakeet PCM prep, key monitoring, and Parakeet chunking.

Written by Cursor Bugbot for commit 3ee4ba1. This will update automatically on new commits.
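The UI/UX change summarized above, where the recording status carries optional partial text and partial updates are ignored when not recording, can be sketched as follows. `AppStatusSketch`, its `idle` and `processing` cases, and `applyPartial` are simplified, hypothetical stand-ins; the real AppStatus type in the PR likely has different cases and handling.

```swift
// Simplified stand-in for the app's status type (assumed shape).
enum AppStatusSketch: Equatable {
    case idle
    case recording(partialText: String?)
    case processing
}

// Hypothetical reducer: apply a partial transcript only while recording;
// in any other state the update is dropped and the status is unchanged.
func applyPartial(_ text: String, to status: AppStatusSketch) -> AppStatusSketch {
    guard case .recording = status else {
        return status
    }
    return .recording(partialText: text)
}
```

Carrying the partial text as an associated value keeps it tied to the recording state, so a late partial that arrives after recording stops cannot leak into another state.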