feature: meeting transcriber #61
Conversation
sebsto
left a comment
Hello @gbrunoo, thank you so much for this submission! This will add a much-demanded capability to the app.
I left a few comments:
- files that should not be part of the commit
- questions about the mode of operation that influence the onboarding
- IMHO, the onboarding flow should not be touched by this change

Also, there are no tests for the new audio engine and state manager. Can you add some to verify the logic / state of these two new components?
Please answer on the comments. I will pull these changes in and compile / test the next version.
This file should not be committed to the project.
```swift
/// Called when the user closes the window via the red X button.
/// Syncs `isWindowVisible` back to false so the menu item can reopen it.
nonisolated func windowWillClose(_ notification: Notification) {
```
The default SwiftUI views are isolated to MainActor. Why declare it nonisolated and then run the code in a MainActor-isolated block? I may be missing something; otherwise this can be simplified.
I don't understand why you are adding/hiding steps in the onboarding flow.
I think I understand that you changed the app to work in a different mode for meeting transcription: either the app installs itself as a meeting transcriber or as dictation. If this is the case, this must be changed :-) The app should stay polyvalent and work with two different activation modes, transcript or dictation, and the user must be able to use one or the other easily (typically, two different actions) - not a persistent mode.
IMHO, the onboarding flow must not be modified to support this new capability.
The meeting transcriber requires additional rights, as the app can record the system sounds. If these are not included in the onboarding, the user will have to manually add the rights on the first attempt to use the meeting transcriber.
```swift
/// When true, the app is running as the meeting transcription variant.
/// Disables features that require accessibility permission (text insertion at cursor).
var isMeetingMode: Bool = false
```
As described in my previous comment: OK to work in meeting mode temporarily. If this is needed, we can leave it as-is. But onboarding cannot be touched by this.
This file should not be part of the commit
- Remove isMeetingMode from SettingsStore
- Revert onboarding flow to not skip accessibility permission
- Revert app name to 'Wispr' (from 'Wispr Steno')
- Revert bundle identifier to com.stormacq.mac.wispr
- Revert development team to original
- Meeting transcription is now a separate menu action users can trigger on-demand
- Users can switch between dictation and meeting transcription dynamically

This addresses maintainer feedback: the app should be polyvalent, allowing users to easily use both dictation and meeting transcription without choosing a persistent mode during onboarding.
Hello @gbrunoo Thank you for the new commits. Now, there are 128 files changed in the project.
sebsto
left a comment
Thank you for having reverted the multi-mode aspects. It looks like your force push changed the permissions on all project files. This PR now touches 128 files, most of them with only a permission change. Can you revert this so the PR only contains the files that have actually been changed?
Also, the tasks/todo.md file must be renamed and placed into the .kiro/specs directory.
Thank you
- Fix file permissions (100755 → 100644) on all files
- Move tasks/todo.md to .kiro/specs/meeting-transcription/design.md
- Revert files that should not be in PR: project.pbxproj, OnboardingFlow.swift, .vscode/settings.json, ModelPaths.swift, SettingsStore.swift
- Fix MenuBarControllerTests for new meetingStateManager parameter
- Add MeetingAudioEngineTests (8 tests): stream behavior, safe no-ops, capture failure handling, double-start guard
- Add MeetingStateManagerTests (14 tests): transcript model formatting, state lifecycle, error handling, copy transcript, double-start prevention
Address PR feedback: fix permissions, add tests, clean up
Add NSWindowDelegate conformance to MeetingWindowPanel so that windowWillClose syncs isVisible back to false. Without this, the guard in show() would early-return because isVisible was stale.
Fix: meeting window can be reopened after closing via red X button
Thank you @gbrunoo This is much easier to read. Use the Kiro power skill to direct the AI coding agent towards the correct set of rules.
sebsto
left a comment
Much better now - still a couple of lines that are not compliant with Swift 6 structured concurrency rules.
```swift
// MARK: - ScreenCaptureKit Audio Output Handler

/// Receives audio sample buffers from SCStream and converts them to Float32 arrays.
final class SystemAudioOutputHandler: NSObject, SCStreamOutput, @unchecked Sendable {
```
```swift
final class SystemAudioOutputHandler: NSObject, SCStreamOutput, @unchecked Sendable {
    private let onSamples: @Sendable ([Float]) -> Void
```
The class is final, its sole stored property is an immutable let of @Sendable closure type, and NSObject is already @unchecked Sendable in the SDK. The compiler can verify Sendable conformance directly — @unchecked is not needed and bypasses that verification.
Severity: Low
Fix: Replace @unchecked Sendable with plain Sendable.
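A minimal sketch of the suggested change. The `SCStreamOutput` conformance and the sample-buffer conversion are omitted so the sketch stands alone; `deliver(_:)` is a hypothetical stand-in for the real callback.

```swift
import Foundation

/// Sketch: with the class final, the superclass NSObject already
/// @unchecked Sendable in the SDK, and the only stored property an
/// immutable @Sendable closure, plain `Sendable` compiles and the
/// compiler verifies the conformance instead of trusting `@unchecked`.
final class SystemAudioOutputHandler: NSObject, Sendable {
    private let onSamples: @Sendable ([Float]) -> Void

    init(onSamples: @escaping @Sendable ([Float]) -> Void) {
        self.onSamples = onSamples
    }

    /// Stand-in for the SCStreamOutput callback, which would convert the
    /// CMSampleBuffer to [Float] before forwarding.
    func deliver(_ samples: [Float]) {
        onSamples(samples)
    }
}
```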
```swift
/// entries until system audio support is added.
@MainActor
@Observable
final class MeetingStateManager {
```
MeetingStateManager is @MainActor. Tasks spawned from its methods inherit MainActor isolation. The await MainActor.run { } wrappers inside those tasks are redundant hops that add overhead and mislead readers into thinking the code is off the main actor.
| Location | Code |
|---|---|
| `startMicTranscription()` | `await MainActor.run { self.transcript.entries.append(...) }` |
| `startSystemTranscription()` | `await MainActor.run { self.transcript.entries.append(...) }` |
| `startMicLevelConsumption()` | `await MainActor.run { self?.micLevel = level }` |
| `startSystemLevelConsumption()` | `await MainActor.run { self?.systemLevel = level }` |
| `startTimer()` | `await MainActor.run { self?.elapsedTime = ... }` |
Severity: Low
Fix: Remove all five await MainActor.run { } wrappers; access properties directly.
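A sketch of the simplified shape for one of the five sites. The property and method names follow the review above; `@Observable` is omitted so the sketch is self-contained.

```swift
import Foundation

// Sketch: because the class is @MainActor, a Task created inside its
// methods inherits MainActor isolation, so properties can be mutated
// directly without an await MainActor.run { } hop.
@MainActor
final class MeetingStateManager {
    private(set) var micLevel: Float = 0
    private var micLevelTask: Task<Void, Never>?

    func startMicLevelConsumption(_ levels: AsyncStream<Float>) {
        micLevelTask = Task { [weak self] in
            for await level in levels {
                self?.micLevel = level   // direct access; already on MainActor
            }
        }
    }
}
```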
```swift
/// entries until system audio support is added.
@MainActor
@Observable
final class MeetingStateManager {
```
The manager spawns five unstructured Task { } stored in properties and manually cancelled in stopMeeting():
| Property | Spawned in |
|---|---|
| `micTranscriptionTask` | `startMicTranscription()` |
| `systemTranscriptionTask` | `startSystemTranscription()` |
| `micLevelTask` | `startMicLevelConsumption(_:)` |
| `systemLevelTask` | `startSystemLevelConsumption(_:)` |
| `timerTask` | `startTimer()` |
If stopMeeting() is never called (e.g. the manager is deallocated while recording), the tasks leak and keep running. There is no deinit safety net.
Severity: Medium
Fix: Replace the five independent tasks with a single withTaskGroup launched from startMeeting(). The group naturally cancels all child tasks when the parent scope exits. Alternatively, store a single parent Task that spawns child tasks via async let or a task group, so cancelling the parent cascades to all children.
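A sketch of the single-parent shape, with illustrative names (the real manager would put its five loops in the group). Cancelling the stored parent task cancels every child the group spawned.

```swift
import Foundation

// Sketch: one stored parent Task; withTaskGroup keeps every loop
// structured, so cancelling the parent cascades to all children and
// nothing leaks if the group is the only place loops are started.
@MainActor
final class RecordingCoordinator {
    private var recordingTask: Task<Void, Never>?
    var isRecording: Bool { recordingTask != nil }

    func startMeeting() {
        guard recordingTask == nil else { return }   // double-start guard
        recordingTask = Task {
            await withTaskGroup(of: Void.self) { group in
                group.addTask { await self.runTimer() }
                group.addTask { await self.consumeMicLevels() }
                // ...the transcription and system-level loops go here too.
            }
        }
    }

    func stopMeeting() {
        recordingTask?.cancel()   // cascades to every child in the group
        recordingTask = nil
    }

    private func runTimer() async {
        while !Task.isCancelled {
            try? await Task.sleep(nanoseconds: 100_000_000)
        }
    }

    private func consumeMicLevels() async {
        // Placeholder for the level-stream consumption loop.
        while !Task.isCancelled {
            try? await Task.sleep(nanoseconds: 100_000_000)
        }
    }
}
```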
```swift
///
/// If system audio capture fails (e.g. permission denied, sandbox restriction),
/// the engine continues with mic-only capture and logs a warning.
actor MeetingAudioEngine {
```
```swift
// In installTap closure
Task {
    await self.processMicSamples(samples)
}

// In SystemAudioOutputHandler callback
Task {
    await self.processSystemSamples(samples)
}
```

Each audio buffer spawns a new unstructured Task to hop into the actor's isolation domain. Under heavy audio load this creates many short-lived tasks with no backpressure.
Severity: Low-Medium
Fix: Use an AsyncStream as the bridge instead — the tap yields samples into a continuation, and a single structured task consumes them inside the actor. This gives natural backpressure and one task instead of hundreds.
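A sketch of the bridge under assumed names: the audio tap calls the synchronous `ingest(_:)` instead of spawning a Task, and one structured consumer task inside the actor drains the stream.

```swift
import Foundation

// Sketch: the tap callback only yields into the continuation; a single
// task inside the actor consumes the stream, replacing one Task per
// buffer with one task total and giving natural buffering/backpressure.
actor SampleProcessor {
    private let samples: AsyncStream<[Float]>
    private let continuation: AsyncStream<[Float]>.Continuation
    private var consumer: Task<Void, Never>?
    private(set) var processedCount = 0

    init() {
        let (stream, continuation) = AsyncStream.makeStream(of: [Float].self)
        self.samples = stream
        self.continuation = continuation
    }

    /// Called from the audio tap; synchronous and safe from any thread.
    nonisolated func ingest(_ buffer: [Float]) {
        continuation.yield(buffer)
    }

    func start() {
        consumer = Task {
            for await buffer in samples {
                process(buffer)
            }
        }
    }

    func stop() {
        continuation.finish()
        consumer?.cancel()
    }

    private func process(_ buffer: [Float]) {
        processedCount += 1   // stand-in for the real transcription work
    }
}
```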
```swift
permissionMonitoringTask?.cancel()
updateCheckTask?.cancel()

// Stop any active meeting session
```
```swift
if let msm = meetingStateManager {
    Task { await msm.stopMeeting() }
}
```

This spawns an unstructured task during app termination. The process may exit before the task completes, meaning stopMeeting() (which flushes buffers, cancels child tasks, stops audio capture) may never finish.
Severity: Medium
Fix: await the stop synchronously (acceptable at termination)
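One shape that makes termination deterministic (assumed names): give the manager a synchronous `cancelRecording()` that only cancels its parent task, and call that from `applicationWillTerminate` instead of spawning a Task the exiting process might never schedule.

```swift
import Foundation

// Sketch: cancellation is synchronous, so nothing is left behind in a
// fire-and-forget Task during termination.
@MainActor
final class MeetingSession {
    private var recordingTask: Task<Void, Never>?
    var isRecording: Bool { recordingTask != nil }

    func start() {
        recordingTask = Task {
            while !Task.isCancelled {
                try? await Task.sleep(nanoseconds: 100_000_000)
            }
        }
    }

    /// Synchronous; safe to call from applicationWillTerminate.
    func cancelRecording() {
        recordingTask?.cancel()
        recordingTask = nil
    }
}

// In the app delegate (sketch):
// func applicationWillTerminate(_ notification: Notification) {
//     meetingStateManager?.cancelRecording()   // no Task { } indirection
// }
```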
MeetingAudioEngine:
- Replace per-buffer Task spawning with AsyncStream bridge pattern. Tap callbacks yield into continuations; single consumer tasks on the actor read from the streams. Eliminates hundreds of short-lived tasks under heavy audio load and provides natural backpressure.
- Replace @unchecked Sendable with plain Sendable on SystemAudioOutputHandler (final class with immutable let property).

MeetingStateManager:
- Remove 5 redundant await MainActor.run {} wrappers. Since the class is @MainActor, spawned Tasks inherit isolation; direct property access is correct.
- Replace 5 unstructured Task properties with a single recordingTask using withTaskGroup. Cancelling the parent cascades to all children. Add cancelRecording() for synchronous cancellation at app termination.

wisprApp:
- Replace fire-and-forget Task { await msm.stopMeeting() } in applicationWillTerminate with a synchronous cancelRecording() call.
Refactor for Swift 6 structured concurrency compliance
Hi @sebsto, thank you for the thorough review! I've addressed all the feedback, including the Swift 6 structured concurrency fixes. Ready for re-review when you have a chance!
Adding the possibility to record meetings by capturing both from one's own microphone and from the computer's audio stream. The two transcripts stay separate and can then subsequently be copied elsewhere.