Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
65 changes: 65 additions & 0 deletions .kiro/specs/meeting-transcription/design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
# Meeting Transcription Mode — Implementation Plan

## Overview
Add a new "Meeting Mode" to Wispr that:
1. Shows a floating square window with recording controls and live transcript
2. Captures both system audio (what others say in meetings) and microphone audio (what you say)
3. Separates speakers into "You" vs "Others" based on audio source
4. Displays a scrolling live transcript in the window
5. Allows copying/exporting the transcript for notes

## Architecture Decisions

### System Audio Capture
macOS requires `ScreenCaptureKit` (macOS 13+) to capture system audio. This is the only sanctioned API — `AVAudioEngine` can only capture microphone input. We'll use `SCStreamConfiguration` with `capturesAudio = true` and `excludesCurrentProcessAudio = true`.

This requires the **Screen Recording** permission (user grants in System Settings > Privacy & Security > Screen Recording).

### Speaker Separation Strategy
Instead of ML-based diarization (complex, heavy), we use a simple but effective approach:
- **Microphone audio** → labeled as "You"
- **System audio** → labeled as "Others"

This works perfectly for meetings because system audio = remote participants, mic = you.

### Dual Audio Engine
Create a new `MeetingAudioEngine` actor that runs two capture pipelines in parallel:
1. `AVAudioEngine` for microphone (existing approach)
2. `SCStreamConfiguration` for system audio

Both streams are resampled to 16kHz mono Float32 and fed to separate transcription instances.

### Transcription Approach
Run two parallel transcription sessions:
- One for mic audio chunks → "You:" prefix
- One for system audio chunks → "Others:" prefix

Use chunked transcription (process every ~5-10 seconds of audio) for near-real-time results.

## Implementation Tasks

### Phase 1: Core Infrastructure
- [x] 1.1 Create `MeetingTranscript` model (timestamped entries with speaker labels)
- [x] 1.2 Create `MeetingAudioEngine` actor (dual capture: mic + system audio via ScreenCaptureKit)
- [x] 1.3 Create `MeetingStateManager` (orchestrates meeting mode state machine)
- [x] 1.4 Add Screen Recording permission handling to `PermissionManager`

### Phase 2: Meeting Mode UI
- [x] 2.1 Create `MeetingTranscriptView` (scrolling transcript with speaker labels)
- [x] 2.2 Create `MeetingWindowPanel` (floating square NSPanel with controls)
- [x] 2.3 Add "Meeting Mode" menu item to `MenuBarController`
- [x] 2.4 Wire meeting window visibility to `MeetingStateManager`

### Phase 3: Integration
- [x] 3.1 Add meeting mode settings to `SettingsStore` (not needed for MVP — uses existing language settings)
- [x] 3.2 Wire up `WisprAppDelegate` to bootstrap meeting mode services
- [x] 3.3 Add transcript export (copy to clipboard / save as text file)

## File Plan
```
wispr/Models/MeetingTranscript.swift — transcript data model
wispr/Services/MeetingAudioEngine.swift — dual audio capture
wispr/Services/MeetingStateManager.swift — meeting mode coordinator
wispr/UI/Meeting/MeetingTranscriptView.swift — transcript UI
wispr/UI/Meeting/MeetingWindowPanel.swift — floating window
```
72 changes: 72 additions & 0 deletions Sources/WisprApp/Models/MeetingTranscript.swift
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
//
// MeetingTranscript.swift
// wispr
//
// Data model for meeting transcription entries with speaker labels.
//

import Foundation

/// Identifies the audio source / speaker in a meeting transcript.
enum MeetingSpeaker: String, Sendable, Equatable, Hashable {
case you = "You"
case others = "Others"
}

/// A single timestamped entry in a meeting transcript.
struct MeetingTranscriptEntry: Identifiable, Sendable, Equatable {
let id: UUID
let speaker: MeetingSpeaker
let text: String
let timestamp: Date

init(speaker: MeetingSpeaker, text: String, timestamp: Date = Date()) {
self.id = UUID()
self.speaker = speaker
self.text = text
self.timestamp = timestamp
}
}

/// The full transcript of a meeting session.
struct MeetingTranscript: Sendable, Equatable {
var entries: [MeetingTranscriptEntry] = []
let startTime: Date

init(startTime: Date = Date()) {
self.startTime = startTime
}

/// Shared time formatter for transcript display (HH:mm:ss).
private static let timeFormatter: DateFormatter = {
let formatter = DateFormatter()
formatter.dateFormat = "HH:mm:ss"
return formatter
}()

/// Formats a date as HH:mm:ss for transcript display.
static func formatTime(_ date: Date) -> String {
timeFormatter.string(from: date)
}

/// Formats the entire transcript as plain text for export.
func asPlainText() -> String {
entries.map { entry in
let time = Self.formatTime(entry.timestamp)
return "[\(time)] \(entry.speaker.rawValue): \(entry.text)"
}.joined(separator: "\n")
}

/// Duration of the meeting so far.
var duration: TimeInterval {
Date().timeIntervalSince(startTime)
}

/// Formatted duration string (e.g. "12:34").
var formattedDuration: String {
let total = Int(duration)
let minutes = total / 60
let seconds = total % 60
return String(format: "%d:%02d", minutes, seconds)
}
}
27 changes: 27 additions & 0 deletions Sources/WisprApp/Models/TranscriptDocument.swift
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
//
// TranscriptDocument.swift
// wispr
//
// FileDocument type for exporting meeting transcripts via SwiftUI .fileExporter().
//

import SwiftUI
import UniformTypeIdentifiers

struct TranscriptDocument: FileDocument {
static let readableContentTypes: [UTType] = [.plainText]
let text: String

init(text: String) {
self.text = text
}

init(configuration: ReadConfiguration) throws {
let data = configuration.file.regularFileContents ?? Data()
text = String(decoding: data, as: UTF8.self)
}

func fileWrapper(configuration: WriteConfiguration) throws -> FileWrapper {
FileWrapper(regularFileWithContents: Data(text.utf8))
}
}
Loading
Loading