Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .idea/.gitignore

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions .idea/clicky.iml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 8 additions & 0 deletions .idea/modules.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 6 additions & 0 deletions .idea/vcs.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 8 additions & 2 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ All API keys live on a Cloudflare Worker proxy — nothing sensitive ships in th
- **Text-to-Speech**: ElevenLabs (`eleven_flash_v2_5` model) via Cloudflare Worker proxy
- **Screen Capture**: ScreenCaptureKit (macOS 14.2+), multi-monitor support
- **Voice Input**: Push-to-talk via `AVAudioEngine` + pluggable transcription-provider layer. System-wide keyboard shortcut via listen-only CGEvent tap.
- **Text Input**: Customizable global hotkey (`⌥⌘K` by default) opens a compact typed-command popup near the cursor. Typed commands reuse the same screenshot → Claude → TTS/pointing pipeline as voice.
- **Element Pointing**: Claude embeds `[POINT:x,y:label:screenN]` tags in responses. The overlay parses these, maps coordinates to the correct monitor, and animates the blue cursor along a bezier arc to the target.
- **Concurrency**: `@MainActor` isolation, async/await throughout
- **Analytics**: PostHog via `ClickyAnalytics.swift`
Expand All @@ -44,6 +45,8 @@ Worker vars: `ELEVENLABS_VOICE_ID`

**Global Push-To-Talk Shortcut**: Background push-to-talk uses a listen-only `CGEvent` tap instead of an AppKit global monitor so modifier-based shortcuts like `ctrl + option` are detected more reliably while the app is running in the background.

**Global Text Input Shortcut**: Background typed input uses a sibling listen-only `CGEvent` tap with a persisted shortcut model. The popup is a lightweight `NSPanel` near the cursor, and submitted text feeds the same core response pipeline as voice transcripts.

**Shared URLSession for AssemblyAI**: A single long-lived `URLSession` is shared across all AssemblyAI streaming sessions (owned by the provider, not the session). Creating and invalidating a URLSession per session corrupts the OS connection pool and causes "Socket is not connected" errors after a few rapid reconnections.

**Transient Cursor Mode**: When "Show Clicky" is off, pressing the hotkey fades in the cursor overlay for the duration of the interaction (recording → response → TTS → optional pointing), then fades it out automatically after 1 second of inactivity.
Expand All @@ -53,9 +56,9 @@ Worker vars: `ELEVENLABS_VOICE_ID`
| File | Lines | Purpose |
|------|-------|---------|
| `leanring_buddyApp.swift` | ~89 | Menu bar app entry point. Uses `@NSApplicationDelegateAdaptor` with `CompanionAppDelegate` which creates `MenuBarPanelManager` and starts `CompanionManager`. No main window — the app lives entirely in the status bar. |
| `CompanionManager.swift` | ~1026 | Central state machine. Owns dictation, shortcut monitoring, screen capture, Claude API, ElevenLabs TTS, and overlay management. Tracks voice state (idle/listening/processing/responding), conversation history, model selection, and cursor visibility. Coordinates the full push-to-talk → screenshot → Claude → TTS → pointing pipeline. |
| `CompanionManager.swift` | ~1111 | Central state machine. Owns dictation, voice/text shortcut monitoring, text popup orchestration, screen capture, Claude API, ElevenLabs TTS, and overlay management. Tracks voice state (idle/listening/processing/responding), conversation history, model selection, and cursor visibility. Coordinates voice and typed command → screenshot → Claude → TTS → pointing pipeline. |
| `MenuBarPanelManager.swift` | ~243 | NSStatusItem + custom NSPanel lifecycle. Creates the menu bar icon, manages the floating companion panel (show/hide/position), installs click-outside-to-dismiss monitor. |
| `CompanionPanelView.swift` | ~761 | SwiftUI panel content for the menu bar dropdown. Shows companion status, push-to-talk instructions, model picker (Sonnet/Opus), permissions UI, DM feedback button, and quit button. Dark aesthetic using `DS` design system. |
| `CompanionPanelView.swift` | ~876 | SwiftUI panel content for the menu bar dropdown. Shows companion status, push-to-talk instructions, model picker (Sonnet/Opus), text hotkey settings, permissions UI, DM feedback button, and quit button. Dark aesthetic using `DS` design system. |
| `OverlayWindow.swift` | ~881 | Full-screen transparent overlay hosting the blue cursor, response text, waveform, and spinner. Handles cursor animation, element pointing with bezier arcs, multi-monitor coordinate mapping, and fade-out transitions. |
| `CompanionResponseOverlay.swift` | ~217 | SwiftUI view for the response text bubble and waveform displayed next to the cursor in the overlay. |
| `CompanionScreenCaptureUtility.swift` | ~132 | Multi-monitor screenshot capture using ScreenCaptureKit. Returns labeled image data for each connected display. |
Expand All @@ -66,6 +69,9 @@ Worker vars: `ELEVENLABS_VOICE_ID`
| `AppleSpeechTranscriptionProvider.swift` | ~147 | Local fallback transcription provider backed by Apple's Speech framework. |
| `BuddyAudioConversionSupport.swift` | ~108 | Audio conversion helpers. Converts live mic buffers to PCM16 mono audio and builds WAV payloads for upload-based providers. |
| `GlobalPushToTalkShortcutMonitor.swift` | ~132 | System-wide push-to-talk monitor. Owns the listen-only `CGEvent` tap and publishes press/release transitions. |
| `ClickyKeyboardShortcut.swift` | ~178 | Persisted keyboard shortcut model for typed command hotkeys, including display text and validation. |
| `GlobalTextInputShortcutMonitor.swift` | ~134 | System-wide typed command shortcut monitor. Owns a listen-only `CGEvent` tap and publishes trigger events for the popup. |
| `TextCommandPopupManager.swift` | ~342 | Compact typed-command `NSPanel` manager and SwiftUI popup view. Tracks near the cursor, focuses input, submits on Enter, and closes on Esc/outside click. |
| `ClaudeAPI.swift` | ~291 | Claude vision API client with streaming (SSE) and non-streaming modes. TLS warmup optimization, image MIME detection, conversation history support. |
| `OpenAIAPI.swift` | ~142 | OpenAI GPT vision API client. |
| `ElevenLabsTTSClient.swift` | ~81 | ElevenLabs TTS client. Sends text to the Worker proxy, plays back audio via `AVAudioPlayer`. Exposes `isPlaying` for transient cursor scheduling. |
Expand Down
17 changes: 15 additions & 2 deletions leanring-buddy.xcodeproj/project.pbxproj
Original file line number Diff line number Diff line change
Expand Up @@ -34,9 +34,22 @@
28F22CD62F56440300A0FC59 /* leanring-buddyUITests.xctest */ = {isa = PBXFileReference; explicitFileType = wrapper.cfbundle; includeInIndex = 0; path = "leanring-buddyUITests.xctest"; sourceTree = BUILT_PRODUCTS_DIR; };
/* End PBXFileReference section */

/* Begin PBXFileSystemSynchronizedBuildFileExceptionSet section */
AA00BB072F6500070039DA55 /* Exceptions for "leanring-buddy" folder in "leanring-buddy" target */ = {
isa = PBXFileSystemSynchronizedBuildFileExceptionSet;
membershipExceptions = (
Info.plist,
);
target = 28F22CBE2F56440300A0FC59 /* leanring-buddy */;
};
/* End PBXFileSystemSynchronizedBuildFileExceptionSet section */

/* Begin PBXFileSystemSynchronizedRootGroup section */
28F22CC12F56440300A0FC59 /* leanring-buddy */ = {
isa = PBXFileSystemSynchronizedRootGroup;
exceptions = (
AA00BB072F6500070039DA55 /* Exceptions for "leanring-buddy" folder in "leanring-buddy" target */,
);
path = "leanring-buddy";
sourceTree = "<group>";
};
Expand Down Expand Up @@ -411,7 +424,7 @@
CODE_SIGN_STYLE = Automatic;
COMBINE_HIDPI_IMAGES = YES;
CURRENT_PROJECT_VERSION = 1;
DEVELOPMENT_TEAM = 2UDAY4J48G;
DEVELOPMENT_TEAM = NDJZK3L926;
ENABLE_APP_SANDBOX = NO;
ENABLE_HARDENED_RUNTIME = YES;
ENABLE_OUTGOING_NETWORK_CONNECTIONS = YES;
Expand Down Expand Up @@ -449,7 +462,7 @@
CODE_SIGN_STYLE = Automatic;
COMBINE_HIDPI_IMAGES = YES;
CURRENT_PROJECT_VERSION = 1;
DEVELOPMENT_TEAM = 2UDAY4J48G;
DEVELOPMENT_TEAM = NDJZK3L926;
ENABLE_APP_SANDBOX = NO;
ENABLE_HARDENED_RUNTIME = YES;
ENABLE_OUTGOING_NETWORK_CONNECTIONS = YES;
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>SchemeUserState</key>
<dict>
<key>leanring-buddy.xcscheme_^#shared#^_</key>
<dict>
<key>orderHint</key>
<integer>0</integer>
</dict>
</dict>
</dict>
</plist>
180 changes: 180 additions & 0 deletions leanring-buddy/ClickyKeyboardShortcut.swift
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@
//
// ClickyKeyboardShortcut.swift
// leanring-buddy
//
// Persisted keyboard shortcut model for Clicky's typed command popup.
//

import AppKit
import Foundation

struct ClickyKeyboardShortcut: Codable, Equatable {
static let textInputUserDefaultsKey = "textInputKeyboardShortcut"
static let defaultTextInputShortcut = ClickyKeyboardShortcut(
keyCode: 40,
modifierFlags: [.option, .command],
keyDisplayText: "K"
)
private static let supportedModifierFlags: NSEvent.ModifierFlags = [
.control,
.option,
.shift,
.command,
.function
]

let keyCode: UInt16
let modifierFlagsRawValue: UInt
let keyDisplayText: String

var modifierFlags: NSEvent.ModifierFlags {
NSEvent.ModifierFlags(rawValue: modifierFlagsRawValue)
.intersection(.deviceIndependentFlagsMask)
.intersection(Self.supportedModifierFlags)
}

var displayText: String {
let modifierDisplayText = Self.displayText(for: modifierFlags)
guard !modifierDisplayText.isEmpty else { return keyDisplayText }
return modifierDisplayText + keyDisplayText
}

var validationErrorMessage: String? {
if keyDisplayText.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
return "Choose a key with at least one modifier."
}

if modifierFlags.isEmpty {
return "Add at least one modifier."
}

if modifierFlags.contains(.control) && modifierFlags.contains(.option) {
return "Control+Option is reserved for voice."
}

return nil
}

init(keyCode: UInt16, modifierFlags: NSEvent.ModifierFlags, keyDisplayText: String) {
self.keyCode = keyCode
self.modifierFlagsRawValue = modifierFlags
.intersection(.deviceIndependentFlagsMask)
.intersection(Self.supportedModifierFlags)
.rawValue
self.keyDisplayText = keyDisplayText
}

static func persistedTextInputShortcut() -> ClickyKeyboardShortcut {
guard let data = UserDefaults.standard.data(forKey: textInputUserDefaultsKey),
let shortcut = try? JSONDecoder().decode(ClickyKeyboardShortcut.self, from: data),
shortcut.validationErrorMessage == nil else {
return defaultTextInputShortcut
}

return shortcut
}

func persistAsTextInputShortcut() {
guard let data = try? JSONEncoder().encode(self) else { return }
UserDefaults.standard.set(data, forKey: Self.textInputUserDefaultsKey)
}

func matches(keyCode eventKeyCode: UInt16, modifierFlags eventModifierFlags: NSEvent.ModifierFlags) -> Bool {
eventKeyCode == keyCode
&& eventModifierFlags
.intersection(.deviceIndependentFlagsMask)
.isSuperset(of: modifierFlags)
}

static func shortcut(from event: NSEvent) -> ClickyKeyboardShortcut? {
guard event.type == .keyDown else { return nil }

let keyDisplayText = keyDisplayText(for: event)
guard !keyDisplayText.isEmpty else { return nil }

return ClickyKeyboardShortcut(
keyCode: event.keyCode,
modifierFlags: event.modifierFlags.intersection(.deviceIndependentFlagsMask),
keyDisplayText: keyDisplayText
)
}

static func shortcut(
keyCode: UInt16,
modifierFlagsRawValue: UInt64
) -> ClickyKeyboardShortcut? {
guard let keyDisplayText = keyDisplayText(forKeyCode: keyCode),
!keyDisplayText.isEmpty else {
return nil
}

return ClickyKeyboardShortcut(
keyCode: keyCode,
modifierFlags: NSEvent.ModifierFlags(rawValue: UInt(modifierFlagsRawValue))
.intersection(.deviceIndependentFlagsMask),
keyDisplayText: keyDisplayText
)
}

private static func displayText(for modifierFlags: NSEvent.ModifierFlags) -> String {
var displayText = ""

if modifierFlags.contains(.control) {
displayText += "⌃"
}
if modifierFlags.contains(.option) {
displayText += "⌥"
}
if modifierFlags.contains(.shift) {
displayText += "⇧"
}
if modifierFlags.contains(.command) {
displayText += "⌘"
}
if modifierFlags.contains(.function) {
displayText += "fn "
}

return displayText
}

private static func keyDisplayText(for event: NSEvent) -> String {
if let specialKey = keyDisplayText(forKeyCode: event.keyCode), !specialKey.isEmpty {
return specialKey
}

if let charactersIgnoringModifiers = event.charactersIgnoringModifiers,
let firstCharacter = charactersIgnoringModifiers.first {
return String(firstCharacter).uppercased()
}

return ""
}

private static func keyDisplayText(forKeyCode keyCode: UInt16) -> String? {
switch keyCode {
case 36:
return "Return"
case 48:
return "Tab"
case 49:
return "Space"
case 51:
return "Delete"
case 53:
return "Esc"
case 76:
return "Enter"
case 123:
return "←"
case 124:
return "→"
case 125:
return "↓"
case 126:
return "↑"
default:
return nil
}
}
}
Loading