Inspired by superwhisper. Definitely check it out if you're looking for something stable, polished, and feature-rich.
Warning
This code is a mess. It's a PoC where I'm constantly experimenting, changing things, and publishing everything as I go to test the app in action.
You've been warned. Enter at your own risk.
- simple audio input
- settings
  - api keys
  - format with AI toggle
  - formatting preset selection (default, message, note, email)
  - model selection
  - persist settings
  - language selection
- auto update
- tray icon menu
  - quit
  - settings
- history (simple, stored in file system: audio + transcript + llm response)
- REFACTOR
- deep link support
- retry mechanism
- tray menu action to open history file dir
- paste text after processing
- add config to disable pasting
- maybe this ai toggle is not needed now, one of the presets could be "no formatting"
- prevent sending empty messages (no audio)
- cancel recording
- add configs/presets/modes (need to decide on the name)
  - each config can have: formatting preset, ai model, language
  - choose one default config
  - assign shortcut to each config
- support local models
- webhooks - configure webhook and send transcript to it after processing
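The planned configs/presets/modes item in the list above could be modeled roughly like this. It's only a sketch: `RecordingConfig`, `getDefaultConfig`, and `findConfigByShortcut` are hypothetical names that don't exist in the code yet, and the field names are my guess at what "formatting preset, ai model, language" plus a shortcut would need.

```typescript
// Hypothetical shape for one config (a.k.a. preset/mode); nothing here exists in the app yet.
type FormattingPreset = "default" | "message" | "note" | "email";

interface RecordingConfig {
  id: string;
  name: string;
  formattingPreset: FormattingPreset;
  aiModel: string; // provider-specific model id
  language: string; // e.g. "en" or "pl"
  shortcut?: string; // optional shortcut assigned to this config
  isDefault?: boolean; // at most one config should set this
}

// Returns the config marked as default, falling back to the first one in the list.
function getDefaultConfig(configs: RecordingConfig[]): RecordingConfig | undefined {
  return configs.find((config) => config.isDefault) ?? configs[0];
}

// Looks up the config bound to a shortcut; undefined when nothing matches.
function findConfigByShortcut(
  configs: RecordingConfig[],
  shortcut: string,
): RecordingConfig | undefined {
  return configs.find((config) => config.shortcut === shortcut);
}
```

Keeping the default as a flag on the config (instead of a separate setting) is just one option; a separate `defaultConfigId` in the settings would work too.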
The application follows a modular architecture with a clear separation between features and services:
- Features: Contain business logic and UI components for specific application features
- Services: Handle API interactions (Tauri, OpenAI, Grok, etc.) and provide a clean interface for features
src/
├── app.html
├── app.d.ts
├── routes/
│   ├── +layout.ts
│   ├── app/ # Main app window
│   │   └── +layout.svelte
│   └── settings/ # Settings window
├── lib/
│   ├── core/ # Core functionalities
│   │   ├── types.ts
│   │   ├── constants.ts
│   │   ├── settings.ts
│   │   └── store.ts
│   ├── services/ # Service layer for API interactions
│   │   ├── file-system.ts
│   │   ├── windows.ts
│   │   ├── clipboard.ts
│   │   ├── play-sound.ts
│   │   ├── transcription/
│   │   │   ├── index.ts
│   │   │   └── providers.ts # Transcription providers (Grok, ...)
│   │   └── ai/
│   │       ├── index.ts
│   │       └── providers.ts # AI providers (OpenAI, Anthropic, etc.)
│   ├── features/
│   │   ├── audio/ # Audio recording and visualization
│   │   │   ├── recorder.svelte.ts
│   │   │   └── visualizer.svelte
│   │   ├── ai-formatting/ # AI text formatting
│   │   │   ├── formatting.ts
│   │   │   ├── presets.ts
│   │   │   └── prompts/
│   │   ├── app-updates.ts # Automatic application updates
│   │   └── system-tray.ts # System tray
│   ├── assets/
│   │   ├── sounds/
│   │   └── icons/
│   ├── global.css
│   └── ui/ # (Future) Reusable UI components
│       └── components/ # (Future) UI component library
- Services Layer: Introduced to handle API interactions and provide a clean interface for features
- API Key Management: API keys are stored in the central store and accessed directly by services for simplicity
- Feature-Service Separation: Features contain business logic while services handle external interactions
- No Circular Dependencies: Services are designed to avoid importing from one another
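To make the feature/service split above a bit more concrete, here's a minimal sketch. It is not the actual code: `store`, `AiProvider`, `createAiService`, and `formatTranscript` are made-up stand-ins for what lives in `lib/core/store.ts`, `lib/services/ai/`, and `lib/features/ai-formatting/`, and the prompt text is invented for illustration.

```typescript
// Stand-in for the central store: API keys live here and services read them directly.
const store = { apiKeys: {} as Record<string, string> };

// What a provider has to implement (hypothetical interface).
interface AiProvider {
  complete(prompt: string, apiKey: string): Promise<string>;
}

// Service: owns the external interaction and the API key lookup.
function createAiService(provider: AiProvider, providerName: string) {
  return {
    async complete(prompt: string): Promise<string> {
      const apiKey = store.apiKeys[providerName];
      if (!apiKey) {
        throw new Error(`Missing API key for ${providerName}`);
      }
      return provider.complete(prompt, apiKey);
    },
  };
}

// Feature: business logic only. It never sees API keys or HTTP details.
async function formatTranscript(
  ai: { complete(prompt: string): Promise<string> },
  transcript: string,
  preset: string,
): Promise<string> {
  if (transcript.trim() === "") {
    return ""; // nothing to format, skip the API call
  }
  return ai.complete(`Format the following text as a ${preset}:\n\n${transcript}`);
}
```

Because the feature only depends on the small `complete` interface, swapping providers (or passing a fake one in tests) doesn't touch the feature code, and services never need to import each other.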
I wanted my app to work like superwhisper - when you stop recording, the text should automatically appear where your cursor is. The app should copy the text to your clipboard AND paste it for you right away.
The problem is that Tauri (what I'm using to build this app) can't control your cursor to paste text.
My solution:
- Use Keyboard Maestro as a helper
- Set up Keyboard Maestro's web server
- Create a macro with a web trigger that pastes clipboard content
- Make my app send a request to this local web server when needed
To set this up yourself:
- Install Keyboard Maestro
- Turn on its web server feature
- Create a macro with a public web trigger (helpful video: https://www.youtube.com/watch?v=D0IqJt-H9xE)
- Connect from Tauri using the HTTP client plugin (docs: https://tauri.app/plugin/http-client/#usage)
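The last step can be sketched roughly like this. Assumptions, so adjust to your setup: Keyboard Maestro's web server is on its default port 4490, the public web trigger is reachable at `action.html?macro=<macro id>`, and `triggerPasteMacro`, `buildTriggerUrl`, and `FetchLike` are names I made up for this example. The fetch function is passed in so the snippet also runs outside Tauri.

```typescript
// Minimal fetch-like signature, so any compatible fetch implementation can be passed in.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Assumed default address of Keyboard Maestro's web server and its public trigger page.
const KM_BASE_URL = "http://localhost:4490/action.html";

// Builds the trigger URL for a macro; `value` is an optional parameter passed to the macro.
function buildTriggerUrl(macroId: string, value?: string): string {
  let url = `${KM_BASE_URL}?macro=${encodeURIComponent(macroId)}`;
  if (value !== undefined) {
    url += `&value=${encodeURIComponent(value)}`;
  }
  return url;
}

// Asks Keyboard Maestro to run the paste macro; throws on a non-2xx response.
async function triggerPasteMacro(fetchFn: FetchLike, macroId: string): Promise<void> {
  const response = await fetchFn(buildTriggerUrl(macroId));
  if (!response.ok) {
    throw new Error(`Keyboard Maestro responded with HTTP ${response.status}`);
  }
}
```

In the app this would be called right after copying the transcript to the clipboard, with `fetch` imported from `@tauri-apps/plugin-http` passed as `fetchFn` and your own macro id as `macroId`. The HTTP plugin only allows URLs that are permitted in the app's capabilities, so the local Keyboard Maestro address has to be allowed there.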
Here's what my macro looks like:
I'm experimenting with Leader Key for combo keyboard shortcuts.
E.g. with my current setup I press my leader key, then j to trigger the justsayit layer, and then:
- l to set language
  - p Polish
  - e English
- a to set AI formatting
  - y on
  - n off
- p to set formatting preset
  - d default
  - m message
  - n note
  - e email