A privacy-first, local speech-to-text and AI cleanup tool for Windows.
It uses faster-whisper for offline transcription and can optionally send text to a local or remote OpenAI-compatible endpoint (such as LM Studio) for light cleanup or rewriting.
It supports a GUI, global hotkeys, and automatic pasting into the active window.
- 100% local transcription — no cloud calls
- Secure credential storage — API keys encrypted in Windows Credential Manager
- Optional LLM cleanup via an OpenAI-style endpoint (LM Studio, Ollama, etc.)
- Glossary injection to enforce product names, jargon, or key phrases during normalization and LLM cleanup
- Prompt editor (Edit → Prompt…) with your changes saved to `~/.whisper_dictate_prompt.txt`
- Per-application prompts so you can override the cleanup prompt per app or window title (Edit → Per-app prompts…)
- Glossary editor (Edit → Glossary…) with add/edit/delete controls, CSV import/export, and entries saved to `~/.whisper_dictate/whisper_dictate_glossary.json`
- Saves your settings (model, device, hotkey, LLM config, paste delay) to `~/.whisper_dictate/whisper_dictate_settings.json`
- Global hotkey for push-to-talk from any application
- Auto-paste into the focused window (`Ctrl+V`), with a configurable delay
- Fetch available LLM models from your endpoint directly inside the LLM settings window
- Floating status indicator that mirrors the app state (idle, listening, cleaning, etc.)
- Reset button for the floating status indicator, in case you drag it off-screen
- GPU or CPU execution
- One-command setup using `uv`
- Comprehensive test suite with coverage reporting
- Structured logging to file and console
```shell
git clone https://github.com/yourusername/whisper-dictate.git
cd whisper-dictate
uv sync
uv run dictate-gui
```

The GUI provides:
- Model/device selection with resource requirements displayed for each model
- Auto-configured compute type based on your device (CUDA→float16, CPU→int8)
- Input-device field
- Optional LLM cleanup section (endpoint, model, API key, temperature, and system prompt)
- Auto-paste checkbox and delay setting
- Transcript view with timestamped results
Use Load model, then Register hotkey (e.g., CTRL+WIN+G), and press the hotkey anywhere to dictate.
If "Auto-paste" is enabled, the result pastes automatically into the app you were using.
- Edit → Prompt… to customize the cleanup prompt (persisted to `~/.whisper_dictate_prompt.txt`).
- Edit → Per-app prompts… to override the cleanup prompt for specific processes (e.g., `winword.exe`, `notion.exe`). Use the recent apps dropdown to prefill entries with the last windows you dictated into, and optionally add a window-title regex to scope a prompt to a particular document or channel. These rules are persisted to `~/.whisper_dictate/whisper_dictate_settings.json`.
- Edit → Glossary… to maintain glossary entries (persisted to `~/.whisper_dictate/whisper_dictate_glossary.json`).
- Settings → Speech recognition… to pick your device (CPU/CUDA) and model. Models display their size and resource requirements (e.g., "Small (465 MB, ~2 GB VRAM)"), and the optimal compute type is auto-configured based on your device selection.
- Settings → Automation… to set the global hotkey, enable auto-paste, and tune the paste delay.
- Settings → LLM cleanup… to toggle cleanup, set the endpoint/model/API key, refresh available models, and adjust the temperature. Enable Use glossary before prompt to normalize transcripts with your glossary and prepend the rules to the LLM system prompt so it honors your terminology.

All settings are saved to `~/.whisper_dictate/whisper_dictate_settings.json` when you close the app.
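A minimal sketch of that settings round-trip, assuming a flat JSON schema. The field names and defaults here are illustrative; the real schema lives in `whisper_dictate/settings_store.py` and may differ:

```python
import json
from pathlib import Path

# Illustrative defaults; the real app may store more fields.
DEFAULTS = {
    "model": "small",
    "device": "cuda",
    "hotkey": "CTRL+WIN+G",
    "auto_paste": True,
    "paste_delay": 0.15,
}


def load_settings(path: Path) -> dict:
    """Merge stored settings over the defaults; a missing file means defaults."""
    settings = dict(DEFAULTS)
    if path.exists():
        settings.update(json.loads(path.read_text(encoding="utf-8")))
    return settings


def save_settings(path: Path, settings: dict) -> None:
    """Write settings as pretty-printed JSON, creating parent dirs if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
```

Merging over defaults means a settings file written by an older version still loads cleanly when new fields are added.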
Per-application prompts let you tailor cleanup instructions to the active app or window:
- Open Edit → Per-app prompts….
- Pick a Recent app to prefill the process name, or add a process manually.
- Optionally set a Window title regex to target a specific document, chat, or channel.
- Enter the Prompt override for that app/window.
When dictating, Whisper Dictate detects the active process/window and applies the most specific matching prompt before sending text to the LLM cleanup step. Clear entries to fall back to the global prompt.
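The matching logic can be sketched roughly as follows. The rule schema (`process`, `title_regex`, `prompt`) is an assumption for illustration, not the app's actual field names:

```python
import re


def resolve_prompt(rules, process, title, default_prompt):
    """Pick the most specific matching prompt override.

    `rules` is a list of dicts with keys: process, title_regex (optional),
    and prompt. A rule that also matches the window title beats a
    process-only rule; no match falls back to the global prompt.
    """
    best = None
    for rule in rules:
        if rule["process"].lower() != process.lower():
            continue
        pattern = rule.get("title_regex")
        if pattern:
            if re.search(pattern, title):
                return rule["prompt"]  # most specific: process + title match
        elif best is None:
            best = rule["prompt"]      # process-only fallback
    return best if best is not None else default_prompt
```

For example, a `winword.exe` rule with title regex `Report` would win over a plain `winword.exe` rule when a document named "Q3 Report.docx" is focused.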
When the Auto-paste checkbox (GUI) is enabled:
- The final text is copied to the clipboard.
- After a short delay (default 0.15 s), `Ctrl+V` is sent to the active window.
If you toggle recording from inside Word, Notion, VS Code, or a chat window, the cleaned text appears directly where your cursor is.
Tip: Trigger the hotkey, don't click the GUI button — clicking steals focus and will paste into the GUI itself.
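The paste sequence can be sketched with the two OS calls injected as callables, since the real clipboard write and keystroke go through Windows APIs; this keeps the ordering testable and is only an illustration of the flow:

```python
import time


def auto_paste(text, copy_to_clipboard, send_ctrl_v, delay=0.15):
    """Sketch of the auto-paste flow; the OS-specific calls are parameters."""
    copy_to_clipboard(text)  # 1. put the final text on the clipboard
    time.sleep(delay)        # 2. give focus time to settle on the target window
    send_ctrl_v()            # 3. paste into whatever window has focus
```

The delay between steps 1 and 3 is exactly the configurable paste delay from Settings → Automation….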
If you move the floating status indicator off-screen, use Settings → Reset status indicator position to snap it back to the default location.
Use the glossary to keep acronyms, brand names, or domain-specific terms intact during normalization and LLM cleanup:
- Open Edit → Glossary… and add entries as trigger/replacement pairs using the glossary manager.
- Each rule supports Match (word, phrase, regex), Case sensitive, and Whole words only to fine-tune how replacements are applied. Use Add, Edit, or Delete to maintain the list, or Import CSV / Export CSV to bulk-manage rules. An optional description can remind you why a term matters.
- Entries are saved to `~/.whisper_dictate/whisper_dictate_glossary.json` and loaded automatically on startup.
- In Settings → LLM cleanup…, enable Use glossary before prompt to apply the glossary to transcripts and prepend the rules to the LLM system prompt so it takes priority over the general cleanup prompt.
Glossary usage is optional; turn it off from Settings → LLM cleanup… if you only want the standard prompt applied.
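Conceptually, applying the glossary is a pass of rule-driven substitutions. This sketch assumes a simplified entry schema (`trigger`, `replacement`, `match`, `case_sensitive`, `whole_words`); the real implementation in `whisper_dictate/glossary.py` may differ:

```python
import re


def apply_glossary(text, entries):
    """Apply each glossary rule to the transcript in order."""
    for e in entries:
        flags = 0 if e.get("case_sensitive") else re.IGNORECASE
        if e.get("match") == "regex":
            pattern = e["trigger"]           # rule supplies its own regex
        else:
            pattern = re.escape(e["trigger"])
            if e.get("whole_words", True):
                # word boundaries so "gpu" does not rewrite "gpus"
                pattern = r"\b" + pattern + r"\b"
        text = re.sub(pattern, e["replacement"], text, flags=flags)
    return text
```

This is why "Whole words only" matters: without the boundary anchors, a short trigger would fire inside longer words.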
whisper-dictate/
│
├─ whisper_dictate/
│ ├─ __init__.py # Package initialization
│ ├─ config.py # Configuration defaults and CUDA setup
│ ├─ app_context.py # Active-window context (process and title)
│ ├─ prompt.py # LLM prompt management (load/save)
│ ├─ app_prompts.py # Per-application prompt resolution helpers
│ ├─ app_prompt_dialog.py # GUI dialog for managing per-app prompt overrides
│ ├─ audio.py # Audio recording functionality
│ ├─ transcription.py # Whisper transcription logic
│ ├─ llm_cleanup.py # LLM text cleanup functionality
│ ├─ glossary.py # Glossary persistence used during LLM cleanup
│ ├─ glossary_dialog.py # GUI dialog for managing glossary rules
│ ├─ hotkeys.py # Windows global hotkey management
│ ├─ gui_components.py # Reusable GUI components
│ ├─ logging_config.py # Centralized logging setup
│ ├─ settings_store.py # Persistent settings load/save helpers
│ └─ gui.py # Main GUI application
│
├─ tests/ # Comprehensive test suite
│ ├─ test_app_context.py
│ ├─ test_app_prompts.py
│ ├─ test_config.py
│ ├─ test_prompt.py
│ ├─ test_hotkeys.py
│ ├─ test_llm_cleanup.py
│ ├─ test_glossary.py
│ ├─ test_transcription.py
│ └─ test_audio.py
│
├─ packaging/
│ └─ pyinstaller/
│ └─ whisper_dictate_gui.spec # PyInstaller build spec
│
├─ pyproject.toml
└─ README.md
```shell
# Run all tests
make test

# Run tests with coverage report
make test-coverage

# Run all checks (lint, format, typecheck, test)
make check

# Run individual checks
make lint
make format-check
make typecheck

# Auto-fix issues
make fix
```

Run `make help` to see all available targets.
Run the test suite with coverage:

```shell
uv sync --dev
uv run pytest
```

Generate an HTML coverage report:

```shell
uv run pytest --cov=whisper_dictate --cov-report=html
```

For developers and contributors:
- Architecture Documentation — System architecture diagrams, data flow, module responsibilities, and design patterns
- Build Instructions — Detailed guide for creating standalone executables
- CLAUDE.md — AI assistant context and development guidelines
- CONTRIBUTING.md — Contribution guidelines and coding standards
- CHANGELOG.md — Project history and release notes
```shell
git clone https://github.com/yourusername/whisper-dictate.git
cd whisper-dictate
uv sync
uv run dictate-gui
```

If you want to freeze dependency versions for reproducibility:

```shell
uv lock
uv sync --locked
```

Build a standalone Windows executable using the PyInstaller spec:

```shell
# Using the Makefile (recommended)
USE_UV=1 make build-exe

# Or directly with PyInstaller
uv run pyinstaller packaging/pyinstaller/whisper_dictate_gui.spec --noconfirm
```

The executable will be created in `dist/whisper-dictate-gui/` with all required CUDA DLLs bundled. See `docs/build.md` for detailed build instructions.
| Symptom | Cause | Fix |
|---|---|---|
| Hotkey not working | Registered from wrong thread | Fixed in latest build; re-register it |
| `cudnn_ops64_9.dll` missing | cuDNN not installed | Install cuDNN v9 and add it to PATH |
| `int8_float16` not supported | CPU mode only | Use `--compute-type int8` |
| Nothing pastes | GUI has focus | Trigger with hotkey from target window |
| Audio errors | Mic blocked by privacy settings | Enable mic access for desktop apps |
- API keys are encrypted using Windows Credential Manager (via the `keyring` library)
- Credentials are tied to your user account and encrypted by the operating system
- API keys are never stored in plaintext JSON files
- Automatic migration: Existing plaintext API keys are automatically migrated to secure storage on first run
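The migration step can be sketched with the secure store injected as a callable (the real app goes through the `keyring` library; the service and secret names below are illustrative):

```python
def migrate_api_key(settings, store_secret):
    """One-time migration of a plaintext API key into secure storage.

    If a plaintext `api_key` is found in the settings dict, it is handed to
    the secure store and scrubbed from the dict so it never reaches the
    JSON settings file again.
    """
    key = settings.pop("api_key", None)
    if key:
        store_secret("whisper-dictate", "llm_api_key", key)  # names illustrative
        settings["api_key_migrated"] = True
    return settings
```

Because the key is popped before the settings are next serialized, the plaintext copy disappears from disk on the first save after migration.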
- When debug logging is enabled, transcribed speech and prompts are logged to disk
- Use debug mode only when troubleshooting, not for regular use
- The GUI displays a prominent warning when debug mode is active
- All transcription happens 100% locally — no cloud calls
- LLM cleanup is optional and only used if you configure an endpoint
- Your data never leaves your machine unless you explicitly enable LLM cleanup
Logs are written to both:
- Console (stderr) — for immediate feedback
- File (`~/.whisper_dictate/logs/whisper_dictate.log`) — for debugging
Log levels include DEBUG, INFO, WARNING, and ERROR with timestamps and context.
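A minimal sketch of that dual console/file setup using the standard `logging` module; the actual configuration lives in `whisper_dictate/logging_config.py` and may differ in format and handler details:

```python
import logging
import sys
from pathlib import Path


def setup_logging(log_file: Path, debug: bool = False) -> logging.Logger:
    """Attach a stderr handler and a file handler to the app logger."""
    logger = logging.getLogger("whisper_dictate")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    console = logging.StreamHandler(sys.stderr)  # immediate feedback
    console.setFormatter(fmt)
    logger.addHandler(console)

    log_file.parent.mkdir(parents=True, exist_ok=True)
    file_handler = logging.FileHandler(log_file, encoding="utf-8")  # for debugging
    file_handler.setFormatter(fmt)
    logger.addHandler(file_handler)
    return logger
```

Setting the logger level (rather than the handler levels) is what makes the debug toggle raise verbosity on both outputs at once.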
Enjoy local, AI-enhanced dictation — fast, private, and cloud-free.