A privacy-first, local speech-to-text and AI cleanup tool for Windows.
It uses faster-whisper for offline transcription and can optionally send text to a local or remote OpenAI-compatible endpoint (such as LM Studio) for light cleanup or rewriting.
It supports a GUI, global hotkeys, and automatic pasting into the active window.
- 100% local transcription — no cloud calls
- Secure credential storage — API keys encrypted in Windows Credential Manager
- Optional LLM cleanup via an OpenAI-style endpoint (LM Studio, Ollama, etc.)
- Glossary injection to enforce product names, jargon, or key phrases during normalization and LLM cleanup
- Prompt editor (Edit → Prompt…) with your changes saved to `~/.whisper_dictate_prompt.txt`
- Per-application prompts so you can override the cleanup prompt per app or window title (Edit → Per-app prompts…)
- Glossary editor (Edit → Glossary…) with add/edit/delete controls, CSV import/export, and entries saved to `~/.whisper_dictate/whisper_dictate_glossary.json`
- Saves your settings (model, device, hotkey, LLM config, paste delay) to `~/.whisper_dictate/whisper_dictate_settings.json`
- Global hotkey for push-to-talk from any application
- Auto-paste into the focused window (`Ctrl+V`), with a configurable delay
- Fetch available LLM models from your endpoint directly inside the LLM settings window
- Floating status indicator that mirrors the app state (idle, listening, cleaning, etc.)
- Reset button for the floating status indicator, in case you drag it off-screen
- GPU or CPU execution
- One-command setup using `uv`
- Comprehensive test suite with coverage reporting
- Structured logging to file and console
```shell
git clone https://github.com/yourusername/whisper-dictate.git
cd whisper-dictate
uv sync
uv run dictate-gui
```

The GUI provides:
- Model/device selection with resource requirements displayed for each model
- Auto-configured compute type based on your device (CUDA→float16, CPU→int8)
- Input-device field
- Optional LLM cleanup section (endpoint, model, API key, temperature, and system prompt)
- Auto-paste checkbox and delay setting
- Transcript view with timestamped results
Use Load model, then Register hotkey (e.g., CTRL+WIN+G), and press the hotkey anywhere to dictate.
If "Auto-paste" is enabled, the result pastes automatically into the app you were using.
- Edit → Prompt… to customize the cleanup prompt (persisted to `~/.whisper_dictate_prompt.txt`).
- Edit → Per-app prompts… to override the cleanup prompt for specific processes (e.g., `winword.exe`, `notion.exe`). Use the recent apps dropdown to prefill entries with the last windows you dictated into, and optionally add a window-title regex to scope a prompt to a particular document or channel. These rules are persisted to `~/.whisper_dictate/whisper_dictate_settings.json`.
- Edit → Glossary… to maintain glossary entries (persisted to `~/.whisper_dictate/whisper_dictate_glossary.json`).
- Settings → Speech recognition… to pick your device (CPU/CUDA) and model. Models display their size and resource requirements (e.g., "Small (465 MB, ~2 GB VRAM)"), and the optimal compute type is auto-configured based on your device selection.
- Settings → Automation… to set the global hotkey, enable auto-paste, and tune the paste delay.
- Settings → LLM cleanup… to toggle cleanup, set the endpoint/model/API key, refresh available models, and adjust the temperature. Enable Use glossary before prompt to normalize transcripts with your glossary and prepend the rules to the LLM system prompt so it honors your terminology.

All settings are saved to `~/.whisper_dictate/whisper_dictate_settings.json` when you close the app.
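A minimal sketch of that settings round-trip, assuming a flat JSON schema. The field names and defaults here are illustrative; the real schema lives in `whisper_dictate/settings_store.py` and may differ:

```python
import json
from pathlib import Path

# Illustrative defaults; the real app may store more fields.
DEFAULTS = {
    "model": "small",
    "device": "cuda",
    "hotkey": "CTRL+WIN+G",
    "auto_paste": True,
    "paste_delay": 0.15,
}


def load_settings(path: Path) -> dict:
    """Merge stored settings over the defaults; a missing file means defaults."""
    settings = dict(DEFAULTS)
    if path.exists():
        settings.update(json.loads(path.read_text(encoding="utf-8")))
    return settings


def save_settings(path: Path, settings: dict) -> None:
    """Write settings as pretty-printed JSON, creating parent dirs if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
```

Merging over defaults means a settings file written by an older version still loads cleanly when new fields are added.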
Per-application prompts let you tailor cleanup instructions to the active app or window:
- Open Edit → Per-app prompts….
- Pick a Recent app to prefill the process name, or add a process manually.
- Optionally set a Window title regex to target a specific document, chat, or channel.
- Enter the Prompt override for that app/window.
When dictating, Whisper Dictate detects the active process/window and applies the most specific matching prompt before sending text to the LLM cleanup step. Clear entries to fall back to the global prompt.
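The matching logic can be sketched roughly as follows. The rule schema (`process`, `title_regex`, `prompt`) is an assumption for illustration, not the app's actual field names:

```python
import re


def resolve_prompt(rules, process, title, default_prompt):
    """Pick the most specific matching prompt override.

    `rules` is a list of dicts with keys: process, title_regex (optional),
    and prompt. A rule that also matches the window title beats a
    process-only rule; no match falls back to the global prompt.
    """
    best = None
    for rule in rules:
        if rule["process"].lower() != process.lower():
            continue
        pattern = rule.get("title_regex")
        if pattern:
            if re.search(pattern, title):
                return rule["prompt"]  # most specific: process + title match
        elif best is None:
            best = rule["prompt"]      # process-only fallback
    return best if best is not None else default_prompt
```

For example, a `winword.exe` rule with title regex `Report` would win over a plain `winword.exe` rule when a document named "Q3 Report.docx" is focused.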
When the Auto-paste checkbox (GUI) is enabled:
- The final text is copied to the clipboard.
- After a short delay (default 0.15 s), `Ctrl+V` is sent to the active window.
If you toggle recording from inside Word, Notion, VS Code, or a chat window, the cleaned text appears directly where your cursor is.
Tip: Trigger the hotkey, don't click the GUI button — clicking steals focus and will paste into the GUI itself.
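The paste sequence can be sketched with the two OS calls injected as callables, since the real clipboard write and keystroke go through Windows APIs; this keeps the ordering testable and is only an illustration of the flow:

```python
import time


def auto_paste(text, copy_to_clipboard, send_ctrl_v, delay=0.15):
    """Sketch of the auto-paste flow; the OS-specific calls are parameters."""
    copy_to_clipboard(text)  # 1. put the final text on the clipboard
    time.sleep(delay)        # 2. give focus time to settle on the target window
    send_ctrl_v()            # 3. paste into whatever window has focus
```

The delay between steps 1 and 3 is exactly the configurable paste delay from Settings → Automation….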
If you move the floating status indicator off-screen, use Settings → Reset status indicator position to snap it back to the default location.
Use the glossary to keep acronyms, brand names, or domain-specific terms intact during normalization and LLM cleanup:
- Open Edit → Glossary… and add entries as trigger/replacement pairs using the glossary manager.
- Each rule supports Match (word, phrase, regex), Case sensitive, and Whole words only to fine-tune how replacements are applied. Use Add, Edit, or Delete to maintain the list, or Import CSV / Export CSV to bulk-manage rules. An optional description can remind you why a term matters.
- Entries are saved to `~/.whisper_dictate/whisper_dictate_glossary.json` and loaded automatically on startup.
- In Settings → LLM cleanup…, enable Use glossary before prompt to apply the glossary to transcripts and prepend the rules to the LLM system prompt so it takes priority over the general cleanup prompt.
Glossary usage is optional; turn it off from Settings → LLM cleanup… if you only want the standard prompt applied.
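Conceptually, applying the glossary is a pass of rule-driven substitutions. This sketch assumes a simplified entry schema (`trigger`, `replacement`, `match`, `case_sensitive`, `whole_words`); the real implementation in `whisper_dictate/glossary.py` may differ:

```python
import re


def apply_glossary(text, entries):
    """Apply each glossary rule to the transcript in order."""
    for e in entries:
        flags = 0 if e.get("case_sensitive") else re.IGNORECASE
        if e.get("match") == "regex":
            pattern = e["trigger"]           # rule supplies its own regex
        else:
            pattern = re.escape(e["trigger"])
            if e.get("whole_words", True):
                # word boundaries so "gpu" does not rewrite "gpus"
                pattern = r"\b" + pattern + r"\b"
        text = re.sub(pattern, e["replacement"], text, flags=flags)
    return text
```

This is why "Whole words only" matters: without the boundary anchors, a short trigger would fire inside longer words.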
whisper-dictate/
│
├─ whisper_dictate/
│ ├─ __init__.py # Package initialization
│ ├─ config.py # Configuration defaults and CUDA setup
│ ├─ app_context.py # Active-window context (process and title)
│ ├─ prompt.py # LLM prompt management (load/save)
│ ├─ app_prompts.py # Per-application prompt resolution helpers
│ ├─ app_prompt_dialog.py # GUI dialog for managing per-app prompt overrides
│ ├─ audio.py # Audio recording functionality
│ ├─ transcription.py # Whisper transcription logic
│ ├─ llm_cleanup.py # LLM text cleanup functionality
│ ├─ glossary.py # Glossary persistence used during LLM cleanup
│ ├─ glossary_dialog.py # GUI dialog for managing glossary rules
│ ├─ hotkeys.py # Windows global hotkey management
│ ├─ gui_components.py # Reusable GUI components
│ ├─ logging_config.py # Centralized logging setup
│ ├─ settings_store.py # Persistent settings load/save helpers
│ └─ gui.py # Main GUI application
│
├─ tests/ # Comprehensive test suite
│ ├─ test_app_context.py
│ ├─ test_app_prompts.py
│ ├─ test_config.py
│ ├─ test_prompt.py
│ ├─ test_hotkeys.py
│ ├─ test_llm_cleanup.py
│ ├─ test_glossary.py
│ ├─ test_transcription.py
│ └─ test_audio.py
│
├─ packaging/
│ └─ pyinstaller/
│ └─ whisper_dictate_gui.spec # PyInstaller build spec
│
├─ pyproject.toml
└─ README.md
```shell
# Run all tests
make test

# Run tests with coverage report
make test-coverage

# Run all checks (lint, format, typecheck, test)
make check

# Run individual checks
make lint
make format-check
make typecheck

# Auto-fix issues
make fix
```

Run `make help` to see all available targets.
Run the test suite with coverage:

```shell
uv sync --dev
uv run pytest
```

Generate an HTML coverage report:

```shell
uv run pytest --cov=whisper_dictate --cov-report=html
```

For developers and contributors:
- Architecture Documentation — System architecture diagrams, data flow, module responsibilities, and design patterns
- Build Instructions — Detailed guide for creating standalone executables
- CLAUDE.md — AI assistant context and development guidelines
- CONTRIBUTING.md — Contribution guidelines and coding standards
- CHANGELOG.md — Project history and release notes
```shell
git clone https://github.com/yourusername/whisper-dictate.git
cd whisper-dictate
uv sync
uv run dictate-gui
```

If you want to freeze dependency versions for reproducibility:

```shell
uv lock
uv sync --locked
```

Build a standalone Windows executable using the PyInstaller spec:

```shell
# Using the Makefile (recommended)
USE_UV=1 make build-exe

# Or directly with PyInstaller
uv run pyinstaller packaging/pyinstaller/whisper_dictate_gui.spec --noconfirm
```

The executable will be created in `dist/whisper-dictate-gui/` with all required CUDA DLLs bundled. See `docs/build.md` for detailed build instructions.
| Symptom | Cause | Fix |
|---|---|---|
| Hotkey not working | Registered from wrong thread | Fixed in latest build; re-register it |
| `cudnn_ops64_9.dll` missing | cuDNN not installed | Install cuDNN v9 and add it to PATH |
| `int8_float16` not supported | CPU mode only | Use `--compute-type int8` |
| Nothing pastes | GUI has focus | Trigger with hotkey from target window |
| Audio errors | Mic blocked by privacy settings | Enable mic access for desktop apps |
- API keys are encrypted using Windows Credential Manager (via the `keyring` library)
- Credentials are tied to your user account and encrypted by the operating system
- API keys are never stored in plaintext JSON files
- Automatic migration: Existing plaintext API keys are automatically migrated to secure storage on first run
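The migration step can be sketched with the secure store injected as a callable (the real app goes through the `keyring` library; the service and secret names below are illustrative):

```python
def migrate_api_key(settings, store_secret):
    """One-time migration of a plaintext API key into secure storage.

    If a plaintext `api_key` is found in the settings dict, it is handed to
    the secure store and scrubbed from the dict so it never reaches the
    JSON settings file again.
    """
    key = settings.pop("api_key", None)
    if key:
        store_secret("whisper-dictate", "llm_api_key", key)  # names illustrative
        settings["api_key_migrated"] = True
    return settings
```

Because the key is popped before the settings are next serialized, the plaintext copy disappears from disk on the first save after migration.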
- When debug logging is enabled, transcribed speech and prompts are logged to disk
- Use debug mode only when troubleshooting, not for regular use
- The GUI displays a prominent warning when debug mode is active
- All transcription happens 100% locally — no cloud calls
- LLM cleanup is optional and only used if you configure an endpoint
- Your data never leaves your machine unless you explicitly enable LLM cleanup
Logs are written to both:
- Console (stderr) — for immediate feedback
- File (`~/.whisper_dictate/logs/whisper_dictate.log`) — for debugging
Log levels include DEBUG, INFO, WARNING, and ERROR with timestamps and context.
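A minimal sketch of that dual console/file setup using the standard `logging` module; the actual configuration lives in `whisper_dictate/logging_config.py` and may differ in format and handler details:

```python
import logging
import sys
from pathlib import Path


def setup_logging(log_file: Path, debug: bool = False) -> logging.Logger:
    """Attach a stderr handler and a file handler to the app logger."""
    logger = logging.getLogger("whisper_dictate")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    console = logging.StreamHandler(sys.stderr)  # immediate feedback
    console.setFormatter(fmt)
    logger.addHandler(console)

    log_file.parent.mkdir(parents=True, exist_ok=True)
    file_handler = logging.FileHandler(log_file, encoding="utf-8")  # for debugging
    file_handler.setFormatter(fmt)
    logger.addHandler(file_handler)
    return logger
```

Setting the logger level (rather than the handler levels) is what makes the debug toggle raise verbosity on both outputs at once.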
Enjoy local, AI-enhanced dictation — fast, private, and cloud-free.