An intelligent voice transcription input tool supporting multiple transcription services and high-quality speech recognition features.
This project is built on ErlichLiu/Whisper-Input. The original project has been inactive for months, so we have made extensive feature expansions and architectural optimizations, adding important features such as OpenAI GPT-4o transcribe integration, audio archiving, local whisper support, and more.

Why use this project?
- Multi-platform Transcription Services: Support for OpenAI GPT-4o transcribe, GROQ, SiliconFlow, local whisper.cpp
- Smart Hotkeys: Ctrl+F (OpenAI high-quality) / Ctrl+I (local cost-saving mode)
- Audio Archive: Automatically save all recordings, support history playback
- Failure Retry: Intelligent error handling and retry mechanism
- Dual Processor Architecture: OpenAI + local processors working simultaneously (see the sketch after this list)
- 180s Long Audio Support: Support up to 3 minutes of continuous recording
- Smart Status Indicators: Simple numeric status display (0, 1, !)
- Cache System: Audio archive with transcription result caching
- No Clipboard Pollution: Clean status display without interfering with system clipboard
- One-click Retry: Failed transcriptions can be retried without re-recording
- Real-time Input: Transcription results appear directly at cursor position
- Privacy Protection: Local processing option, data not uploaded
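As an illustration of the dual-processor and hotkey features above, here is a minimal sketch assuming the pynput library. The callbacks are hypothetical placeholders, not the project's actual processors:

```python
# Minimal sketch only: the two callbacks stand in for the real
# OpenAI and whisper.cpp processors.
from pynput import keyboard

def transcribe_openai():
    # Ctrl+F: stop recording and send the audio to GPT-4o transcribe
    print("routing recording to the OpenAI processor")

def transcribe_local():
    # Ctrl+I: stop recording and run whisper.cpp offline
    print("routing recording to the local whisper.cpp processor")

# Both hotkeys stay registered at the same time, so either processor
# is one keystroke away; that is the core of the dual-processor design.
with keyboard.GlobalHotKeys({
    "<ctrl>+f": transcribe_openai,
    "<ctrl>+i": transcribe_local,
}) as listener:
    listener.join()
```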
- Python 3.12+
- macOS/Linux (Windows support in development)
- Network connection (only required for cloud services)
- Local whisper.cpp (required when using local transcription features)
- Clone Project

```bash
git clone https://github.com/Mor-Li/Whisper-Input-Next.git
cd Whisper-Input-Next
```

- Create Virtual Environment

```bash
python -m venv .venv
source .venv/bin/activate # macOS/Linux
# or: .venv\Scripts\activate  # Windows
```

- Install Dependencies

```bash
pip install -r requirements.txt
```

- Install Local whisper.cpp (Optional, required for local transcription)

```bash
# Clone whisper.cpp repository
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
# Compile (macOS/Linux)
make
# Download model file (recommend large-v3)
bash ./models/download-ggml-model.sh large-v3
# Record whisper-cli path for later configuration in .env file
echo "Whisper CLI Path: $(pwd)/build/bin/whisper-cli"
cd ..
```
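To sanity-check the build, you can call whisper-cli the way the program eventually will. The snippet below is a minimal sketch, not the project's actual code; the paths are placeholders for the values recorded above, and the flags follow the upstream whisper.cpp CLI:

```python
# Minimal verification sketch; replace the placeholder paths with your
# WHISPER_CLI_PATH and model location from the steps above.
import subprocess

WHISPER_CLI = "/path/to/whisper.cpp/build/bin/whisper-cli"
MODEL = "/path/to/whisper.cpp/models/ggml-large-v3.bin"

def transcribe_locally(wav_path: str) -> str:
    # -m selects the model, -f the input file, and --no-prints keeps
    # stdout limited to the transcription itself.
    result = subprocess.run(
        [WHISPER_CLI, "-m", MODEL, "-f", wav_path, "--no-prints"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(transcribe_locally("recording.wav"))
```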
- Configure Environment Variables

```bash
cp env.example .env
# Edit .env file, configure necessary parameters:
# - OFFICIAL_OPENAI_API_KEY: OpenAI API key (required)
# - WHISPER_CLI_PATH: whisper.cpp executable path (required for local transcription)
# - WHISPER_MODEL_PATH: whisper model file path (required for local transcription)
```

- Run Program

```bash
python main.py
# or use startup script
chmod +x start.sh
./start.sh
```

Required Configuration:
- `OFFICIAL_OPENAI_API_KEY`: OpenAI GPT-4o transcribe API key
- `WHISPER_CLI_PATH`: absolute path to the local whisper.cpp executable
- `WHISPER_MODEL_PATH`: whisper model file path (relative to the whisper.cpp root directory)
whisper.cpp Installation Guide:
- Clone and compile from whisper.cpp repository
- Download the large-v3 model:

```bash
bash ./models/download-ggml-model.sh large-v3
```

- Configure the correct paths in .env
Configure the following parameters in the .env file:
```bash
# Service platform selection (recommend using our maintained dual-platform configuration)
SERVICE_PLATFORM=openai&local # Our primarily maintained configuration
# OpenAI configuration (required)
OFFICIAL_OPENAI_API_KEY=sk-proj-xxx
# Local whisper.cpp configuration (required for local transcription)
WHISPER_CLI_PATH=/path/to/whisper.cpp/build/bin/whisper-cli
WHISPER_MODEL_PATH=models/ggml-large-v3.bin
# Keyboard shortcut configuration
TRANSCRIPTIONS_BUTTON=f
TRANSLATIONS_BUTTON=ctrl
SYSTEM_PLATFORM=mac # mac/win
# Feature switches
CONVERT_TO_SIMPLIFIED=false
ADD_SYMBOL=false
OPTIMIZE_RESULT=false
```
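For reference, here is a minimal sketch of how this configuration can be read at startup, assuming python-dotenv. The variable names match the .env above, but the validation logic is illustrative, not the project's actual code:

```python
# Illustrative startup check only; variable names follow the .env above.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

platform = os.getenv("SERVICE_PLATFORM", "openai&local")
api_key = os.environ["OFFICIAL_OPENAI_API_KEY"]   # required; raises KeyError if unset
cli_path = os.getenv("WHISPER_CLI_PATH")          # required for local transcription

if "local" in platform and not cli_path:
    raise SystemExit("WHISPER_CLI_PATH must be set for local transcription")

print(f"service platform: {platform}")
```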
Important Note:

- This project primarily maintains the `SERVICE_PLATFORM=openai&local` configuration; it is the recommended and most thoroughly tested setup
- Other single-platform configurations (groq, siliconflow, etc.) are maintained for compatibility only
Add these aliases to your shell profile (~/.bashrc, ~/.zshrc, etc.):
```bash
alias whisper_input='cd /path/to/Whisper-Input-Next && ./start.sh'
alias whisper_input_off='tmux kill-session -t whisper-input'
```

Replace /path/to/Whisper-Input-Next with your actual project path.
| Hotkey | Function | Service | Features |
|---|---|---|---|
| Ctrl+F | High-quality transcription | OpenAI GPT-4o transcribe | Built-in punctuation, highest quality |
| Ctrl+I | Local transcription | whisper.cpp | Offline processing, privacy protection |
The program displays concise status indicators at the cursor position during runtime:
| Status | Meaning | Action |
|---|---|---|
| `0` | Recording | Press the hotkey again to stop recording |
| `1` | Transcribing | Wait for transcription to complete |
| `!` | Transcription failed/error | Press Ctrl+F again to retry (audio is saved) |
Design Optimizations:

- Concise numeric statuses instead of complex emoji symbols
- No system clipboard pollution; the status appears only at the cursor position (a sketch of this approach follows below)
- Clear, intuitive states that are easy to identify at a glance
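A minimal sketch of the clipboard-free idea, assuming pynput: the indicator character is typed at the cursor and removed with a backspace, so the system clipboard is never written to. This is illustrative only, not the project's actual code:

```python
# Illustrative only: type the status character at the cursor, then erase it.
from pynput.keyboard import Controller, Key

kb = Controller()

def show_status(indicator: str) -> None:
    kb.type(indicator)          # "0" while recording, "1" while transcribing

def clear_status() -> None:
    kb.press(Key.backspace)     # remove the single status character
    kb.release(Key.backspace)

show_status("1")    # transcribing...
clear_status()      # cleared before the final text is typed in its place
```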
Retry Mechanism Instructions:
- When transcription fails, the system saves the recording and displays the `!` status
- No need to re-record; simply press Ctrl+F to retry
- Retry uses the previously saved audio until transcription succeeds (see the sketch below)
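Here is a minimal sketch of this retry flow, with a hypothetical transcribe() helper standing in for the real API call and a module-level variable standing in for the project's audio archive:

```python
# Illustrative sketch: transcribe() is a placeholder for the real
# transcription call; last_failed stands in for the audio archive.
from pathlib import Path

last_failed: Path | None = None

def transcribe(audio: Path) -> str:
    raise NotImplementedError("placeholder for the real transcription call")

def on_ctrl_f(new_recording: Path | None) -> str:
    """Transcribe a new recording, or retry the last failed one."""
    global last_failed
    audio = new_recording or last_failed   # no re-recording needed on retry
    if audio is None:
        return ""
    try:
        text = transcribe(audio)
        last_failed = None                 # success: clear the saved failure
        return text
    except Exception:
        last_failed = audio                # keep the audio for the next retry
        return "!"                         # failure indicator
```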
- Audio Archive Feature - Introduced in v3.0.0
- Kimi Polish Integration - Deprecated
- Status Display Improvements - Introduced in v3.0.0
- Branch Differences Comparison - Introduced in v3.0.0
- Version Control Documentation - Established in v3.0.0
- OpenAI GPT-4o transcribe integration
- Audio archive system
- Local whisper support
- Dual processor architecture
- Smart retry mechanism
- Project documentation improvement
- 10-minute recording limit protection
- Status indicator delay optimization
- Audio format conversion support (m4a to wav)
- Bilingual documentation system
- GPT-4o terminology standardization
No features currently in development
No features currently planned
Status: Discontinued due to Apple's restrictions
Attempted to create iOS keyboard extension but discovered that even Sogou Input Method cannot directly record audio in keyboard extensions due to Apple's system limitations. iOS voice input is currently not feasible as a seamless keyboard extension.
We welcome contributions of all kinds:

- Bug Reports: Found an issue? Create an Issue
- Feature Suggestions: Have great ideas? Start a Discussion
- Code Contributions: Submit Pull Requests
- Documentation Improvements: Help improve the documentation
- Translations: Help translate into more languages
```bash
# Clone repository
git clone https://github.com/Mor-Li/Whisper-Input-Next.git
cd Whisper-Input-Next
# Create development environment
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Start development
python main.py
```

- Thanks to ErlichLiu/Whisper-Input for the original project foundation
- Thanks to OpenAI for providing excellent transcription API services
- Thanks to whisper.cpp community for local processing support
- Thanks to all contributors and users for their support
- Project Address: https://github.com/Mor-Li/Whisper-Input-Next
- Issue Reports: Issues
- Feature Suggestions: Discussions
⭐ If this project helps you, please give it a Star for support!
