
Whisper-Input-Next - Enhanced Voice Transcription Tool


An intelligent voice transcription input tool supporting multiple transcription services and high-quality speech recognition features.

πŸš€ Project Background

This project is a secondary development based on ErlichLiu/Whisper-Input. The original project had been inactive for months, so we made extensive feature expansions and architectural optimizations, adding important capabilities such as OpenAI GPT-4o transcribe integration, audio archiving, and local whisper support.

✨ Key Features

🎯 Core Functions

  • Multi-platform Transcription Services: Support for OpenAI GPT-4o transcribe, GROQ, SiliconFlow, local whisper.cpp
  • Smart Hotkeys: Ctrl+F (OpenAI high-quality) / Ctrl+I (local cost-saving mode)
  • Audio Archive: Automatically save all recordings, support history playback
  • Failure Retry: Intelligent error handling and retry mechanism

πŸ”§ Technical Features

  • Dual Processor Architecture: OpenAI + Local processors working simultaneously
  • 180s Long Audio Support: up to 3 minutes of continuous recording
  • Smart Status Indicators: Simple numeric status display (0, 1, !)
  • Cache System: Audio archive with transcription result caching

🌟 User Experience

  • No Clipboard Pollution: Clean status display without interfering with system clipboard
  • One-click Retry: Failed transcriptions can be retried without re-recording
  • Real-time Input: Transcription results appear directly at cursor position
  • Privacy Protection: Local processing option, data not uploaded

πŸ“¦ Quick Start

Environment Requirements

  • Python 3.12+
  • macOS/Linux (Windows support in development)
  • Network connection (only required for cloud services)
  • Local whisper.cpp (required when using local transcription features)

Installation Steps

  1. Clone Project
git clone https://github.com/Mor-Li/Whisper-Input-Next.git
cd Whisper-Input-Next
  2. Create Virtual Environment
python -m venv .venv
source .venv/bin/activate  # macOS/Linux
# or .venv\Scripts\activate  # Windows
  3. Install Dependencies
pip install -r requirements.txt
  4. Install Local whisper.cpp (Optional, required for local transcription)
# Clone whisper.cpp repository
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

# Compile (macOS/Linux)
make

# Download model file (large-v3 recommended)
bash ./models/download-ggml-model.sh large-v3

# Record the whisper-cli path for later configuration in the .env file
echo "Whisper CLI Path: $(pwd)/build/bin/whisper-cli"
cd ..
  5. Configure Environment Variables
cp env.example .env
# Edit the .env file and configure the necessary parameters:
# - OFFICIAL_OPENAI_API_KEY: OpenAI API key (required)
# - WHISPER_CLI_PATH: whisper.cpp executable path (required for local transcription)
# - WHISPER_MODEL_PATH: whisper model file path (required for local transcription)
  6. Run Program
python main.py
# or use the startup script
chmod +x start.sh
./start.sh

⚠️ Important Notes

Required Configuration:

  • OFFICIAL_OPENAI_API_KEY: OpenAI GPT-4o transcribe API key
  • WHISPER_CLI_PATH: Local whisper.cpp executable absolute path
  • WHISPER_MODEL_PATH: whisper model file path (relative to whisper.cpp root directory)

whisper.cpp Installation Guide:

  1. Clone and compile from whisper.cpp repository
  2. Download large-v3 model: bash ./models/download-ggml-model.sh large-v3
  3. Configure correct paths in .env
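Once whisper.cpp is compiled, local transcription boils down to invoking the whisper-cli binary on a WAV file. The sketch below shows one minimal way to do that from Python; it is illustrative, not the project's actual code, and the paths are placeholders you would replace with the values recorded during installation. The `-m`, `-f`, and `-nt` flags follow whisper.cpp's command-line interface.

```python
# Illustrative sketch of calling whisper.cpp from Python.
# WHISPER_CLI and MODEL are placeholder paths, not real defaults.
import subprocess

WHISPER_CLI = "/path/to/whisper.cpp/build/bin/whisper-cli"
MODEL = "/path/to/whisper.cpp/models/ggml-large-v3.bin"

def build_whisper_cmd(cli: str, model: str, wav_path: str) -> list[str]:
    # -m: model file, -f: input WAV file, -nt: suppress timestamps in output
    return [cli, "-m", model, "-f", wav_path, "-nt"]

def transcribe_local(wav_path: str, cli: str = WHISPER_CLI, model: str = MODEL) -> str:
    """Run whisper-cli on a 16 kHz mono WAV file and return the transcript."""
    result = subprocess.run(
        build_whisper_cmd(cli, model, wav_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```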

βš™οΈ Configuration Guide

Environment Variable Configuration

Configure the following parameters in the .env file:

# Service platform selection (recommend using our maintained dual-platform configuration)
SERVICE_PLATFORM=openai&local  # Our primarily maintained configuration

# OpenAI configuration (required)
OFFICIAL_OPENAI_API_KEY=sk-proj-xxx

# Local whisper.cpp configuration (required for local transcription)
WHISPER_CLI_PATH=/path/to/whisper.cpp/build/bin/whisper-cli
WHISPER_MODEL_PATH=models/ggml-large-v3.bin

# Keyboard shortcut configuration
TRANSCRIPTIONS_BUTTON=f
TRANSLATIONS_BUTTON=ctrl
SYSTEM_PLATFORM=mac  # mac/win

# Feature switches
CONVERT_TO_SIMPLIFIED=false
ADD_SYMBOL=false
OPTIMIZE_RESULT=false
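A .env file like the one above is just KEY=VALUE lines. As a rough sketch of how such a file can be consumed (the real project may well use a library such as python-dotenv instead), a few lines of standard-library Python suffice; the `parse_env` helper and the sample string are illustrative only:

```python
# Minimal .env parsing sketch using only the standard library.
# parse_env is a hypothetical helper, not part of this project.
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks, comments, and trailing # notes."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and empty lines
        key, _, value = line.partition("=")
        env[key.strip()] = value.split("#", 1)[0].strip()
    return env

sample = """
SERVICE_PLATFORM=openai&local
OFFICIAL_OPENAI_API_KEY=sk-proj-xxx
SYSTEM_PLATFORM=mac  # mac/win
"""
cfg = parse_env(sample)
# "openai&local" enables both processors at once
platforms = cfg["SERVICE_PLATFORM"].split("&")
```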

Important Note:

  • This project primarily maintains SERVICE_PLATFORM=openai&local configuration
  • This is our recommended and most thoroughly tested configuration
  • Other single-platform configurations (groq, siliconflow, etc.) are maintained for compatibility only

Quick Start with Aliases (Recommended)

Add these aliases to your shell profile (~/.bashrc, ~/.zshrc, etc.):

alias whisper_input='cd /path/to/Whisper-Input-Next && ./start.sh'
alias whisper_input_off='tmux kill-session -t whisper-input'

Replace /path/to/Whisper-Input-Next with your actual project path.

Hotkey Instructions

| Hotkey | Function | Service | Features |
|--------|----------|---------|----------|
| Ctrl+F | High-quality transcription | OpenAI GPT-4o transcribe | Built-in punctuation, highest quality |
| Ctrl+I | Local transcription | whisper.cpp | Offline processing, privacy protection |
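The hotkey scheme above maps each key combination to one transcription backend. A hedged sketch of that dispatch idea (illustrative only; the function and variable names are hypothetical, not the project's actual classes):

```python
# Illustrative hotkey-to-backend dispatch; names are hypothetical.
def transcribe_openai(wav_path: str) -> str:
    # placeholder for the OpenAI GPT-4o transcribe call (network required)
    return f"[openai] {wav_path}"

def transcribe_whisper_cpp(wav_path: str) -> str:
    # placeholder for the local whisper.cpp call (offline, private)
    return f"[local] {wav_path}"

HOTKEYS = {
    "ctrl+f": transcribe_openai,       # high quality, cloud
    "ctrl+i": transcribe_whisper_cpp,  # cost-saving, local
}

def handle_hotkey(combo: str, wav_path: str) -> str:
    return HOTKEYS[combo](wav_path)
```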

Status Indicators

The program displays concise status indicators at the cursor position during runtime:

| Status | Meaning | Action |
|--------|---------|--------|
| 0 | Recording | Press the hotkey again to stop recording |
| 1 | Transcribing | Wait for transcription to complete |
| ! | Transcription failed/error | Press Ctrl+F again to retry (audio is saved) |

Design Optimizations:

  • Concise numeric statuses instead of complex emoji symbols
  • No system clipboard pollution; the status appears only at the cursor position
  • Clear, intuitive indicators that are easy to identify at a glance

Retry Mechanism Instructions:

  • When a transcription fails, the system saves the recording and displays the ! status
  • No need to re-record; simply press Ctrl+F to retry
  • Retries reuse the previously saved audio until transcription succeeds
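The retry behavior described above amounts to keeping the last recording around until a transcription succeeds. A minimal sketch of that idea (the class name and structure are hypothetical, not the project's actual implementation):

```python
# Illustrative retry sketch: keep the last recording until success.
# RetryingTranscriber is a hypothetical name, not the project's class.
class RetryingTranscriber:
    def __init__(self, transcribe_fn):
        self.transcribe_fn = transcribe_fn
        self.pending_audio = None  # last recording, kept until success

    def submit(self, wav_path: str):
        self.pending_audio = wav_path
        return self.retry()

    def retry(self):
        if self.pending_audio is None:
            return None  # nothing to retry
        try:
            text = self.transcribe_fn(self.pending_audio)
        except Exception:
            return "!"  # failure indicator; the audio stays saved
        self.pending_audio = None  # success: clear the saved audio
        return text
```

Because `pending_audio` is only cleared on success, pressing retry after a failure re-submits the same file instead of forcing a new recording.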

πŸ“š Feature Documentation

πŸ› οΈ Development Status

βœ… Completed Features

  • OpenAI GPT-4o transcribe integration
  • Audio archive system
  • Local whisper support
  • Dual processor architecture
  • Smart retry mechanism
  • Project documentation improvement
  • 10-minute recording limit protection
  • Status indicator delay optimization
  • Audio format conversion support (m4a to wav)
  • Bilingual documentation system
  • GPT-4o terminology standardization

🚧 In Development

No features currently in development

πŸ“‹ Planned Features

No features currently planned

πŸ§ͺ Experimental Features History

iOS Keyboard Extension Experiment (August 14, 2025)

Status: ❌ Discontinued due to Apple's restrictions
We attempted to build an iOS keyboard extension, but discovered that due to Apple's system limitations, even Sogou Input Method cannot record audio directly inside a keyboard extension. A seamless iOS voice-input keyboard extension is therefore not currently feasible.

🀝 Contributing Guidelines

We welcome all forms of contributions! Whether it's:

  • πŸ› Bug Reports: Found an issue? Create an Issue
  • πŸ’‘ Feature Suggestions: Have great ideas? Start a Discussion
  • πŸ“ Code Contributions: Submit Pull Requests
  • πŸ“š Documentation Improvements: Help improve documentation
  • 🌍 Translations: Help translate to more languages

Development Environment Setup

# Clone repository
git clone https://github.com/Mor-Li/Whisper-Input-Next.git
cd Whisper-Input-Next

# Create development environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Start development
python main.py

πŸ™ Acknowledgments

  • Thanks to ErlichLiu/Whisper-Input for the original project foundation
  • Thanks to OpenAI for providing excellent transcription API services
  • Thanks to whisper.cpp community for local processing support
  • Thanks to all contributors and users for their support

πŸ“ž Contact Information


⭐ If this project helps you, please give it a Star for support!
