A powerful, self-hostable transcription solution designed for small to medium-sized businesses (SMBs), teams and individuals who need full control over their data and transcription workflow.
Transcriber Platform turns audio into accurate, organized text through a user-friendly web interface. Upload audio files and get transcriptions from APIs such as OpenAI GPT-4o Transcribe, OpenAI Whisper and AssemblyAI Universal. It automatically splits large files, supports single-user and multi-user modes, and includes administrative tools for managing users, costs and custom AI workflows.
- Key Features
- Quick Start (Docker)
- Installation & Configuration
- Usage Guide
- For Developers
- Troubleshooting
- License
- Multiple Transcription APIs: Choose from OpenAI GPT-4o Transcribe, OpenAI Whisper or AssemblyAI Universal.
- Speaker Diarization (AssemblyAI): Toggle speaker labels to identify who said what on supported jobs.
- Large File Handling: Automatically splits files over 25MB into chunks for seamless processing.
- AI-Powered Title Generation: Automatically generates a concise title for each transcription.
- Custom AI Workflows: Execute custom prompts (ex. summarize, extract action items) on transcribed text using LLMs like OpenAI models or Google Gemini; save reusable workflows from the UI.
- Flexible Language Options: Select the audio language manually or use automatic detection.
- Context Prompting: Improve accuracy for jargon or specific names by providing context hints to OpenAI models.
- Intuitive Web Interface: Clean and simple UI for uploading files, managing history and running workflows.
- Live Progress & Cancellation: Track uploads/transcriptions with live updates and cancel long-running jobs without leaving the page.
- Comprehensive History: View, copy, download (.txt) and delete past transcriptions.
- Internationalization (i18n): Multi-language support (English, Spanish, French, Dutch).
- Dual Deployment Modes:
  - single: Simple, no-login mode using global API keys. Perfect for personal use.
  - multi: Full-featured user mode with registration, login and individual API key management.
- Secure User Authentication: Supports username/password, Google Sign-In and password resets.
- Role-Based Access Control (RBAC): Granularly control permissions for features, API usage and more.
- Smart API Key Handling: If a user has permission to manage keys, their personal key is used. Otherwise, the system seamlessly falls back to the global API key, ensuring uninterrupted service.
- Comprehensive Admin Panel:
- User Management: View and manage all users and their usage.
- Cost & Usage Analytics: Detailed dashboards to track transcription minutes, workflow costs and API expenses by user and role.
- System-wide Templates: Create and manage workflow templates available to all users.
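
The Smart API Key Handling rule listed above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the platform's actual code: the function name `resolve_api_key`, the `can_manage_keys` flag and the dictionary shapes are assumptions made for the example. The feature list does not say what happens when a permitted user has not saved a key yet, so the sketch simply returns `None` in that case.

```python
from typing import Mapping, Optional


def resolve_api_key(
    provider: str,
    can_manage_keys: bool,
    personal_keys: Mapping[str, str],
    global_keys: Mapping[str, str],
) -> Optional[str]:
    """Pick the API key for a provider (hypothetical helper)."""
    if can_manage_keys:
        # Users allowed to manage keys use their own key.
        # Assumption: no fallback if they have not saved one yet.
        return personal_keys.get(provider)
    # Everyone else falls back to the global key from .env.
    return global_keys.get(provider)


global_keys = {"openai": "sk-global"}
print(resolve_api_key("openai", True, {"openai": "sk-personal"}, global_keys))
print(resolve_api_key("openai", False, {}, global_keys))
```

The first call prints the personal key and the second prints the global one.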
Get the platform running in under 5 minutes. This is the recommended method.
Prerequisites: Docker and Docker Compose.
- Clone the Repository

      git clone https://github.com/arnoulddw/transcriber-platform.git
      cd transcriber-platform

- Configure Your Environment: Copy the example environment file and edit it with your details.

      cp .env.example .env
      nano .env

  - Crucially, you must set SECRET_KEY, your API keys (OPENAI_API_KEY, etc.), MYSQL_PASSWORD, MYSQL_USER and MYSQL_DB.
  - For multi-user mode, also set ADMIN_USERNAME and ADMIN_PASSWORD to create your admin account.

- Build and Run

      docker-compose up -d --build

- Access the App: Open your browser and go to http://localhost:5004 (or the APP_PORT you set in .env). The database will be initialized automatically on the first run.
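
Before starting the containers, it can help to confirm that the required variables are actually filled in. The following Python sketch is not part of the project; the helper names `parse_env` and `missing_variables` are hypothetical, and the parser only handles simple `KEY=VALUE` lines. It does not check the provider API keys, since which ones you need depends on the services you plan to use. The last line shows one common way to generate a random value for SECRET_KEY.

```python
import secrets

# Variables the Quick Start says must be set; the admin pair applies to multi-user mode.
REQUIRED = ["SECRET_KEY", "MYSQL_USER", "MYSQL_PASSWORD", "MYSQL_DB"]
REQUIRED_MULTI = ["ADMIN_USERNAME", "ADMIN_PASSWORD"]


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_variables(text: str) -> list:
    """Return required variables that are absent or empty."""
    env = parse_env(text)
    required = list(REQUIRED)
    # DEPLOYMENT_MODE defaults to multi according to the variable table.
    if env.get("DEPLOYMENT_MODE", "multi") == "multi":
        required += REQUIRED_MULTI
    return [name for name in required if not env.get(name)]


sample = (
    "SECRET_KEY=\n"
    "MYSQL_USER=app\n"
    "MYSQL_PASSWORD=pw\n"
    "MYSQL_DB=transcriber\n"
    "DEPLOYMENT_MODE=single\n"
)
print(missing_variables(sample))  # ['SECRET_KEY']
print(secrets.token_hex(32))      # a random 64-character hex string
```

To check a real file, pass its contents instead of `sample`, for example `missing_variables(open(".env").read())`.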
This section provides more detailed setup instructions.
- API Keys: You need API keys for the services you plan to use:
- OpenAI (for Whisper, GPT-4o Transcribe and LLM workflows)
- AssemblyAI (Universal model)
- Google Gemini (for title generation and LLM workflows)
- Docker & Docker Compose: Required for the recommended installation method.
- Google Client ID (Optional): Required for Google Sign-In in multi-user mode.
- Python 3.9+: Required for local development without Docker.
The application is configured using environment variables in a .env file. The table below lists all available options.
| Variable | Description | Default |
|---|---|---|
| Core Application | | |
| SECRET_KEY | CRITICAL: A strong, random key for session security. Must be set. | (none) |
| DEPLOYMENT_MODE | single (no login) or multi (user accounts). | multi |
| TZ | Timezone for the application (ex. UTC, Europe/Paris). | UTC |
| APP_PORT | Port on which the app is accessible on the host machine. | 5004 |
| LOG_LEVEL | Application logging level (DEBUG, INFO, WARNING, ERROR). | INFO |
| API Keys (Global Fallback) | | |
| OPENAI_API_KEY | Your API key for OpenAI (Whisper, GPT-4o Transcribe, LLMs). | (none) |
| ASSEMBLYAI_API_KEY | Your API key for AssemblyAI. | (none) |
| GEMINI_API_KEY | Your API key for Google Gemini (Title Generation, LLMs). | (none) |
| Default Settings | | |
| DEFAULT_TRANSCRIPTION_PROVIDER | Default transcription API on load (gpt-4o-transcribe, whisper, assemblyai). | gpt-4o-transcribe |
| DEFAULT_LLM_PROVIDER | Default LLM for tasks like title generation (openai, gemini). | gemini |
| DEFAULT_LANGUAGE | Default transcription language on load (auto, en, es, etc.). | auto |
| SUPPORTED_LANGUAGE_CODES | Comma-separated language codes to show in the UI (ex. en,nl,fr,es). | en,nl,fr,es |
| Database (MySQL) | | |
| MYSQL_HOST | Hostname for the MySQL server. Use mysql for Docker Compose. | localhost |
| MYSQL_PORT | Port for the MySQL server. | 3306 |
| MYSQL_USER | Username for MySQL connection. Must be set. | (none) |
| MYSQL_PASSWORD | Password for MySQL connection. Must be set. | (none) |
| MYSQL_DB | Name of the MySQL database. Must be set. | (none) |
| MYSQL_ROOT_PASSWORD | Root password for the MySQL service (used by Docker Compose). | (none) |
| MYSQL_HOST_PORT | Host port to map to MySQL's internal port (for external access). | 3307 |
| MYSQL_POOL_SIZE | Number of connections in the MySQL connection pool. | 10 |
| Multi-User Mode | | |
| ADMIN_USERNAME | Username for the initial admin account (created on first run). | admin |
| ADMIN_PASSWORD | Password for the initial admin account. Must be set for admin creation. | (none) |
| ADMIN_EMAIL | Email for the initial admin account. | (none) |
| GOOGLE_CLIENT_ID | Your Google OAuth 2.0 Client ID for Google Sign-In. | (none) |
| Email (for Password Resets) | | |
| MAIL_SERVER | SMTP server for sending emails. | (none) |
| MAIL_PORT | SMTP server port. | 587 |
| MAIL_USE_TLS | Whether to use TLS for SMTP (true, false). | true |
| MAIL_USERNAME | Username for SMTP authentication. | (none) |
| MAIL_PASSWORD | Password or App Password for SMTP authentication. | (none) |
| MAIL_DEFAULT_SENDER | Default sender email address (ex. noreply@example.com). | noreply@example.com |
| Advanced Configuration | | |
| TRANSCRIPTION_WORKERS | Number of parallel workers for chunked transcription. | 4 |
| WORKFLOW_RATE_LIMIT | Rate limit for workflow API calls per user (ex. 10 per hour). | 10 per hour |
| PHYSICAL_DELETION_DAYS | Days after soft-deletion before a transcription is permanently removed. | 120 |
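
To make the relationship between the 25MB split and TRANSCRIPTION_WORKERS more concrete, here is a rough Python sketch of chunk planning with a parallel fan-out. It is not the platform's implementation: the function names are hypothetical, "25MB" is read as 25 MiB, threads are assumed as the worker type, and `transcribe_chunk` is a placeholder rather than a real API call. Real audio also has to be cut at valid audio boundaries rather than raw byte offsets, which this sketch ignores.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CHUNK_BYTES = 25 * 1024 * 1024  # assumption: "25MB" read as 25 MiB
TRANSCRIPTION_WORKERS = 4           # default value from the table


def plan_chunks(total_bytes: int, limit: int = MAX_CHUNK_BYTES) -> list:
    """Return (start, end) byte ranges, each at most `limit` bytes long."""
    if total_bytes <= limit:
        return [(0, total_bytes)]
    return [
        (start, min(start + limit, total_bytes))
        for start in range(0, total_bytes, limit)
    ]


def transcribe_chunk(byte_range: tuple) -> str:
    """Placeholder for a real API call; returns a label for the range."""
    start, end = byte_range
    return f"[text for bytes {start}-{end}]"


def transcribe_file(total_bytes: int) -> str:
    """Plan the chunks, process them in parallel, and join the results."""
    chunks = plan_chunks(total_bytes)
    # pool.map preserves input order, so the pieces join in sequence.
    with ThreadPoolExecutor(max_workers=TRANSCRIPTION_WORKERS) as pool:
        parts = list(pool.map(transcribe_chunk, chunks))
    return " ".join(parts)


print(len(plan_chunks(60 * 1024 * 1024)))  # 3 chunks: 25 + 25 + 10 MiB
```

Because results are collected in input order, the joined text follows the original audio order even when chunks finish at different times.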
Alternative installation methods are the Docker Hub image and local development.

Using the Docker Hub image:

- Create a .env file on your host machine with all necessary variables. Ensure MYSQL_HOST points to your accessible MySQL server.
- Pull the Docker Image:

      docker pull arnoulddw/transcriber-platform:latest

- Run the Docker Container:

      docker run -d -p 5004:5004 \
        --env-file ./.env \
        --name transcriber-platform-app \
        arnoulddw/transcriber-platform:latest

For local development without Docker:

- Clone the repository and cd into it.
- Create and activate a Python virtual environment:

      python3 -m venv venv
      source venv/bin/activate  # On macOS/Linux

- Install dependencies:

      pip install -r requirements.txt

- Set up MySQL: Ensure you have a running MySQL server. Create a database and user.
- Configure .env: Create the file and add your SECRET_KEY, API keys and local MySQL connection details (MYSQL_HOST=localhost, etc.).
- Initialize the Database:

      export FLASK_APP=app
      flask init-db
      flask create-roles
      flask create-admin  # If in multi mode

- Run the App:

      flask run --host=0.0.0.0 --port=5004

- Access the Application: Open the application in your web browser.
- Authentication (Multi-User Mode):
  - Register for an account or log in.
  - Navigate to "Manage API Keys" to add your personal API keys for OpenAI, AssemblyAI, etc. This is required for most features.
- Upload Audio: Click the "File" button to select an audio file.
- Configure Transcription:
  - Select your preferred API (GPT-4o Transcribe, Whisper, AssemblyAI, etc.).
  - Choose the audio language or leave it on "Automatic Detection."
  - (Optional) Provide a context prompt to improve accuracy.
  - (Optional) Enable speaker diarization when using AssemblyAI to label speakers in the transcript.
- Transcribe: Click the "Transcribe" button.
- Manage History: Your completed transcriptions will appear in the history panel. From there you can:
  - View, copy or download the text.
  - Delete old transcriptions.
  - Run an AI workflow (ex. summarize) on the text.
You can trigger transcriptions programmatically using your personal API key (generate it in Manage API Keys → Public API Access). Requests run with your default transcription model and language, and the results land in your normal history.

    curl -X POST https://your-domain.example.com/api/v1/transcribe \
      -H "Authorization: Bearer <YOUR_USER_API_KEY>" \
      -F "audio_file=@/path/to/audio.wav"

Use your deployment's base URL (or http://localhost:5004 in local dev). The API responds with a job_id you can poll via /api/progress/<job_id> while signed in. Keep your API key secret; rotate it anytime from the same modal.
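
For callers who prefer Python over curl, the same request can be built with the standard library alone. The sketch below is an unofficial example: `build_transcribe_request` is a hypothetical helper, and only the endpoint path, the Bearer header and the `audio_file` field name come from the curl command. It builds the request without sending it, so you can inspect it first.

```python
import urllib.request
import uuid
from pathlib import Path


def build_transcribe_request(
    base_url: str, api_key: str, audio_path: str
) -> urllib.request.Request:
    """Build (but do not send) the multipart POST shown in the curl example."""
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    # One multipart part named audio_file, matching the curl -F field.
    disposition = (
        f'Content-Disposition: form-data; name="audio_file"; '
        f'filename="{path.name}"\r\n'
    )
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        disposition.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/transcribe",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Usage (performs a real network call, so it is left commented out):
# req = build_transcribe_request(
#     "http://localhost:5004", "<YOUR_USER_API_KEY>", "/path/to/audio.wav"
# )
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())  # should include the job_id
```

The response format is not documented here beyond the job_id, so the usage lines print the raw body instead of assuming a particular structure.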
Database migrations are handled automatically by the application on startup; no manual migration commands are required.
To add or update UI translations:
- Extract strings from the code to a template file:

      pybabel extract -F babel.cfg -k lazy_gettext -o messages.pot .

- Update language files with the new strings:

      pybabel update -i messages.pot -d app/translations

- Edit the .po files (ex. app/translations/es/LC_MESSAGES/messages.po) to add the new translations.
- Compile the translations into binary files the app can use:

      pybabel compile -d app/translations

- Port in use: Change APP_PORT in .env and restart. If using Docker Compose, you can also change the host port in docker-compose.yml (ex. "5005:5004").
- MySQL Connection Issues (Docker): Ensure the mysql service is running (docker-compose ps). Check logs with docker-compose logs mysql. Verify MYSQL_HOST is set to mysql in your .env file.
- API Key Issues: In single mode, double-check the global API keys in .env. In multi mode, ensure the logged-in user has added their keys correctly in the UI.
- Google Sign-In Errors: Verify your GOOGLE_CLIENT_ID is correct and that your Google Cloud Project has the correct "Authorized JavaScript origins" (ex. http://localhost:5004) and "Redirect URIs".
This project is licensed under the MIT License. See the LICENSE file for details.
