Transcribe Me is a CLI-driven Python application that transcribes audio files using either the OpenAI Whisper API or AssemblyAI.
```mermaid
graph TD
    A[Load Config] --> B[Get Audio Files]
    B --> C{Audio File Exists?}
    C --Yes--> D{Use AssemblyAI?}
    D --Yes--> E[Transcribe with AssemblyAI]
    D --No--> F[Transcribe with OpenAI]
    E --> G[Generate Additional Outputs]
    F --> I[Save Transcription]
    G --> I
    I --> K[Clean Up Temporary Files]
    K --> B
    C --No--> L[Print Warning]
    L --> B
```
Starting from version 1.0.0, you need to explicitly install the provider(s) you want to use. The package no longer installs all providers by default, which keeps your environment lean by installing only the dependencies you actually need.
- Audio Transcription: Transcribes audio files using either the OpenAI Whisper API or AssemblyAI.
- AssemblyAI Features: When using AssemblyAI, provides additional outputs including Speaker Diarization, Summary, Sentiment Analysis, Key Phrases, and Topic Detection.
- Supported Formats: Supports audio files in `.m4a` and `.mp3` formats.
- Docker Support: Can be run in a Docker container for easy deployment and reproducibility.
The tool has been tested with Python 3.12 on macOS; your mileage may vary on other operating systems such as Windows, WSL, or Linux.
- Install Python. The recommended way is to use asdf:

  ```bash
  brew install asdf
  asdf plugin add python
  asdf install python 3.12.0
  asdf global python 3.12.0
  ```
- Install FFmpeg using Homebrew:

  ```bash
  brew install ffmpeg
  ```
- Install the application using pip. You'll need to specify which provider(s) you want to use:

  - For OpenAI only:

    ```bash
    pip install "transcribe-me[openai]"
    ```

  - For AssemblyAI only:

    ```bash
    pip install "transcribe-me[assemblyai]"
    ```

  - For all providers:

    ```bash
    pip install "transcribe-me[all]"
    ```

  Or if you're installing from source:

  ```bash
  # Clone the repository
  git clone https://github.com/echohello-dev/transcribe-me.git
  cd transcribe-me

  # Install with the desired providers
  pip install -e ".[openai]"      # For OpenAI
  # or
  pip install -e ".[assemblyai]"  # For AssemblyAI
  # or
  pip install -e ".[all]"         # For all providers
  ```
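After installation, you can quickly check which provider packages ended up in your environment. This is just a convenience snippet for verification, not part of the tool itself:

```python
# Quick sanity check: report which optional provider packages are importable.
import importlib.util

for extra, module in {"openai": "openai", "assemblyai": "assemblyai"}.items():
    found = importlib.util.find_spec(module) is not None
    print(f"transcribe-me[{extra}]: {'installed' if found else 'not installed'}")
```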
- Bootstrap your current directory with the configuration file:

  ```bash
  transcribe-me install
  ```

  This command will create a `.transcribe.yaml` file in your current directory and prompt you to enter your API keys for OpenAI and AssemblyAI if they are not already provided in environment variables.
- Set up your API keys (if not already done during installation):

  ```bash
  # For OpenAI
  export OPENAI_API_KEY=your_openai_api_key

  # For AssemblyAI
  export ASSEMBLYAI_API_KEY=your_assemblyai_api_key
  ```
- Place your audio files (`.mp3` or `.m4a` format) in the `input` directory (or any directory specified in your configuration).
- Run the application:

  ```bash
  transcribe-me
  ```

  The application will process each audio file in the input directory and save the transcriptions to the output directory.
- (Optional) Archive processed files after transcription:

  ```bash
  transcribe-me archive
  ```

  A conceptual sketch of this step appears after this list.
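Here is a rough sketch of what the archive step conceptually does, assuming it simply moves audio files that already have a transcription out of the input folder. The helper name and the transcript naming scheme are hypothetical, not the project's actual code:

```python
# Hypothetical illustration of the archive step: move already-transcribed
# audio out of the input folder so it is not picked up on the next run.
import shutil
from pathlib import Path

def archive_processed(input_folder: str, output_folder: str, archive_folder: str) -> None:
    Path(archive_folder).mkdir(parents=True, exist_ok=True)
    for audio in Path(input_folder).iterdir():
        if audio.suffix.lower() not in {".mp3", ".m4a"}:
            continue
        transcript = Path(output_folder) / f"{audio.stem}.md"  # assumed naming scheme
        if transcript.exists():
            shutil.move(str(audio), str(Path(archive_folder) / audio.name))
```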
When running Transcribe Me, the provider used for transcription is determined by your configuration file. By default, OpenAI is used, but you can switch to AssemblyAI by setting `use_assemblyai: true` in your `.transcribe.yaml` file.
Make sure you've installed the appropriate provider package as described in the installation section. If you try to use a provider that isn't installed, you'll receive a helpful error message with instructions on how to install the missing dependency.
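That check typically amounts to a guarded import. Below is a minimal sketch of the pattern; it assumes nothing about the project's exact structure or error wording:

```python
# Sketch of a lazy provider import with a helpful error message.
def load_provider(name: str):
    try:
        if name == "assemblyai":
            import assemblyai as provider
        else:
            import openai as provider
    except ImportError as exc:
        raise SystemExit(
            f"The '{name}' provider is not installed. "
            f'Install it with: pip install "transcribe-me[{name}]"'
        ) from exc
    return provider
```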
The `transcribe-me` command supports several options:

```bash
# Display help information
transcribe-me --help

# Specify a custom configuration file
transcribe-me --config /path/to/custom/config.yaml

# Run in verbose mode for detailed output
transcribe-me --verbose

# Run in debug mode for even more detailed logging
transcribe-me --debug
```
The `.transcribe.yaml` file controls the behavior of the application. Here's a comprehensive example with all available options:

```yaml
# Transcription service selection
use_assemblyai: false # Set to true to use AssemblyAI instead of OpenAI

# Folder configuration
input_folder: input # Directory containing audio files to transcribe
output_folder: output # Directory where transcriptions will be saved
archive_folder: archive # Directory for archived files (optional)

# AssemblyAI-specific options (when use_assemblyai is true)
assemblyai_options:
  speech_model: nano # Options: base, nano, large
  speaker_labels: true # Enable speaker diarization
  summarization: true # Generate summary
  sentiment_analysis: true # Generate sentiment analysis
  iab_categories: true # Generate topic detection

# OpenAI-specific options (when use_assemblyai is false)
openai_options:
  model: whisper-1 # Whisper model to use
```
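To show how these settings are consumed, here is a minimal sketch that reads `.transcribe.yaml` with PyYAML and picks a provider. The field names come from the example above; everything else is illustrative, not the project's actual code:

```python
# Load the configuration and decide which provider to use.
import yaml

with open(".transcribe.yaml") as fh:
    config = yaml.safe_load(fh) or {}

input_folder = config.get("input_folder", "input")
output_folder = config.get("output_folder", "output")
provider = "assemblyai" if config.get("use_assemblyai") else "openai"
print(f"Transcribing {input_folder} -> {output_folder} using {provider}")
```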
Process only specific audio files:

```bash
# Transcribe a single file
transcribe-me --file path/to/your/audio.mp3

# Transcribe multiple files
transcribe-me --files file1.mp3,file2.mp3
```
You can specify custom output formats in your configuration:

```yaml
output_format:
  include_timestamps: true # Include timestamps in transcription
  include_speakers: true # Include speaker labels (AssemblyAI only)
  text_only: false # Output only plain text (no JSON)
```
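To make these flags concrete, the hypothetical formatter below shows how a timestamped, speaker-labelled line could be rendered. The function and the exact layout are illustrative only, not the project's actual output format:

```python
# Illustrative rendering of one transcript line under the output_format flags.
def format_line(start_ms: int, speaker: str, text: str,
                include_timestamps: bool = True, include_speakers: bool = True) -> str:
    parts = []
    if include_timestamps:
        minutes, seconds = divmod(start_ms // 1000, 60)
        parts.append(f"[{minutes:02d}:{seconds:02d}]")
    if include_speakers and speaker:
        parts.append(f"Speaker {speaker}:")
    parts.append(text)
    return " ".join(parts)

print(format_line(754_000, "A", "Let's get started."))  # [12:34] Speaker A: Let's get started.
```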
For large audio files, the application automatically splits them into smaller chunks for processing with OpenAI:

```yaml
splitting_options:
  chunk_size_seconds: 600 # Split files into 10-minute chunks
  overlap_seconds: 5 # 5-second overlap between chunks
```
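One common way to implement this kind of chunking is with pydub, which also relies on FFmpeg. The sketch below mirrors the `chunk_size_seconds` and `overlap_seconds` settings; the project's actual splitting code may differ:

```python
# Split an audio file into overlapping chunks (pydub slices are in milliseconds).
from pydub import AudioSegment

def split_audio(path: str, chunk_size_seconds: int = 600, overlap_seconds: int = 5) -> list[str]:
    audio = AudioSegment.from_file(path)
    chunk_ms = chunk_size_seconds * 1000
    step_ms = (chunk_size_seconds - overlap_seconds) * 1000
    chunk_paths = []
    for index, start in enumerate(range(0, len(audio), step_ms)):
        out_path = f"{path}.part{index:03d}.mp3"
        audio[start:start + chunk_ms].export(out_path, format="mp3")
        chunk_paths.append(out_path)
    return chunk_paths
```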
You can also run the application using Docker. The Docker image comes with all providers pre-installed. If you're building your own Docker image, you can choose which providers to include.
- Install Docker on your machine by following the instructions on the Docker website.
- Pull the pre-built image:

  ```bash
  docker pull ghcr.io/echohello-dev/transcribe-me:latest
  ```

  Or build your own image with specific providers:

  ```dockerfile
  FROM python:3.12-slim

  # Install FFmpeg
  RUN apt-get update && apt-get install -y ffmpeg

  # Copy the application code
  COPY . /app
  WORKDIR /app

  # Install the package with the desired providers
  # Choose one of the following:
  RUN pip install -e ".[openai]"       # For OpenAI only
  # RUN pip install -e ".[assemblyai]" # For AssemblyAI only
  # RUN pip install -e ".[all]"        # For all providers

  ENTRYPOINT ["transcribe-me"]
  ```
- Create a `.transcribe.yaml` configuration file:

  ```bash
  touch .transcribe.yaml

  docker run \
    --rm \
    -v $(pwd)/.transcribe.yaml:/app/.transcribe.yaml \
    ghcr.io/echohello-dev/transcribe-me:latest install
  ```
- Run the application in Docker:

  ```bash
  docker run \
    --rm \
    -e OPENAI_API_KEY \
    -e ASSEMBLYAI_API_KEY \
    -v $(pwd)/archive:/app/archive \
    -v $(pwd)/input:/app/input \
    -v $(pwd)/output:/app/output \
    -v $(pwd)/.transcribe.yaml:/app/.transcribe.yaml \
    ghcr.io/echohello-dev/transcribe-me:latest
  ```

  This command mounts the `input` and `output` directories and the `.transcribe.yaml` configuration file into the Docker container.
- (Optional) You can also run the application using the provided `docker-compose.yml` file:

  ```yaml
  version: '3'
  services:
    transcribe-me:
      image: ghcr.io/echohello-dev/transcribe-me:latest
      environment:
        - OPENAI_API_KEY
        - ASSEMBLYAI_API_KEY
      volumes:
        - ./input:/app/input
        - ./output:/app/output
        - ./archive:/app/archive
        - ./.transcribe.yaml:/app/.transcribe.yaml
  ```

  Run the following command to start the application using Docker Compose:

  ```bash
  docker compose run --rm transcribe-me
  ```

  This command mounts the `input`, `output`, and `archive` directories and the `.transcribe.yaml` configuration file into the Docker container. See `compose.example.yaml` for an example configuration.

  Make sure to set `OPENAI_API_KEY` and `ASSEMBLYAI_API_KEY` to your actual API keys, and create the `.transcribe.yaml` configuration file in the same directory as the `docker-compose.yml` file.
The Transcribe Me application follows a straightforward workflow:
- Load Configuration: The application loads the configuration from the `.transcribe.yaml` file, which includes settings for input/output directories and the transcription service.
- Get Audio Files: The application gets a list of audio files from the input directory specified in the configuration.
- Check Existing Transcriptions: For each audio file, the application checks if there is an existing transcription file. If a transcription file exists, it skips to the next audio file.
- Transcribe Audio File: If no transcription file exists, the application transcribes the audio file using either the OpenAI Whisper API or AssemblyAI, based on the configuration.
- Generate Outputs:
  - For OpenAI: The application generates summaries of the transcription using the configured models (OpenAI GPT-4 and Anthropic Claude).
  - For AssemblyAI: The application generates additional outputs including Speaker Diarization, Summary, Sentiment Analysis, Key Phrases, and Topic Detection.
- Save Transcription and Outputs: The application saves the transcription and all generated outputs to separate files in the output directory.
- Clean Up Temporary Files: The application removes any temporary files generated during the transcription process.
- Repeat: The process repeats for each audio file in the input directory.
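The same workflow, condensed into a single loop. The `transcribe_with_*` helpers are placeholders for the real provider calls, and the transcript naming is an assumption, not the project's actual API:

```python
# Workflow sketch: iterate the input folder, skip finished files, transcribe the rest.
from pathlib import Path

def transcribe_with_openai(audio: Path, config: dict) -> str:
    raise NotImplementedError("placeholder for the Whisper API call")

def transcribe_with_assemblyai(audio: Path, config: dict) -> str:
    raise NotImplementedError("placeholder for the AssemblyAI call")

def run(config: dict) -> None:
    for audio in sorted(Path(config["input_folder"]).iterdir()):
        if audio.suffix.lower() not in {".mp3", ".m4a"}:
            print(f"Warning: skipping unsupported file {audio.name}")
            continue
        transcript = Path(config["output_folder"]) / f"{audio.stem}.md"  # assumed naming
        if transcript.exists():
            continue  # already transcribed, move on to the next file
        if config.get("use_assemblyai"):
            text = transcribe_with_assemblyai(audio, config)  # also yields diarization, summary, ...
        else:
            text = transcribe_with_openai(audio, config)
        transcript.write_text(text)
```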
The application uses a configuration file (`.transcribe.yaml`) to specify settings such as input/output directories, API keys, models, and their configurations. The configuration file is created automatically when you run the `transcribe-me install` command.

Here is an example configuration file:

```yaml
use_assemblyai: false # Set to true to use AssemblyAI instead of OpenAI for transcription
input_folder: input
output_folder: output
```
Useful `make` targets include:

- `freeze`: Saves the installed Python package versions to the `requirements.txt` file.
- `install-cli`: Installs the application as a command-line interface (CLI) tool.
- The application requires API keys for both OpenAI and Anthropic. These keys are not provided with the application and must be obtained separately.
- The application is designed to run on a single machine and does not support distributed processing. As a result, the speed of transcription and summary generation is limited by the performance of the machine it is running on.
- The application does not support real-time transcription or summary generation. It processes audio files one at a time and must complete the transcription and summary generation for each file before moving on to the next one.
- Clone the repository.

- Install the required tools using ASDF (for managing tool versions) and Homebrew (for installing dependencies):

  - Install ASDF:

    ```bash
    brew install asdf
    ```

  - Install FFmpeg using Homebrew:

    ```bash
    brew install ffmpeg
    ```
- Install the package with pip. You can choose which providers to install:

  - For OpenAI only:

    ```bash
    pip install -e ".[openai]"
    ```

  - For AssemblyAI only:

    ```bash
    pip install -e ".[assemblyai]"
    ```

  - For all providers:

    ```bash
    pip install -e ".[all]"
    ```

  Or using uvx:

  - For OpenAI only:

    ```bash
    uvx install -e ".[openai]"
    ```

  - For AssemblyAI only:

    ```bash
    uvx install -e ".[assemblyai]"
    ```

  - For all providers:

    ```bash
    uvx install -e ".[all]"
    ```
- Install the Python dependencies and create a virtual environment:

  ```bash
  make install
  ```
- Run the `transcribe-me install` command to create the `.transcribe.yaml` configuration file and provide your API keys for OpenAI and AssemblyAI:

  ```bash
  make transcribe-install
  ```
- (Optional) Install the application as a command-line interface (CLI) tool:

  ```bash
  make install-cli
  ```