This project provides a GPU-first Gradio application with the following features:
- French audio diarization using pyannote.
- Automatic speech translation from French speech to English text with NVIDIA Canary.
- Role/name attribution using the LFM model without relying on predefined role lists.
- High-quality German translation using the same LFM model.
- Structured exports, including JSON and plain-text transcripts.
- Fully modular Python package under `app/` for easier maintenance.
- Automatic model caching inside the repository (configurable via environment variables).
- Speaker aliasing backed by LLM reasoning with evidence validation.
- Robust chunking and overlap handling for arbitrarily long audio files.
- GPU-friendly defaults (automatic bf16/float16 when available).
- Clean transcript formatting with timestamps.
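
Model caching inside the repository could be wired up roughly as follows. This is a minimal sketch: the `MODEL_CACHE_DIR` variable, the `models/` default and the `resolve_cache_dir` helper are hypothetical names, not necessarily what the package uses. `HF_HOME` is the standard Hugging Face cache variable; it is read when `huggingface_hub` is imported, so it has to be set before `transformers` or `huggingface_hub` is loaded.

```python
import os
from pathlib import Path

# Hypothetical override variable; the project's actual setting may be named differently.
CACHE_ENV_VAR = "MODEL_CACHE_DIR"


def resolve_cache_dir(repo_root: Path) -> Path:
    """Return the model cache directory, defaulting to <repo_root>/models."""
    override = os.environ.get(CACHE_ENV_VAR)
    cache_dir = Path(override) if override else repo_root / "models"
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Point the Hugging Face cache at the same place unless the user already chose one.
    # This only takes effect if it runs before huggingface_hub/transformers is imported.
    os.environ.setdefault("HF_HOME", str(cache_dir / "huggingface"))
    return cache_dir
```

Calling `resolve_cache_dir(Path(__file__).resolve().parent)` at the very top of the entry point would keep downloaded weights next to the code unless the environment variable says otherwise.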
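
One way to implement evidence validation for speaker aliasing is to require every name or role the LLM proposes to come with a quote that appears verbatim in the transcript, and to fall back to the raw diarization label otherwise. The sketch below assumes that proposal format (`AliasProposal` and `validate_aliases` are hypothetical); how the LFM model is prompted is not shown.

```python
import re
from dataclasses import dataclass


@dataclass
class AliasProposal:
    """One LLM suggestion: a diarization label, a proposed alias and a supporting quote."""
    speaker: str   # e.g. "SPEAKER_00"
    alias: str     # e.g. "Marie Dupont" or "interviewer"
    evidence: str  # quote the model claims supports the alias


def _normalize(text: str) -> str:
    # Lower-case and collapse whitespace so trivial formatting differences don't cause rejections.
    return re.sub(r"\s+", " ", text).strip().lower()


def validate_aliases(
    proposals: list[AliasProposal], transcript: list[tuple[str, str]]
) -> dict[str, str]:
    """Return speaker -> alias, accepting an alias only if its evidence occurs verbatim in one utterance."""
    utterances = [_normalize(text) for _, text in transcript]
    # Default to the raw diarization label for every speaker seen in the transcript.
    aliases = {speaker: speaker for speaker, _ in transcript}
    for p in proposals:
        quote = _normalize(p.evidence)
        supported = bool(quote) and any(quote in u for u in utterances)
        if p.speaker in aliases and p.alias.strip() and supported:
            aliases[p.speaker] = p.alias.strip()
    return aliases
```

For example, given the utterances `("SPEAKER_00", "Bonjour, je suis Marie Dupont.")` and `("SPEAKER_01", "Merci d'être venue.")`, a proposal naming SPEAKER_00 "Marie Dupont" with the quote "je suis Marie Dupont" is accepted, while a proposal naming SPEAKER_01 "Jean" with the quote "je m'appelle Jean" is rejected and SPEAKER_01 keeps its label.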
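
Chunking with overlap can be sketched as fixed-length windows that advance by `chunk_s - overlap_s` seconds. To avoid keeping segments that were transcribed twice, each window can "own" only half of each overlap zone, and a segment is kept only by the window whose owned region contains its midpoint. The 30 s chunk length, 5 s overlap and midpoint rule are illustrative assumptions, not the project's documented settings.

```python
def chunk_windows(
    duration_s: float, chunk_s: float = 30.0, overlap_s: float = 5.0
) -> list[tuple[float, float]]:
    """Split [0, duration_s] into windows of chunk_s seconds that overlap by overlap_s seconds."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    step = chunk_s - overlap_s
    windows: list[tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:  # the last window reaches the end of the file
            break
        start += step
    return windows


def owned_region(windows: list[tuple[float, float]], i: int, overlap_s: float) -> tuple[float, float]:
    """Split each overlap zone in half between neighbours; the file's edges belong to the outer windows."""
    start, end = windows[i]
    lo = start if i == 0 else start + overlap_s / 2
    hi = end if i == len(windows) - 1 else end - overlap_s / 2
    return lo, hi


def keep_segment(seg_start: float, seg_end: float, region: tuple[float, float]) -> bool:
    """Keep a segment (absolute times) only in the window whose owned region contains its midpoint."""
    lo, hi = region
    return lo <= (seg_start + seg_end) / 2 < hi
```

For a 70 s file, `chunk_windows(70)` yields `(0, 30)`, `(25, 55)` and `(50, 70)`, and the owned regions meet at 27.5 s and 52.5 s, so every instant is owned by exactly one window.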
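
Timestamped plain-text and JSON exports could look like the following. The `Segment` fields, the `[HH:MM:SS.mmm - HH:MM:SS.mmm] speaker: text` line layout and the `{"segments": [...]}` JSON shape are assumptions for illustration, not the app's actual output schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the file
    end: float    # seconds from the beginning of the file
    speaker: str  # diarization label or validated alias
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_plain_text(segments: list[Segment]) -> str:
    """One line per segment: time range, speaker, text."""
    return "\n".join(
        f"[{format_timestamp(s.start)} - {format_timestamp(s.end)}] {s.speaker}: {s.text}"
        for s in segments
    )


def to_json(segments: list[Segment]) -> str:
    # ensure_ascii=False keeps accented French and German characters readable in the file.
    return json.dumps({"segments": [asdict(s) for s in segments]}, ensure_ascii=False, indent=2)
```

With this layout, `format_timestamp(3725.5)` returns `"01:02:05.500"`.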

To run the application:
- Export a valid Hugging Face token: `export HUGGINGFACE_TOKEN=...`
- Install the required dependencies (pyannote, nemo, transformers, gradio, etc.) in your environment.
- Launch the Gradio interface: `python main.py`
- Upload a French audio or video file and wait for diarization, transcription, aliasing, and translation to complete.
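
Because the token is read from the environment, a fail-fast check at startup typically gives a clearer error than a failed model download. `require_hf_token` is a hypothetical helper sketching that idea; it only checks that `HUGGINGFACE_TOKEN` is non-empty, not that the token is valid or has access to the required models.

```python
import os
import sys


def require_hf_token(env_var: str = "HUGGINGFACE_TOKEN") -> str:
    """Return the Hugging Face token from the environment, or exit with a helpful message."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"{env_var} is not set; run `export {env_var}=...` before `python main.py`.")
    return token
```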

All intermediate artefacts are written to temporary directories that are automatically cleaned up.
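
In Python, this kind of automatic cleanup is commonly done with `tempfile.TemporaryDirectory`, which removes the directory and its contents when the `with` block exits, even if an exception is raised. The sketch below assumes that approach; `process_upload` is a hypothetical stand-in for the real pipeline.

```python
import tempfile
from pathlib import Path


def process_upload(data: bytes, suffix: str = ".wav") -> tuple[int, Path]:
    """Write an upload to a scratch directory, work on it there, and return (size, scratch dir)."""
    # TemporaryDirectory deletes the directory and everything in it when the with-block exits,
    # including when an exception is raised inside the block.
    with tempfile.TemporaryDirectory(prefix="diarize_") as tmp:
        workdir = Path(tmp)
        audio_path = workdir / f"input{suffix}"
        audio_path.write_bytes(data)
        # Placeholder: the real diarization/translation steps would write intermediate files here.
        size = audio_path.stat().st_size
    return size, workdir
```

After `process_upload` returns, the returned `workdir` path no longer exists on disk.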