Transcription Pipeline

This project provides a GPU-first Gradio application that performs:

French audio diarization using pyannote.
Automatic speech translation from French speech to English text with NVIDIA Canary.
Role/name attribution using the LFM model without relying on predefined role lists.
High-quality German translation using the same LFM model.
Structured exports including JSON and plain text transcripts.
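
As a rough illustration of the structured exports, the sketch below serialises a list of diarized segments to JSON and to a plain-text transcript with timestamps. The `Segment` fields and the function names are hypothetical and are not taken from the app/ package; the real export layout may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Segment:
    # Hypothetical segment layout; the actual package may use other fields.
    speaker: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_json(segments) -> str:
    """Serialise segments as a JSON array, keeping accented characters readable."""
    return json.dumps([asdict(s) for s in segments], ensure_ascii=False, indent=2)


def to_text(segments) -> str:
    """Build one transcript line per segment: [start - end] speaker: text."""
    return "\n".join(
        f"[{format_timestamp(s.start)} - {format_timestamp(s.end)}] {s.speaker}: {s.text}"
        for s in segments
    )
```

Passing `ensure_ascii=False` keeps French and German characters human-readable in the JSON output instead of escaping them.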

Features

Fully modular Python package under app/ for easier maintenance.
Automatic model caching inside the repository (configurable via environment variables).
Speaker aliasing backed by LLM reasoning with evidence validation.
Robust chunking and overlap handling for arbitrarily long audio files.
GPU-friendly defaults (automatic bf16/float16 when available).
Clean transcript formatting with timestamps.
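
The chunking with overlap mentioned above can be pictured as splitting the audio timeline into fixed-length windows that share a small margin, so that words cut at a boundary appear whole in at least one window. The following is a minimal sketch under assumed parameters; `chunk_spans`, the 30-second window and the 2-second overlap are illustrative choices, not values taken from this project.

```python
def chunk_spans(duration: float, chunk_len: float = 30.0, overlap: float = 2.0):
    """Return (start, end) windows in seconds covering [0, duration].

    Consecutive windows overlap by `overlap` seconds. The defaults are
    assumptions for illustration only.
    """
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk_len > 0 and 0 <= overlap < chunk_len")

    spans = []
    step = chunk_len - overlap  # how far each new window advances
    start = 0.0
    while start < duration:
        end = min(start + chunk_len, duration)
        spans.append((start, end))
        if end >= duration:
            break  # the last window already reaches the end of the audio
        start += step
    return spans
```

For example, `chunk_spans(70.0, chunk_len=30.0, overlap=5.0)` yields three windows: 0 to 30, 25 to 55, and 50 to 70 seconds. Text produced for the overlapping regions still has to be deduplicated when the chunks are merged, which this sketch does not cover.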

Export a valid Hugging Face token: export HUGGINGFACE_TOKEN=...
Install the required dependencies in your environment (pyannote, nemo, transformers, gradio, etc.).
Launch the Gradio interface: python main.py.
Upload a French audio/video file and wait for diarization, transcription, aliasing and translation.
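
Because the steps above depend on the HUGGINGFACE_TOKEN environment variable, it can help to fail early with a clear message when it is missing. This is a minimal sketch of such a check; `require_hf_token` is a hypothetical helper and is not confirmed to exist in main.py or app/.

```python
import os


def require_hf_token(env=None) -> str:
    """Return the Hugging Face token from the environment or raise.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HUGGINGFACE_TOKEN is not set; export it before launching main.py"
        )
    return token
```

Calling such a check before any model is loaded would surface a missing token immediately rather than partway through a download.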

All intermediate artefacts are written to temporary directories that are cleaned up automatically.
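
One common way to get this behaviour in Python is the standard library's `tempfile.TemporaryDirectory`, which deletes the directory and its contents when the `with` block exits. The sketch below assumes that approach; `run_with_scratch_dir` and the `pipeline_` prefix are hypothetical, and the project may manage its temporary files differently.

```python
import tempfile
from pathlib import Path


def run_with_scratch_dir(work):
    """Call `work(scratch_dir)` and remove the directory afterwards.

    The directory is deleted even if `work` raises an exception.
    """
    with tempfile.TemporaryDirectory(prefix="pipeline_") as tmp:
        return work(Path(tmp))
```

Anything that must outlive the run, such as the final JSON or text transcript, has to be copied out of the scratch directory before the function returns.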
