XLSR-Transducer: Streaming ASR for Estonian

This repository implements a streaming Automatic Speech Recognition (ASR) system based on the paper XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models.

Overview

The implementation uses Facebook's XLSR-53 model as the encoder of a Transducer architecture, which makes streaming ASR possible. Key features include:

Streaming ASR with low latency
Transducer-based architecture for efficient inference
Support for Estonian language ASR
Attention masking patterns for streaming capability
Attention sinks to reduce left context requirements
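
The last two features can be illustrated with a small sketch of a chunked attention mask. This is not the repository's code: the function name and its parameters (chunk_size, left_chunks, num_sinks) are hypothetical, and it assumes that "attention sinks" means the first few frames of the utterance stay visible to every query frame. Each frame may attend to its own chunk, to a limited number of chunks to its left, and to the sink frames, but never to future chunks.

```python
def chunked_attention_mask(num_frames, chunk_size, left_chunks, num_sinks=0):
    """Return mask[q][k] == True where query frame q may attend to key frame k.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    mask = [[False] * num_frames for _ in range(num_frames)]
    for q in range(num_frames):
        q_chunk = q // chunk_size
        # First visible frame: start of the earliest allowed left-context chunk.
        start = max(0, (q_chunk - left_chunks) * chunk_size)
        # Last visible frame: end of the query's own chunk (no look-ahead beyond it).
        end = min(num_frames, (q_chunk + 1) * chunk_size)
        for k in range(start, end):
            mask[q][k] = True
        # Attention sinks: the first num_sinks frames are always visible.
        for k in range(min(num_sinks, num_frames)):
            mask[q][k] = True
    return mask


if __name__ == "__main__":
    # Print the mask as a grid: "x" = may attend, "." = masked out.
    demo = chunked_attention_mask(num_frames=8, chunk_size=2, left_chunks=1, num_sinks=1)
    for row in demo:
        print("".join("x" if visible else "." for visible in row))
```

With a small left_chunks value, the sink frames are the only distant context a late frame can see, which is the sense in which sinks can stand in for part of the left context.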

Project Structure

.
├── README.md
├── requirements.txt
├── config/
│   └── config.yaml
├── src/
│   ├── data/
│   │   ├── dataset.py
│   │   └── processor.py
│   ├── model/
│   │   ├── encoder.py
│   │   ├── predictor.py
│   │   ├── joint.py
│   │   └── transducer.py
│   ├── training/
│   │   ├── trainer.py
│   │   └── loss.py
│   └── utils/
│       ├── metrics.py
│       └── audio.py
├── scripts/
│   ├── train.py
│   ├── evaluate.py
│   └── transcribe.py
└── notebooks/
    └── demo.ipynb

Installation

pip install -r requirements.txt

Usage

Data Preparation

The model expects data in the following format:

path_to_audio_file|transcription|speaker_id
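
A minimal sketch of reading this format, assuming one utterance per line, no header row, and no escaping of the | character inside a field (none of this is confirmed by the repository). The Utterance class and parse_manifest function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Utterance:
    audio_path: str
    transcription: str
    speaker_id: str


def parse_manifest(lines: Iterable[str]) -> Iterator[Utterance]:
    """Yield one Utterance per non-empty 'path|transcription|speaker_id' line."""
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("|")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 '|'-separated fields, got {len(fields)}"
            )
        audio_path, transcription, speaker_id = (f.strip() for f in fields)
        yield Utterance(audio_path, transcription, speaker_id)
```

Because the function accepts any iterable of strings, it can be fed an open file handle (opened with encoding="utf-8" for Estonian text) or an in-memory list when testing.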

Training

python scripts/train.py

Evaluation

python scripts/evaluate.py --checkpoint path/to/checkpoint

Transcription

python scripts/transcribe.py --audio path/to/audio.wav --checkpoint path/to/checkpoint
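
The repository does not document how transcribe.py turns model outputs into text. A common choice for Transducer models is greedy search, sketched below under that assumption. The function name, the predictor and joint callables, blank_id, and max_symbols_per_frame are all hypothetical stand-ins, not the project's API. Encoder frames are consumed one at a time, so the same loop works when frames arrive chunk by chunk.

```python
from typing import Callable, List, Sequence


def greedy_transducer_decode(
    encoder_frames: Sequence[object],
    predictor: Callable[[List[int]], object],
    joint: Callable[[object, object], Sequence[float]],
    blank_id: int = 0,
    max_symbols_per_frame: int = 5,
) -> List[int]:
    """Greedy Transducer search: emit the best token until blank, then advance a frame."""
    tokens: List[int] = []
    for frame in encoder_frames:
        # Cap emissions per frame so a misbehaving model cannot loop forever.
        for _ in range(max_symbols_per_frame):
            pred_state = predictor(tokens)      # depends only on tokens emitted so far
            scores = joint(frame, pred_state)   # one score per vocabulary entry
            best = max(range(len(scores)), key=lambda i: scores[i])
            if best == blank_id:
                break                           # blank: move on to the next frame
            tokens.append(best)                 # non-blank: emit and stay on this frame
    return tokens
```

The returned token IDs would still need to be mapped back to text by whatever tokenizer the model was trained with.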

Citation

@article{xlsrtransducer2024,
  title={XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models},
  author={...},
  journal={arXiv preprint arXiv:2407.04439},
  year={2024}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
