odunola499/BEST-RQ

Conformer Encoder Pretraining with Best-RQ Algorithm

This repository provides the code to pretrain a 120M-parameter Conformer encoder using the Best-RQ algorithm. Best-RQ offers a simpler and more compute-efficient pretraining strategy for speech models than contrastive methods like wav2vec 2.0: instead of learning a quantizer, it labels masked frames with a frozen random-projection quantizer and trains the encoder to predict those labels. The training code is built on PyTorch Lightning and is designed to be easily extensible to different datasets and configurations.
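The heart of Best-RQ is the frozen quantizer: each feature frame is projected with a fixed random matrix, L2-normalised, and assigned the index of the nearest codebook vector, and that index becomes the prediction target at masked positions. A simplified, dependency-free sketch of that labelling step (toy sizes, not the repository's implementation):

```python
import math
import random

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors pass through)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def random_projection_quantize(feature, projection, codebook):
    """Map one feature frame to a codebook index (the Best-RQ label).

    projection (d_in x d_proj) and codebook are randomly initialised
    and NEVER trained -- that is the whole trick of Best-RQ.
    """
    # Project the frame into the codebook space.
    projected = [sum(f * projection[i][j] for i, f in enumerate(feature))
                 for j in range(len(projection[0]))]
    projected = l2_normalize(projected)
    # Nearest codebook entry by squared Euclidean distance.
    dists = [sum((p - c) ** 2 for p, c in zip(projected, code))
             for code in codebook]
    return min(range(len(dists)), key=dists.__getitem__)

# Toy sizes for illustration only.
rng = random.Random(0)
d_in, d_proj, n_codes = 8, 4, 16
projection = [[rng.gauss(0, 1) for _ in range(d_proj)] for _ in range(d_in)]
codebook = [l2_normalize([rng.gauss(0, 1) for _ in range(d_proj)])
            for _ in range(n_codes)]
label = random_projection_quantize([rng.gauss(0, 1) for _ in range(d_in)],
                                   projection, codebook)
```

Because the projection and codebook are fixed, the labels are deterministic for a given input, so no quantizer gradients or codebook collapse issues arise during training.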

Features

  • Pretraining Approach: Leverages the Best-RQ algorithm for self-supervised pretraining.
  • Multilingual LibriSpeech: Trains on the Multilingual LibriSpeech dataset downloaded from Hugging Face by default.
  • Customizable Datasets: To train on a different dataset, simply modify the dataset.py file.
  • Logging: Uses Weights & Biases (WandB) for experiment tracking. The training script prompts you with the necessary setup steps.
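To swap in a different dataset, dataset.py needs a class following the map-style `__len__`/`__getitem__` protocol that `torch.utils.data.Dataset` expects. A minimal sketch (class and field names hypothetical; audio loading stubbed to keep the example dependency-free):

```python
class LocalAudioDataset:
    """Map-style dataset sketch: point it at your own manifest of audio files.

    Implements the __len__/__getitem__ protocol used by PyTorch DataLoaders.
    """

    def __init__(self, manifest, max_duration=30.0):
        # manifest: list of (audio_path, duration_seconds) pairs.
        # Drop overlong clips up front to keep batches memory-friendly.
        self.items = [m for m in manifest if m[1] <= max_duration]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        path, duration = self.items[idx]
        # Real code would load and resample the waveform here,
        # e.g. with torchaudio.load or soundfile.read.
        return {"path": path, "duration": duration}
```

Keeping the returned dict keys consistent with what the training collate function expects is the only contract the rest of the pipeline relies on.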

Installation and Usage

  1. Clone this repository:
    git clone https://github.com/odunola499/BEST-RQ-Algorithm.git
    cd BEST-RQ-Algorithm
  2. Install the dependencies:
    pip install -r requirements.txt
  3. Run the training script:
    python3 train.py

Model Architecture

The Conformer model in this repository uses relative positional encoding in its attention layers, powered by the FlexAttention API introduced in PyTorch 2.5, replacing traditional absolute positional encodings and vanilla attention. This implementation modifies the standard Conformer model, and future work aims to explore newer advancements such as grouped-query attention.
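FlexAttention expresses variants like this as a per-logit callback (a score_mod) applied before the softmax. A pure-Python sketch of the underlying idea, a learned additive bias indexed by the relative distance between query and key positions (toy single-head attention, no torch, not the repository's implementation):

```python
import math

def relative_bias_attention(q, k, v, bias):
    """Toy single-head attention with an additive relative-position bias.

    bias[d] is the offset for relative distance d = q_idx - kv_idx.
    FlexAttention's score_mod hook applies the same kind of per-logit
    adjustment without materialising the full score matrix.
    """
    n = len(q)
    out = []
    for i in range(n):
        # Raw dot-product scores plus the relative-position bias.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) + bias[i - j]
                  for j in range(n)]
        # Numerically stable softmax over the key positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Weighted sum of the value vectors.
        out.append([sum((w / z) * v[j][d] for j, w in enumerate(weights))
                    for d in range(len(v[0]))])
    return out
```

Because the bias depends only on the distance i - j and not on the absolute positions, the model generalises better to sequence lengths unseen during training, which is the usual motivation for relative over absolute positional encodings.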

Future Work

  • Conformer Architecture Revisions: Rewriting and refining the Conformer model, drawing inspiration from more recent research like Efficient Conformer.
  • Pretraining an RNN-T ASR Model: Initializing with the pretrained encoder and training a Conformer RNN-T model, inspired by AssemblyAI's research.
  • Best-RQ Algorithm Article: An in-depth article on the Best-RQ algorithm is in progress.

References

  • Self-Supervised Learning with Random-Projection Quantizer for Speech Recognition
    Link to Paper
  • Conformer: Convolution-augmented Transformer for Speech Recognition
    Link to Paper
  • FlexAttention in PyTorch
    Link to Blog

Questions?

If you have any questions or need further clarification, please feel free to reach out!
