odunola499/BEST-RQ

Conformer Encoder Pretraining with Best-RQ Algorithm

This repository provides the code to pretrain a 120M-parameter Conformer encoder using the Best-RQ algorithm. Best-RQ offers a simpler and more compute-efficient pretraining strategy for speech models than contrastive methods like wav2vec 2.0: instead of learning a quantizer, it labels masked frames with a frozen random-projection quantizer and trains the encoder to predict those labels. The training code is built on PyTorch Lightning and is designed to be easily extensible to different datasets and configurations.
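The heart of Best-RQ is the frozen quantizer: each feature frame is projected with a fixed random matrix, L2-normalised, and assigned the index of the nearest codebook vector, and that index becomes the prediction target at masked positions. A simplified, dependency-free sketch of that labelling step (toy sizes, not the repository's implementation):

```python
import math
import random

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors pass through)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def random_projection_quantize(feature, projection, codebook):
    """Map one feature frame to a codebook index (the Best-RQ label).

    projection (d_in x d_proj) and codebook are randomly initialised
    and NEVER trained -- that is the whole trick of Best-RQ.
    """
    # Project the frame into the codebook space.
    projected = [sum(f * projection[i][j] for i, f in enumerate(feature))
                 for j in range(len(projection[0]))]
    projected = l2_normalize(projected)
    # Nearest codebook entry by squared Euclidean distance.
    dists = [sum((p - c) ** 2 for p, c in zip(projected, code))
             for code in codebook]
    return min(range(len(dists)), key=dists.__getitem__)

# Toy sizes for illustration only.
rng = random.Random(0)
d_in, d_proj, n_codes = 8, 4, 16
projection = [[rng.gauss(0, 1) for _ in range(d_proj)] for _ in range(d_in)]
codebook = [l2_normalize([rng.gauss(0, 1) for _ in range(d_proj)])
            for _ in range(n_codes)]
label = random_projection_quantize([rng.gauss(0, 1) for _ in range(d_in)],
                                   projection, codebook)
```

Because the projection and codebook are fixed, the labels are deterministic for a given input, so no quantizer gradients or codebook collapse issues arise during training.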

Features

  • Pretraining Approach: Leverages the Best-RQ algorithm for self-supervised pretraining.
  • Multilingual LibriSpeech: Trains on the Multilingual LibriSpeech dataset downloaded from Hugging Face by default.
  • Customizable Datasets: To train on a different dataset, simply modify the dataset.py file.
  • Logging: Uses Weights & Biases (WandB) for experiment tracking. The training script prompts you with the necessary setup steps.
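To swap in a different dataset, dataset.py needs a class following the map-style `__len__`/`__getitem__` protocol that `torch.utils.data.Dataset` expects. A minimal sketch (class and field names hypothetical; audio loading stubbed to keep the example dependency-free):

```python
class LocalAudioDataset:
    """Map-style dataset sketch: point it at your own manifest of audio files.

    Implements the __len__/__getitem__ protocol used by PyTorch DataLoaders.
    """

    def __init__(self, manifest, max_duration=30.0):
        # manifest: list of (audio_path, duration_seconds) pairs.
        # Drop overlong clips up front to keep batches memory-friendly.
        self.items = [m for m in manifest if m[1] <= max_duration]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        path, duration = self.items[idx]
        # Real code would load and resample the waveform here,
        # e.g. with torchaudio.load or soundfile.read.
        return {"path": path, "duration": duration}
```

Keeping the returned dict keys consistent with what the training collate function expects is the only contract the rest of the pipeline relies on.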

Installation and Usage

  1. Clone this repository:
    git clone https://github.com/odunola499/BEST-RQ-Algorithm.git
    cd BEST-RQ-Algorithm
  2. Install the dependencies:
    pip install -r requirements.txt
  3. Run the training script:
    python3 train.py

Model Architecture

The Conformer model in this repository uses relative positional encoding in its attention layers, powered by the FlexAttention API introduced in PyTorch 2.5, replacing traditional absolute positional encodings and vanilla attention. This implementation modifies the standard Conformer model, and future work aims to explore newer advancements such as grouped-query attention.
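FlexAttention expresses variants like this as a per-logit callback (a score_mod) applied before the softmax. A pure-Python sketch of the underlying idea, a learned additive bias indexed by the relative distance between query and key positions (toy single-head attention, no torch, not the repository's implementation):

```python
import math

def relative_bias_attention(q, k, v, bias):
    """Toy single-head attention with an additive relative-position bias.

    bias[d] is the offset for relative distance d = q_idx - kv_idx.
    FlexAttention's score_mod hook applies the same kind of per-logit
    adjustment without materialising the full score matrix.
    """
    n = len(q)
    out = []
    for i in range(n):
        # Raw dot-product scores plus the relative-position bias.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) + bias[i - j]
                  for j in range(n)]
        # Numerically stable softmax over the key positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Weighted sum of the value vectors.
        out.append([sum((w / z) * v[j][d] for j, w in enumerate(weights))
                    for d in range(len(v[0]))])
    return out
```

Because the bias depends only on the distance i - j and not on the absolute positions, the model generalises better to sequence lengths unseen during training, which is the usual motivation for relative over absolute positional encodings.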

Future Work

  • Conformer Architecture Revisions: Rewriting and refining the Conformer model, drawing inspiration from more recent research like Efficient Conformer.
  • Pretraining an RNN-T ASR Model: Initializing with the pretrained encoder and training a Conformer RNN-T model, inspired by AssemblyAI's research.
  • Best-RQ Algorithm Article: An in-depth article on the Best-RQ algorithm is in progress.

References

  • Self-Supervised Learning with Random-Projection Quantizer for Speech Recognition
    Link to Paper
  • Conformer: Convolution-augmented Transformer for Speech Recognition
    Link to Paper
  • FlexAttention in PyTorch
    Link to Blog

Questions?

If you have any questions or need further clarification, please feel free to reach out!
