thetosy/E2E-ASR

End-to-End Automatic Speech Recognition

This project implements three different end-to-end Automatic Speech Recognition (ASR) architectures using PyTorch.

The ASR models are based on the following architectures:

  • CTC [1]
  • Listen, Attend and Spell (LAS) [2]
  • LAS-CTC [3]

The models were trained and tested on a subset of the HarperValleyBank dataset [4]. Each model is trained to predict the transcript character by character.
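Character-level prediction requires mapping transcripts to integer label sequences. A minimal sketch of such an encoder/decoder follows; the vocabulary and the convention of reserving index 0 for the CTC blank are illustrative assumptions, not necessarily the repo's actual mapping:

```python
# Illustrative character vocabulary; index 0 is reserved for the CTC blank.
CHARS = "abcdefghijklmnopqrstuvwxyz '"
char2idx = {c: i + 1 for i, c in enumerate(CHARS)}
idx2char = {i: c for c, i in char2idx.items()}

def encode(text):
    """Map a transcript to a list of integer labels (unknown chars dropped)."""
    return [char2idx[c] for c in text.lower() if c in char2idx]

def decode(labels):
    """Map integer labels back to text (no blank/collapse handling here)."""
    return "".join(idx2char[i] for i in labels)
```

A round trip such as `decode(encode("hi there"))` recovers the original string for in-vocabulary text.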

Highlights

Feature Extraction

  • Uses Librosa to extract log-mel spectrograms from WAV audio
  • Encodes transcripts as character-level label sequences

Training End-to-End ASR

  • Multiple implementations of ASR model architectures, including attention-based models
  • Regularization of attention-based networks to respect CTC alignments (LAS-CTC)
  • Uses the PyTorch Lightning Trainer API
  • Training logs and visualizations with Wandb
  • Teacher forcing during attention-decoder training
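The LAS-CTC multi-task objective of reference [3] interpolates a CTC loss on the encoder outputs with the attention decoder's cross-entropy loss. A minimal sketch follows; the weight `lam = 0.2` and all tensor shapes are assumptions, not the repo's actual configuration:

```python
import torch
import torch.nn as nn

def joint_loss(ctc_log_probs, attn_logits, targets,
               input_lengths, target_lengths, lam=0.2):
    """Hypothetical joint CTC-attention loss: lam * CTC + (1 - lam) * CE.

    ctc_log_probs: (T, N, V) log-softmax outputs for the CTC head
    attn_logits:   (N, S, V) per-step logits from the attention decoder
    targets:       (N, S) character labels (0 reserved for the CTC blank)
    """
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)(
        ctc_log_probs, targets, input_lengths, target_lengths)
    ce = nn.CrossEntropyLoss()(
        attn_logits.reshape(-1, attn_logits.size(-1)), targets.reshape(-1))
    return lam * ctc + (1 - lam) * ce
```

Interpolating the two losses lets the CTC alignment constrain the otherwise unconstrained attention decoder, which is the regularization effect noted above.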

Decoding

  • Greedy decoding
  • Imposes a CTC objective on the decoding
  • CTC collapsing rules (merge repeated labels, then remove blanks)
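Greedy decoding with the standard CTC collapsing rules can be sketched as follows (pure Python for clarity; the blank index and character map are assumptions):

```python
def ctc_greedy_decode(logits, idx2char, blank=0):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks.

    logits: sequence of per-frame score vectors (any floats; only the
    argmax matters). idx2char maps non-blank indices to characters.
    """
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    chars, prev = [], None
    for idx in best:
        # Emit only on a change of label, and never for the blank.
        if idx != prev and idx != blank:
            chars.append(idx2char[idx])
        prev = idx
    return "".join(chars)
```

Note that a blank between two identical labels resets `prev`, so genuine repeated characters (e.g. "ll") survive the merge step.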

Getting Started

  1. Download and unzip the dataset:

     ```shell
     unzip harper_valley_bank_minified.zip -d data
     ```

Model Run Report

(Figure: ASR_model_report.png, the model run report obtained from Wandb)

Reference

  1. A. Graves et al., "Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks."
  2. W. Chan et al., "Listen, Attend and Spell."
  3. S. Kim et al., "Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning."
  4. CS224S: Spoken Language Processing.

This README provides an overview of the project, highlighting its main features, the technology stack, and usage instructions. For more detailed documentation, please refer to the project files and comments within the code.
