This project implements three different end-to-end Automatic Speech Recognition (ASR) models in PyTorch, based on the following architectures:

- CTC: an encoder trained with the Connectionist Temporal Classification objective (Graves et al.)
- LAS: Listen, Attend and Spell, an attention-based encoder-decoder (Chan et al.)
- LAS-CTC: LAS jointly trained with a CTC objective (Kim et al.)
The models were trained and tested on a subset of the HarperValleyBank dataset, which is hosted here. The dataset is used to train models that predict the transcript one character at a time.
- Uses Librosa to extract log-mel spectrograms from WAV audio (see the feature-extraction sketch below)
- Character-level encoding of transcripts (see the encoding sketch below)
- Multiple implementations of ASR model architectures, including attention-based models
- Regularization of attention-based networks to respect CTC alignments (LAS-CTC)
- Utilizes the PyTorch Lightning Trainer API
- Training logs and visualizations with Weights & Biases (wandb)
- Teacher forcing during decoder training (see the sketch below)
- Greedy decoding (see the sketch below)
- Imposes a CTC objective on the decoder (see the joint-loss sketch below)
- CTC collapsing rules for decoding: merge repeated symbols, then remove blanks (see the greedy-decoding sketch below)
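
A minimal sketch of the feature-extraction step, assuming a 16 kHz sample rate, 80 mel bins, and a 10 ms hop; these are illustrative defaults, not necessarily the project's settings:

```python
import librosa
import numpy as np

def wav_to_log_mel(path: str, sr: int = 16000, n_mels: int = 80) -> np.ndarray:
    """Load a WAV file and return a (time, n_mels) log-mel spectrogram."""
    # Resample to a fixed rate so all utterances share one feature space.
    waveform, _ = librosa.load(path, sr=sr)

    # Mel-scaled power spectrogram; window and hop sizes are illustrative.
    mel = librosa.feature.melspectrogram(
        y=waveform, sr=sr, n_fft=400, hop_length=160, n_mels=n_mels
    )

    # Log-compress (convert power to decibels) before feeding the model.
    log_mel = librosa.power_to_db(mel, ref=np.max)

    return log_mel.T  # transpose to (time, n_mels) for sequence models
```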
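
Character encoding can be as simple as a lookup table over the transcript alphabet. The vocabulary and special-token indices below are hypothetical; the project may reserve different slots for the CTC blank and start/end tokens:

```python
# Hypothetical vocabulary; the real project may use different special tokens.
VOCAB = ["<blank>", "<sos>", "<eos>", " "] + list("abcdefghijklmnopqrstuvwxyz'")
CHAR_TO_IDX = {ch: i for i, ch in enumerate(VOCAB)}
IDX_TO_CHAR = {i: ch for ch, i in CHAR_TO_IDX.items()}

def encode(text: str) -> list[int]:
    """Map a transcript to character indices, dropping out-of-vocab symbols."""
    return [CHAR_TO_IDX[ch] for ch in text.lower() if ch in CHAR_TO_IDX]

def decode(indices: list[int]) -> str:
    """Map indices back to text, skipping the special tokens."""
    return "".join(IDX_TO_CHAR[i] for i in indices if i >= 3)
```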
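
Teacher forcing feeds the ground-truth previous character into the decoder at each step instead of the model's own prediction. A sketch, where `decoder_step` is a hypothetical stand-in for one step of the attention decoder:

```python
import torch

def unroll_with_teacher_forcing(decoder_step, targets, sos_idx: int = 1):
    """Unroll a decoder over gold targets of shape (batch, target_len)."""
    batch_size, target_len = targets.shape
    prev = torch.full((batch_size,), sos_idx, dtype=torch.long)
    logits = []
    for t in range(target_len):
        step_logits = decoder_step(prev)       # (batch, vocab)
        logits.append(step_logits)
        prev = targets[:, t]  # gold character, not step_logits.argmax(-1)
    return torch.stack(logits, dim=1)          # (batch, target_len, vocab)
```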
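
Greedy decoding with the CTC collapsing rules works frame by frame: take the argmax at every frame, merge consecutive repeats, then drop blanks. A sketch, assuming the blank symbol sits at index 0:

```python
import torch

def ctc_greedy_decode(log_probs: torch.Tensor, blank: int = 0) -> list[int]:
    """Greedy CTC decoding for one utterance.

    log_probs: (time, vocab) frame-level log-probabilities.
    Returns the collapsed label sequence.
    """
    # Rule 0: pick the most likely symbol at every frame.
    best = log_probs.argmax(dim=-1).tolist()

    decoded, prev = [], None
    for idx in best:
        # Rule 1: merge consecutive repeated symbols.
        if idx != prev:
            # Rule 2: remove blank symbols.
            if idx != blank:
                decoded.append(idx)
        prev = idx
    return decoded
```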
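
The LAS-CTC regularization follows Kim et al.'s multi-task recipe: interpolate the attention decoder's cross-entropy with a CTC loss computed on the encoder outputs. A sketch using PyTorch's built-in losses; the interpolation weight `lam=0.2` and the tensor shapes are illustrative assumptions:

```python
import torch.nn.functional as F

def joint_ctc_attention_loss(
    ctc_log_probs,      # (time, batch, vocab) encoder log-probs for CTC
    decoder_logits,     # (batch, target_len, vocab) attention decoder outputs
    targets,            # (batch, target_len) gold character indices
    input_lengths,      # (batch,) encoder frame counts
    target_lengths,     # (batch,) transcript lengths
    lam: float = 0.2,   # interpolation weight; illustrative value
    blank: int = 0,
):
    """L = lam * L_CTC + (1 - lam) * L_attention (Kim et al., multi-task)."""
    ctc = F.ctc_loss(
        ctc_log_probs, targets, input_lengths, target_lengths, blank=blank
    )
    ce = F.cross_entropy(
        decoder_logits.transpose(1, 2),  # (batch, vocab, target_len)
        targets,
    )
    return lam * ctc + (1 - lam) * ce
```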
- Download and unzip the dataset:

  ```bash
  unzip harper_valley_bank_minified.zip -d data
  ```
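
Once the data is unpacked, launching a run with the Lightning Trainer and a wandb logger might look roughly like this. `TinyASR` is a toy stand-in for the project's actual LightningModule, and the hyperparameters are illustrative:

```python
import torch
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

class TinyASR(pl.LightningModule):
    """Toy stand-in for the project's LightningModule."""

    def __init__(self, n_mels: int = 80, vocab_size: int = 31):
        super().__init__()
        self.rnn = torch.nn.LSTM(n_mels, 256, batch_first=True)
        self.head = torch.nn.Linear(256, vocab_size)

    def forward(self, feats):                  # feats: (batch, time, n_mels)
        out, _ = self.rnn(feats)
        return self.head(out)                  # (batch, time, vocab)

    def training_step(self, batch, batch_idx):
        feats, targets = batch                 # targets: (batch, target_len)
        log_probs = self(feats).log_softmax(-1).transpose(0, 1)
        loss = torch.nn.functional.ctc_loss(
            log_probs, targets,
            input_lengths=torch.full((feats.size(0),), log_probs.size(0)),
            target_lengths=torch.full((targets.size(0),), targets.size(1)),
        )
        self.log("train_loss", loss)           # picked up by the wandb logger
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

trainer = pl.Trainer(max_epochs=50, logger=WandbLogger(project="asr"))
# trainer.fit(TinyASR(), train_dataloaders=...)  # DataLoader omitted here
```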
Model run report obtained from Wandb
- Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, A Graves et al.
- Listen, Attend and Spell, W Chan et al.
- Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning, S Kim et al.
- CS224S: Spoken Language Processing
This README provides an overview of the project, highlighting its main features, the technology stack, and usage instructions. For more detailed documentation, please refer to the project files and comments within the code.