Overview

This project aims to fine-tune an open-source Whisper model for Thai speech-to-text.

Whisper is a state-of-the-art transformer model from OpenAI that transcribes speech into text with high accuracy. We will use Hugging Face's Whisper implementation to fine-tune the model on our own GPU infrastructure, using various custom datasets of audio recordings and their transcripts.
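As a rough illustration of the starting point for fine-tuning, the sketch below loads a pretrained Whisper checkpoint and its processor through the Hugging Face `transformers` library. The checkpoint name `openai/whisper-small`, the `FineTuneConfig` class, and the `load_model_and_processor` helper are assumptions made for this example; the repository may use a different model size or structure.

```python
from dataclasses import dataclass


@dataclass
class FineTuneConfig:
    """Hypothetical settings for this example (not taken from the repository)."""

    model_name: str = "openai/whisper-small"  # assumed checkpoint size
    language: str = "Thai"
    task: str = "transcribe"


def load_model_and_processor(cfg: FineTuneConfig):
    """Load a pretrained Whisper model and its processor.

    transformers is imported lazily so that this module can be imported
    on machines where the library is not installed.
    """
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    # The processor bundles the audio feature extractor and the tokenizer;
    # language and task set the prompt tokens used when encoding labels.
    processor = WhisperProcessor.from_pretrained(
        cfg.model_name, language=cfg.language, task=cfg.task
    )
    model = WhisperForConditionalGeneration.from_pretrained(cfg.model_name)
    return model, processor
```

Calling `load_model_and_processor(FineTuneConfig())` downloads the checkpoint on first use, so it needs network access and enough disk space for the model weights.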

We will also monitor the training process and evaluate model performance with TensorBoard, a visualization tool for machine learning experiments.
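The text above does not say which metric is used for evaluation. As one possibility, the sketch below computes a character error rate (CER), which is often preferred over word error rate for Thai because Thai text is written without spaces between words. The function names are hypothetical, and the choice of CER is an assumption rather than the project's documented metric.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance over Unicode code points / reference length.

    Spaces are removed first, so spacing differences are not counted as errors.
    """
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

A value computed this way for each evaluation step could then be logged as a scalar and plotted in TensorBoard alongside the training loss.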

The tools used in this repository for fine-tuning are described below.

Setup dev environment

    poetry env use python3.10
    poetry update
    poetry install
    poetry run pre-commit install

thinkingmachines/speechtotext-poc

Folders and files

Latest commit

History

Repository files navigation

Overview

Setup dev environment

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages