Description
We want to add support for audio datasets in GuideLLM to enable benchmarking of multi-modal audio models like Whisper. Since audio datasets are fairly limited, we should structure the data by use case so that developers can easily understand the context of the data and what the model is being benchmarked on.
User Story
As a developer, I want to benchmark a Whisper model with different audio dataset profiles so that I can understand performance before moving to production and verify that my use case (call-center summarization, translation, etc.) can be met on my target hardware.
Acceptance Criteria
- Enable support for the leading Hugging Face audio datasets: https://huggingface.co/blog/audio-datasets#a-tour-of-audio-datasets-on-the-hub
- Create dataset profiles for the different use cases, organized in structured folders per use case:
  - Multilingual Language Translation
  - English Speech Recognition
  - Speech Translation
  - Audio Classification
  - TBD
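One possible shape for these use-case profiles is sketched below. This is only an illustration of the idea, not a decided design: the `AudioDatasetProfile` class, its fields, and the example dataset IDs are all assumptions for discussion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioDatasetProfile:
    """Hypothetical profile describing one audio benchmarking use case."""

    use_case: str                    # human-readable use-case name
    hf_dataset: str                  # Hugging Face Hub dataset ID (illustrative)
    split: str = "test"              # dataset split to benchmark against
    language: Optional[str] = None   # language/config, if the dataset needs one


# Illustrative mapping of use cases to well-known Hub datasets; the actual
# profile layout and dataset choices are still to be decided in this issue.
PROFILES = {
    "english_speech_recognition": AudioDatasetProfile(
        use_case="English Speech Recognition",
        hf_dataset="librispeech_asr",
    ),
    "multilingual_translation": AudioDatasetProfile(
        use_case="Multilingual Language Translation",
        hf_dataset="mozilla-foundation/common_voice_11_0",
        language="fr",
    ),
}
```

Keeping profiles as plain data like this would let each use-case folder ship a small config file that GuideLLM resolves to a Hub dataset at benchmark time.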
Metadata
Status: Backlog