Adding realtime diarization to collabora/WhisperLive #178

yehiaabdelm · 2023-10-10T13:36:40Z

I'm trying to add diarization to https://github.com/collabora/WhisperLive, which does transcription and runs a VAD model before passing audio data to the transcriber. I had it working with pyannote-audio, but the VAD model and the diarization model both run on the CPU, so they slow each other down. I was also passing the whole audio file to the model every time, which is obviously not optimal. I was wondering how I can use diart instead of pyannote. Most of the examples I see read directly from the microphone. Can anyone please share an example of how to use diart when the data is a float32 numpy array of mono audio instead of a microphone stream? Any help is appreciated.

juanmc2005 · 2023-10-10T14:48:34Z

Maintainer

Hi @yehiaabdelm, apart from MicrophoneAudioSource, diart also provides FileAudioSource. If you've already loaded audio as a numpy array, the usage will depend on whether the array is a chunk or the entire recording. Diart pipelines and blocks are designed to receive one chunk at a time.
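If the array holds the entire recording, one way to get "one chunk at a time" is to slice it into overlapping windows yourself. The following is only a minimal sketch of that slicing step, not diart's own API: the helper name `sliding_chunks` is hypothetical, and the 16 kHz sample rate, 5 s chunk duration and 0.5 s step are assumed values that should be matched to whatever your pipeline is configured with. How each chunk is then handed to the pipeline depends on the diart version you use.

```python
from typing import Iterator, Sequence, Tuple


def sliding_chunks(
    samples: Sequence[float],
    sample_rate: int = 16000,  # assumption: 16 kHz mono audio
    duration: float = 5.0,     # assumption: chunk length in seconds
    step: float = 0.5,         # assumption: hop between chunks in seconds
) -> Iterator[Tuple[float, Sequence[float]]]:
    """Yield (start_time_in_seconds, chunk) pairs from a mono recording.

    Works on any sliceable 1-D sequence, including a float32 numpy array.
    Trailing samples that do not fill a whole chunk are dropped.
    """
    chunk_len = int(round(duration * sample_rate))
    hop_len = int(round(step * sample_rate))
    if chunk_len <= 0 or hop_len <= 0:
        raise ValueError("duration and step must be positive")
    # Advance by hop_len so that consecutive chunks overlap.
    for start in range(0, len(samples) - chunk_len + 1, hop_len):
        yield start / sample_rate, samples[start:start + chunk_len]


if __name__ == "__main__":
    # Stand-in for 6 seconds of silent mono audio at 16 kHz.
    audio = [0.0] * (16000 * 6)
    for start_time, chunk in sliding_chunks(audio):
        # Each chunk would be passed to the diarization pipeline here.
        print(start_time, len(chunk))
```

If the array is already a single chunk coming from a live stream, no slicing is needed; you would only have to make sure its length matches the chunk duration the pipeline expects.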

To combine diart with Whisper you can check this article that I wrote on Medium some time ago. It will give you a head start but I'm sure many improvements can be made.

You can also check out this gist for the diart+whisper code.
