@@ -129,7 +129,7 @@ To **enable Speaker Diarization**, include your Hugging Face access token (read)
Run whisper on the example segment (using default params, whisper small); add `--highlight_words True` to visualise word timings in the .srt file.
- whisperx examples/sample01.wav
+ whisperx path/to/audio.wav
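For the same default run from Python, here is a minimal sketch assuming the package's Python API (the device, compute type, and batch size shown are illustrative choices, not mandated defaults):

```python
import whisperx

device = "cuda"
audio_file = "path/to/audio.wav"  # illustrative path, matching the CLI example

# 1. Transcribe with the default small model (batched inference)
model = whisperx.load_model("small", device, compute_type="float16")
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=16)

# 2. Force-align the output with wav2vec2.0 for word-level timestamps
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device)
print(result["segments"])  # segments now carry per-word start/end times
```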
Result using *WhisperX* with forced alignment to wav2vec2.0 large:
@@ -143,16 +143,16 @@ https://user-images.githubusercontent.com/36994049/207743923-b4f0d537-29ae-4be2-
For increased timestamp accuracy, at the cost of higher GPU memory, use bigger models (a bigger alignment model was not found to be that helpful; see paper), e.g.
- whisperx examples/sample01.wav --model large-v2 --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --batch_size 4
+ whisperx path/to/audio.wav --model large-v2 --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --batch_size 4
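A Python sketch of the same setup; the `model_name` keyword on `load_align_model` is assumed here to mirror the CLI's `--align_model` flag:

```python
import whisperx

device = "cuda"
audio = whisperx.load_audio("path/to/audio.wav")

# Bigger ASR model; a smaller batch size limits GPU memory use
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=4)

# Explicitly select the larger wav2vec2.0 alignment model
# (model_name kwarg assumed to mirror --align_model)
model_a, metadata = whisperx.load_align_model(
    language_code=result["language"],
    device=device,
    model_name="WAV2VEC2_ASR_LARGE_LV60K_960H",
)
result = whisperx.align(result["segments"], model_a, metadata, audio, device)
```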
To label the transcript with speaker IDs (set the number of speakers if known, e.g. `--min_speakers 2` `--max_speakers 2`):
- whisperx examples/sample01.wav --model large-v2 --diarize --highlight_words True
+ whisperx path/to/audio.wav --model large-v2 --diarize --highlight_words True
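In Python, a sketch of the same diarization step; the `min_speakers`/`max_speakers` arguments are assumed to mirror the CLI flags, and the Hugging Face token mentioned above is required:

```python
import whisperx

device = "cuda"
audio = whisperx.load_audio("path/to/audio.wav")

# Transcribe and align first, so speaker labels can be attached per word
model = whisperx.load_model("large-v2", device)
result = model.transcribe(audio, batch_size=16)
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device)

# Diarize, then assign a speaker to each word
diarize_model = whisperx.DiarizationPipeline(use_auth_token="YOUR_HF_TOKEN", device=device)
diarize_segments = diarize_model(audio, min_speakers=2, max_speakers=2)
result = whisperx.assign_word_speakers(diarize_segments, result)
```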
To run on CPU instead of GPU (and for running on Mac OS X):
- whisperx examples/sample01.wav --compute_type int8
+ whisperx path/to/audio.wav --compute_type int8
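From Python, the CPU setup is a one-line change, sketched here under the same API assumptions as above:

```python
import whisperx

# int8 quantised inference keeps memory low enough for CPU-only runs
model = whisperx.load_model("small", "cpu", compute_type="int8")
audio = whisperx.load_audio("path/to/audio.wav")
result = model.transcribe(audio, batch_size=4)
```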
### Other languages
@@ -163,7 +163,7 @@ Currently default models provided for `{en, fr, de, es, it}` via torchaudio pipe
#### E.g. German
- whisperx --model large-v2 --language de examples/sample_de_01.wav
+ whisperx --model large-v2 --language de path/to/audio.wav
https://user-images.githubusercontent.com/36994049/208298811-e36002ba-3698-4731-97d4-0aebd07e0eb3.mov
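A Python sketch of the German run; the `language` keyword on `load_model` is assumed to mirror `--language`, and fixing the language up front selects the German phoneme alignment model:

```python
import whisperx

device = "cuda"
audio = whisperx.load_audio("path/to/audio.wav")

# Fix the language up front (language kwarg assumed to mirror --language)
model = whisperx.load_model("large-v2", device, language="de")
result = model.transcribe(audio, batch_size=16)

# German phoneme model for forced alignment
model_a, metadata = whisperx.load_align_model(language_code="de", device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device)
```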