reintroduce opus on VAD, change frame size according to firmware v1.0, change realtime resolution for transcribe #624
It still doesn't work, there's no transcript.
There's also this warning, and I'm not sure whether it's something to worry about:
backend/routers/transcribe.py:102: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_new.cpp:1530.)
samples = torch.frombuffer(decoded_opus, dtype=torch.int16).float() / 32768.0
That warning comes from how I handle the decoded Opus buffer. I'll fix it soon.
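For reference, one way to avoid the warning quoted above is to give PyTorch a writable buffer. This is a minimal sketch, assuming `decoded_opus` is a read-only `bytes` object holding 16-bit PCM in native byte order; the helper name `pcm16_bytes_to_float_tensor` is hypothetical and not taken from `backend/routers/transcribe.py`.

```python
import torch


def pcm16_bytes_to_float_tensor(decoded_opus: bytes) -> torch.Tensor:
    """Convert 16-bit PCM bytes to float32 samples in [-1.0, 1.0)."""
    # Assumption: decoded_opus is immutable `bytes`. bytearray() copies it
    # into a writable buffer, so torch.frombuffer has nothing to warn about.
    writable = bytearray(decoded_opus)
    samples = torch.frombuffer(writable, dtype=torch.int16)
    # .float() allocates a new float32 tensor; dividing by 32768 scales
    # the int16 range to [-1.0, 1.0).
    return samples.float() / 32768.0
```

The extra copy is small for a single audio frame, and the returned tensor no longer shares memory with the decoder's output.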
- It does transcribe, but it misses a lot of segments (far more than PCM with VAD).
- The websocket disconnects more frequently.
- Transcription is also quite slow for both PCM and Opus.
It sounds like the server is under heavier load. My solution:
Any feedback or opinions are appreciated. Thanks!
Changes
@josancamon19 @mdmohsin7 This is now merged with the main branch and gives better results when a speech profile is used. Please review, thanks!
https://share.icloud.com/photos/06dFrjm9Q_RrsvZO5VLScWGLg Clearly doesn't work. For the next review, please submit videos of it working through the app.
@josancamon19 @mdmohsin7 Drive link: https://drive.google.com/drive/folders/1h1nbyLAaVt72Wwy-yO_5C8L5_re17ptI?usp=sharing Please review, thanks!
I have added more testing in the "test 1" folder, this time with a lecture video (closer to a conversation-like situation). I also provided the PCM transcript from the Play Store app (no VAD) as the ground truth. The result: latency is indistinguishable and accuracy is much improved. Opus with VAD is usable.
dude @josancamon19
Moving PR to #922
#518
The encoding in the Friend firmware v1.0 code shows that it uses a frame size of 160 samples (10 ms). I have not tested this on a Friend because I don't have the device.
Changing the real-time resolution to the standard 20 ms should, in theory, reduce server load.
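The arithmetic behind those two numbers can be sketched as follows. A frame of 160 samples lasting 10 ms implies a 16 kHz sample rate; the function names are hypothetical, and the assumption that "real-time resolution" is the interval at which the server handles incoming audio is mine, not something confirmed by the code.

```python
def frame_duration_ms(frame_size: int, sample_rate_hz: int) -> float:
    """Duration of one audio frame in milliseconds."""
    return 1000.0 * frame_size / sample_rate_hz


def samples_per_interval(interval_ms: int, sample_rate_hz: int) -> int:
    """Number of samples covered by one processing interval."""
    return sample_rate_hz * interval_ms // 1000


def intervals_per_second(interval_ms: int) -> float:
    """How many times per second the server handles audio (assumed meaning)."""
    return 1000 / interval_ms


# Assumption: 16 kHz, implied by 160 samples per 10 ms frame.
SAMPLE_RATE_HZ = 16_000

print(frame_duration_ms(160, SAMPLE_RATE_HZ))    # 10.0
print(samples_per_interval(20, SAMPLE_RATE_HZ))  # 320
print(intervals_per_second(10), intervals_per_second(20))  # 100.0 50.0
```

Under these assumptions, a 20 ms resolution covers two firmware frames per step and halves the number of handling steps per second compared with 10 ms.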
Thank you!