This repository was archived by the owner on Feb 18, 2026. It is now read-only.
I'm working on implementing this into a real-time system. I notice that the model performs very well for full audio clips, but I'm struggling to get good results for a real-time use case.
It works just well enough for me to mostly rule out a bug on my end. What is the preferred strategy for real-time use? So far I've been adding a delay: I run inference on a long window (about 2 seconds of audio) and take the visemes at delay_sample samples before the end of the window.
So the window is laid out as [prev_audio ... current_frame ... future_audio/delay].
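To make the setup concrete, here is a rough sketch of the delayed-window approach I'm describing. It is not this repository's API: `infer`, `window_samples`, `hop_samples`, and `dummy_infer` are hypothetical names, and it assumes the model returns one viseme frame per `hop_samples` input samples, which may not match the real model.

```python
from collections import deque
from typing import Callable, List, Optional, Sequence


class DelayedWindowStreamer:
    """Sliding window with lookahead: [prev_audio ... current_frame ... future_audio]."""

    def __init__(
        self,
        infer: Callable[[List[float]], Sequence],  # hypothetical model wrapper
        window_samples: int,  # total window length, e.g. 2 s of audio
        delay_samples: int,   # lookahead kept after the frame we emit
        hop_samples: int,     # ASSUMPTION: one output frame per hop_samples
    ) -> None:
        if not 0 <= delay_samples < window_samples:
            raise ValueError("delay_samples must be in [0, window_samples)")
        if hop_samples <= 0:
            raise ValueError("hop_samples must be positive")
        self.infer = infer
        self.window_samples = window_samples
        self.delay_samples = delay_samples
        self.hop_samples = hop_samples
        # Fixed-size buffer: old samples fall off the left as new ones arrive.
        self.buffer: deque = deque(maxlen=window_samples)

    def push(self, chunk: Sequence[float]) -> Optional[object]:
        """Append new audio; return the delayed frame, or None while warming up."""
        self.buffer.extend(chunk)
        if len(self.buffer) < self.window_samples:
            return None
        frames = self.infer(list(self.buffer))
        # Sample position that sits delay_samples before the end of the window.
        target_sample = self.window_samples - 1 - self.delay_samples
        # Map the sample position to a frame index, clamped to the output length.
        index = min(target_sample // self.hop_samples, len(frames) - 1)
        return frames[index]


def dummy_infer(window: List[float]) -> List[float]:
    """Stand-in for the real model: emits the first sample of every 2-sample hop."""
    return window[::2]


if __name__ == "__main__":
    streamer = DelayedWindowStreamer(
        dummy_infer, window_samples=8, delay_samples=2, hop_samples=2
    )
    for start in range(0, 12, 2):
        print(streamer.push([start, start + 1]))
```

The whole window is re-run on every chunk and only one frame is kept, so total latency is roughly delay_samples plus inference time.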