Hi! Cool project — real-time voice assistant with WebRTC streaming.
I noticed you're using faster-whisper for ASR. Have you considered SenseVoice as an alternative backend? It could meaningfully reduce latency:
Performance comparison
| Metric | faster-whisper (large-v3) | SenseVoice |
| --- | --- | --- |
| Architecture | Autoregressive | Non-autoregressive |
| Relative speed | ~4x vs original Whisper | ~20x vs original Whisper |
| Model size (parameters) | 1.5B | 234M |
| VRAM usage | ~4-6 GB | ~1 GB |
| First-token latency | Higher (sequential decoding) | Lower (parallel decoding) |
For a real-time voice assistant, the non-autoregressive architecture is the main win: you get the full transcription from a single forward pass instead of waiting for tokens to be generated one at a time, so time to first text should be much lower.
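If you want to check the latency claim on your own hardware before switching, a small timing harness is enough. This is only a sketch: `measure_latency` and `fake_backend` are hypothetical names, and the fake backend just sleeps, so swap in a callable that wraps your faster-whisper or SenseVoice call.

```python
import statistics
import time
from typing import Callable, Sequence


def measure_latency(transcribe: Callable[[Sequence[float]], str],
                    audio: Sequence[float],
                    runs: int = 5,
                    warmup: int = 1) -> dict:
    """Time repeated calls to a transcription callable and summarize them."""
    # Warm-up calls are excluded so model loading and caching do not skew results.
    for _ in range(warmup):
        transcribe(audio)

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": runs,
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


def fake_backend(audio: Sequence[float]) -> str:
    """Hypothetical stand-in for a real backend; sleeps instead of running a model."""
    time.sleep(0.01)
    return "hello"


if __name__ == "__main__":
    # One second of silence at an assumed 16 kHz sample rate.
    print(measure_latency(fake_backend, [0.0] * 16000))
```

Running both backends on the same utterances (short commands as well as longer sentences) should give a fairer picture than the headline speed multipliers.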
Integration options
Python API (drop-in):
from funasr import AutoModel
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")
result = model.generate(input=audio_array)
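One thing to watch for when wiring this in: as far as I know, `generate` returns a list of dicts with a `"text"` field, and for SenseVoice that text can carry tags such as `<|en|>` or `<|NEUTRAL|>` for language, emotion, and audio events. FunASR ships its own post-processing helper for this, so treat the following as a stdlib-only sketch of the idea. `plain_text` is a hypothetical helper, and the sample result is made up for illustration.

```python
import re
from typing import Any, Dict, List

# Matches tags shaped like <|en|> or <|NEUTRAL|> (assumed output format).
_TAG_PATTERN = re.compile(r"<\|[^|]*\|>")


def plain_text(result: List[Dict[str, Any]]) -> str:
    """Join the "text" fields of a result list and drop <|...|> tags."""
    pieces = []
    for item in result:
        cleaned = _TAG_PATTERN.sub("", item.get("text", ""))
        pieces.append(cleaned.strip())
    # Skip segments that were empty once the tags were removed.
    return " ".join(piece for piece in pieces if piece)


if __name__ == "__main__":
    # Illustrative result only, not real model output.
    sample = [{"key": "utt1",
               "text": "<|en|><|NEUTRAL|><|Speech|><|woitn|>turn on the lights"}]
    print(plain_text(sample))
```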
OpenAI-compatible server (zero code change if you already use OpenAI STT API):
pip install funasr
funasr-server --device cuda
# POST /v1/audio/transcriptions — same as OpenAI/Whisper API
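If you are not already going through an OpenAI client, the request is a plain multipart upload. Here is a rough stdlib-only sketch of what a client might send. The `file` and `model` form fields follow the OpenAI transcription API; the base URL `http://localhost:8000`, the model value, and whether the server reads the `model` field at all are assumptions on my side, so check them against the server's startup output.

```python
import io
import urllib.request
import uuid
import wave


def make_silent_wav(seconds: float = 0.1, sample_rate: int = 16000) -> bytes:
    """Return a short mono 16-bit WAV of silence, used here as placeholder audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


def build_transcription_request(base_url: str, wav_bytes: bytes,
                                model: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode(), b"\r\n",
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n',
        b"Content-Type: audio/wav\r\n\r\n",
        wav_bytes, b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


if __name__ == "__main__":
    # Host, port, and model name are placeholders, not confirmed defaults.
    request = build_transcription_request(
        "http://localhost:8000", make_silent_wav(), "sensevoice")
    print(request.get_method(), request.full_url)
    # To actually send it once the server is running:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode())
```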
Links
Happy to discuss integration details!