Feature: Add SenseVoice as ASR backend — 5x faster than faster-whisper #17

Description

@LauraGPT

Hi! Cool project — real-time voice assistant with WebRTC streaming.

I noticed you're using faster-whisper for ASR. Have you considered SenseVoice as an alternative backend? It could meaningfully reduce latency:

Performance comparison

| Metric | faster-whisper (large-v3) | SenseVoice |
| --- | --- | --- |
| Architecture | Autoregressive | Non-autoregressive |
| Relative speed | ~4x vs original Whisper | ~20x vs original Whisper |
| Model size | 1.5B | 234M |
| VRAM usage | ~4-6GB | ~1GB |
| First-token latency | Higher (sequential decoding) | Lower (parallel decoding) |

For a real-time voice assistant, the non-autoregressive architecture matters most for latency: the full transcription comes out of a single forward pass, instead of the decoder generating tokens one at a time.

Integration options

Python API (drop-in):

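from funasr import AutoModel

# Load SenseVoiceSmall together with the FSMN VAD model
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")
# audio_array is a placeholder for the audio your pipeline already has
result = model.generate(input=audio_array)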
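
To make "backend" concrete, here is a rough sketch of how the call above could sit behind a small adapter so that faster-whisper and SenseVoice stay swappable. The names `ASRBackend` and `SenseVoiceBackend` are hypothetical and not taken from this project's code, and the sketch assumes `generate` returns a list of dicts with a `"text"` key, so please check that against the FunASR version you install.

```python
from typing import Any, Protocol


class ASRBackend(Protocol):
    """Hypothetical interface for a pluggable ASR backend."""

    def transcribe(self, audio: Any) -> str: ...


class SenseVoiceBackend:
    """Adapter around any object exposing generate(input=...)."""

    def __init__(self, model: Any) -> None:
        # In real use this would be the funasr AutoModel instance.
        self._model = model

    def transcribe(self, audio: Any) -> str:
        results = self._model.generate(input=audio)
        # Assumption: results is a list of dicts carrying a "text" field.
        return results[0]["text"] if results else ""
```

The rest of the pipeline would then only call `backend.transcribe(audio)`, whichever model is behind it.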

OpenAI-compatible server (zero code change if you already use OpenAI STT API):

pip install funasr
funasr-server --device cuda
# POST /v1/audio/transcriptions — same as OpenAI/Whisper API
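
For clients that do not go through an OpenAI SDK, the request to that endpoint could look roughly like the sketch below, using only the Python standard library. The base URL, the model name, and the helper `build_transcription_request` are placeholders of mine; the `file` and `model` form fields follow the OpenAI transcription API, and I am assuming the server accepts the same fields.

```python
import urllib.request
import uuid


def build_transcription_request(
    base_url: str, wav_bytes: bytes, model: str
) -> urllib.request.Request:
    """Build a multipart/form-data POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    sep = b"--" + boundary.encode()
    crlf = b"\r\n"
    body = b"".join([
        # Form field: model name (placeholder value supplied by the caller)
        sep + crlf,
        b'Content-Disposition: form-data; name="model"' + crlf + crlf,
        model.encode() + crlf,
        # Form field: the audio file itself
        sep + crlf,
        b'Content-Disposition: form-data; name="file"; filename="audio.wav"' + crlf,
        b"Content-Type: audio/wav" + crlf + crlf,
        wav_bytes + crlf,
        # Closing boundary
        sep + b"--" + crlf,
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req)` against wherever the server is listening; the response format would need to be checked against the running server.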

Links

Happy to discuss integration details!
