Feature: Add SenseVoice as ASR backend — 5x faster than faster-whisper

Hi! Cool project — real-time voice assistant with WebRTC streaming.

I noticed you're using faster-whisper for ASR. Have you considered **SenseVoice** as an alternative backend? It could meaningfully reduce latency:

## Performance comparison

| Metric | faster-whisper (large-v3) | SenseVoice |
|--------|--------------------------|------------|
| Architecture | Autoregressive | **Non-autoregressive** |
| Relative speed | ~4x vs original Whisper | **~20x vs original Whisper** |
| Model size (parameters) | ~1.5B | **~234M** |
| VRAM usage | ~4-6GB | **~1GB** |
| First-token latency | Higher (sequential decoding) | **Lower (parallel decoding)** |

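For a real-time voice assistant, the main benefit is latency. A non-autoregressive model produces the full transcription in a single forward pass, whereas an autoregressive decoder has to generate tokens one at a time. The "5x" in the title comes from the relative speeds in the table: roughly 20x vs. roughly 4x over original Whisper.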

## Integration options

**Python API (drop-in):**
```python
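from funasr import AutoModel

# Load SenseVoiceSmall together with the FSMN voice activity detector
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")
# audio_array: your captured audio samples (a path to an audio file also works)
result = model.generate(input=audio_array)
print(result[0]["text"])  # generate() returns a list of dicts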
```
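To make the backend swappable, the call above could sit behind a small interface. This is only a sketch: `ASRBackend`, `SenseVoiceBackend`, and `FakeModel` are hypothetical names, not taken from this project's code, and the project's real ASR abstraction may look different. `FakeModel` stands in for `AutoModel` so the wiring can be tried without downloading weights.

```python
from typing import Protocol, Sequence


class ASRBackend(Protocol):
    """Hypothetical interface an ASR backend would implement."""

    def transcribe(self, audio: Sequence[float]) -> str: ...


class SenseVoiceBackend:
    """Wraps a FunASR-style model behind the hypothetical ASRBackend interface."""

    def __init__(self, model=None):
        if model is None:
            # Imported lazily so funasr is only required when this backend is selected.
            from funasr import AutoModel

            model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")
        self._model = model

    def transcribe(self, audio) -> str:
        result = self._model.generate(input=audio)
        # generate() returns a list of dicts; the transcript is under "text".
        return result[0]["text"] if result else ""


class FakeModel:
    """Stand-in for AutoModel, used here only to exercise the wrapper."""

    def generate(self, input):
        return [{"key": "utt0", "text": "hello world"}]


backend: ASRBackend = SenseVoiceBackend(model=FakeModel())
print(backend.transcribe([0.0] * 16000))
```

With the real model, the raw `text` field may contain language and event tags. FunASR ships `rich_transcription_postprocess` in `funasr.utils.postprocess_utils` for cleaning those up, which could be applied inside `transcribe`.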

**OpenAI-compatible server (no code changes if you already use the OpenAI STT API):**
```bash
pip install funasr
funasr-server --device cuda
# POST /v1/audio/transcriptions — same as OpenAI/Whisper API
```

## Links
- SenseVoice: https://github.com/FunAudioLLM/SenseVoice (8.3K stars)
- FunASR: https://github.com/modelscope/FunASR (16.7K stars)

Happy to discuss integration details!
