ASMR
Digital Human
Give a star ⭐ if you like it!
Kokoro is a trending, top-2 TTS model on Hugging Face. This repo provides insanely fast Kokoro inference in Rust: you can build your own TTS engine powered by Kokoro and run fast inference with a single `koko` command.
`kokoros` is a Rust crate that provides easy-to-use TTS. You can call `koko` directly in the terminal to synthesize audio. `kokoros` uses a relatively small model (87M parameters) while producing extremely high-quality voices.
Language support:
- English;
- Chinese (partly);
- Japanese (partly);
- German (partly);
🔥🔥🔥🔥🔥🔥🔥🔥🔥 The Rust version of Kokoros is getting a lot of attention. If you are also interested in insanely fast inference, embedded builds, WASM support, etc., please star this repo! We keep it updated.

New Discord community: https://discord.gg/E566zfDWqD. Please join us if you are interested in Rust Kokoro.
- 2025.01.22: 🔥🔥🔥 Streaming mode supported. You can now use `--stream` to have fun with stream mode. Kudos to mroigo;
- 2025.01.17: 🔥🔥🔥 Style mixing supported! Listen to the output ASMR effect by simply specifying the style `af_sky.4+af_nicole.5`;
- 2025.01.15: OpenAI-compatible server supported; the OpenAI format is still being polished!
- 2025.01.15: Phonemizer supported! Now `Kokoros` can run inference end to end without any other dependencies! Kudos to @tstm;
- 2025.01.13: espeak-ng tokenizer and phonemizer supported! Kudos to @mindreframer;
- 2025.01.12: Released `Kokoros`.
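The style-mixing spec above blends voices by weight. A minimal sketch of how a spec like `af_sky.4+af_nicole.5` could be parsed (a hypothetical helper for illustration; the actual parser in Kokoros may differ):

```python
def parse_style_mix(spec: str) -> list[tuple[str, float]]:
    """Parse a style-mix spec like 'af_sky.4+af_nicole.5' into
    (voice_name, weight) pairs, e.g. [('af_sky', 0.4), ('af_nicole', 0.5)]."""
    pairs = []
    for part in spec.split("+"):
        # The last '.' separates the voice name from its weight digits
        name, _, frac = part.rpartition(".")
        # '.4' is shorthand for a 0.4 weight on that voice
        pairs.append((name, float("0." + frac)))
    return pairs

print(parse_style_mix("af_sky.4+af_nicole.5"))
```

Here `af_sky` contributes 40% and `af_nicole` 50% of the mixed style vector.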
- Install required Python packages:
pip install -r scripts/requirements.txt
- Initialize voice data:
python scripts/fetch_voices.py
This step fetches the required `voices.json` data file, which is necessary for voice synthesis.
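To sanity-check which voices were downloaded, a small sketch like the following can list them. This assumes `voices.json` is a JSON object keyed by voice name, which may not match the exact file layout:

```python
def list_voice_names(voices: dict) -> list[str]:
    """Return the sorted voice names from a parsed voices.json mapping."""
    return sorted(voices)

# Usage (after running scripts/fetch_voices.py):
#   import json
#   with open("voices.json") as f:
#       print(list_voice_names(json.load(f)))
```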
- Build the project:
cargo build --release
./target/release/koko -h
./target/release/koko text "Hello, this is a TTS test"
The generated audio will be saved to `tmp/output.wav` by default. You can customize the save location with the `--output` or `-o` option:
./target/release/koko text "I hope you're having a great day today!" --output greeting.wav
./target/release/koko file poem.txt
For a file with 3 lines of text, speech audio files `tmp/output_0.wav`, `tmp/output_1.wav`, and `tmp/output_2.wav` will be written by default. You can customize the save location with the `--output` or `-o` option, using `{line}` as the line number:
./target/release/koko file lyrics.txt -o "song/lyric_{line}.wav"
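The `{line}` placeholder behaves roughly like per-line string substitution on a zero-based index. A sketch of the expansion (for illustration only, not the actual implementation):

```python
def expand_output_template(template: str, num_lines: int) -> list[str]:
    """Expand an output template like 'song/lyric_{line}.wav' into one
    path per input line, numbering lines from 0."""
    return [template.replace("{line}", str(i)) for i in range(num_lines)]

print(expand_output_template("song/lyric_{line}.wav", 3))
# → ['song/lyric_0.wav', 'song/lyric_1.wav', 'song/lyric_2.wav']
```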
- Start the server:
./target/release/koko openai
- Make API requests using either curl or Python:
Using curl:
curl -X POST http://localhost:3000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "anything can go here",
"input": "Hello, this is a test of the Kokoro TTS system!",
"voice": "af_sky"
}'
--output sky-says-hello.wav
Using Python:
python scripts/run_openai.py
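If you prefer plain Python over the bundled script, the request can be built as below. This is a sketch using only the standard library; the endpoint and JSON fields mirror the curl example above:

```python
import json
from urllib import request

def build_speech_request(text: str, voice: str = "af_sky") -> dict:
    """Build the JSON body for the OpenAI-compatible /v1/audio/speech endpoint."""
    return {
        "model": "anything can go here",
        "input": text,
        "voice": voice,
    }

def synthesize(text: str, out_path: str = "output.wav") -> None:
    """POST the request and save the returned WAV bytes to out_path."""
    body = json.dumps(build_speech_request(text)).encode("utf-8")
    req = request.Request(
        "http://localhost:3000/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())

# synthesize("Hello, this is a test of the Kokoro TTS system!")  # needs the server running
```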
The `stream` option starts the program, reading lines of input from stdin and writing WAV audio to stdout. Use it in conjunction with piping.
./target/release/koko stream > live-audio.wav
# Start typing some text to generate speech for and hit enter to submit
# Speech will append to `live-audio.wav` as it is generated
# Hit Ctrl+D to exit
echo "Suppose some other program was outputting lines of text" | ./target/release/koko stream > programmatic-audio.wav
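Driving stream mode from another program can also be done from Python. A sketch via `subprocess`, assuming the `koko` binary built above:

```python
import subprocess

def build_stream_command(koko_path: str = "./target/release/koko") -> list[str]:
    """Command line for koko's stdin-to-stdout streaming mode."""
    return [koko_path, "stream"]

def stream_tts(lines: list[str], out_path: str = "programmatic-audio.wav") -> None:
    """Feed lines of text to koko over stdin and capture the WAV output."""
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(
            build_stream_command(), stdin=subprocess.PIPE, stdout=out
        )
        proc.communicate("\n".join(lines).encode("utf-8"))

# stream_tts(["First sentence.", "Second sentence."])  # needs the koko binary
```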
- Build the image
docker build -t kokoros .
- Run the image, passing options as described above
# Basic text to speech
docker run -v ./tmp:/app/tmp kokoros text "Hello from docker!" -o tmp/hello.wav
# An OpenAI server (with appropriately bound port)
docker run -p 3000:3000 kokoros openai
Since Kokoro's abilities are not yet finalized, this repo will keep tracking Kokoro's status, and hopefully we can support languages including English, Mandarin, Japanese, German, French, etc.
Copyright reserved by Lucas Jin under the Apache License.