A mod for WotR that introduces text to speech in various parts of the game utilizing Auralis/XTTSv2, Kokoro, Windows Natural Voices, and Apple Speech


WotR-API-TextToSpeechMod

By lvaskys

Fork of PathfinderTextToSpeechMod

This mod is based on PathfinderTextToSpeechMod. It currently preserves all of the original's functionality and adds the ability to use a backend API for TTS instead of the Windows TTS engine. Currently, Auralis (based on XTTSv2) and Kokoro-FastAPI are supported. I believe an NVIDIA GPU is required for both, but see their documentation for details.

See SpeechMod-README.md for the original README.md, which covers how to install the Windows natural TTS voices (if desired) and the basic functionality of the mod.

How to use

This mod's main features are configured in a settings.json file that lives in the base mod folder. Comments are included to help guide your configuration. The key setting is speech_impl, which selects the speech service implementation: AuralisSpeech or KokoroSpeech for the new API-backed implementations, or WindowsSpeech or AppleSpeech for the original ones.
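As a quick sanity check before launching the game, a small script can confirm that speech_impl holds one of the four values named above. This is only a sketch and not part of the mod: it assumes the comments in settings.json are whole lines starting with `//`, and the function name `read_speech_impl` is made up for illustration.

```python
import json
import re

# The four speech_impl values named in this README.
KNOWN_IMPLS = {"AuralisSpeech", "KokoroSpeech", "WindowsSpeech", "AppleSpeech"}


def read_speech_impl(settings_text: str) -> str:
    """Return the speech_impl value from the text of a settings.json file."""
    # Assumption: comments occupy whole lines that begin with //.
    # Only such lines are removed, so URLs inside string values stay intact.
    without_comments = re.sub(r"^[ \t]*//.*$", "", settings_text, flags=re.MULTILINE)
    settings = json.loads(without_comments)
    impl = settings["speech_impl"]
    if impl not in KNOWN_IMPLS:
        raise ValueError(f"unknown speech_impl: {impl!r}")
    return impl


# Hypothetical file contents, reduced to the one key this README documents.
sample = """
{
    // Which speech service to use
    "speech_impl": "KokoroSpeech"
}
"""
print(read_speech_impl(sample))  # prints KokoroSpeech
```

To check a real file, pass its text in, for example `read_speech_impl(open("settings.json", encoding="utf-8").read())`.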

The API service must be up and running for the mod to work. See the documentation for the API service you are using for details on how to set it up. I used WSL to run Auralis; it may also work in native Windows now, but I have not checked. For Kokoro, I used the docker run instructions. I think Docker on Windows may require WSL to share the GPU with the container properly, so you may need to install WSL either way.

Make sure the endpoint configured in settings.json matches the address and port the service is listening on. If you are keeping the settings.json file as is, then for Auralis:

auralis.openai --host 127.0.0.1 --port 8000 --model AstraMindAI/xttsv2 --gpt_model AstraMindAI/xtts2-gpt --max_concurrency 4 --vllm_logging_level warn

or, for Kokoro (this maps host port 8000 to the container's port 8880):

docker run --gpus all -p 8000:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.2
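Both commands above expose the service on port 8000 of the local machine. If the mod stays silent, it can help to confirm that something is actually listening there. The following is a minimal sketch using only the Python standard library. It checks that the TCP port accepts connections, not that the TTS API itself responds correctly, and the helper name `is_port_open` is made up for illustration.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Host and port taken from the example commands above.
    if is_port_open("127.0.0.1", 8000):
        print("Something is listening on 127.0.0.1:8000")
    else:
        print("Nothing is listening on 127.0.0.1:8000; is the TTS service running?")
```

If you change the host or port in either command, pass the same values here and update the endpoint in settings.json to match.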

Note: to use Auralis, you must provide a WAV file for the server to use for one-shot voice cloning. Currently, this file must live in your base game directory, not your mod directory, although that may be fixed in the future. An example file you can use is female_01.wav.

Other new features

This supports cancelling playback with the controller cancel/B/Circle button. Specifically, it cancels the current sentence or two-sentence chunk being played and continues with the next one. This gives a kind of "fast-forward" effect for when you don't want to listen to the entire dialogue but still want to hear later portions, for example when your reading outpaces the speaker.
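The skip behaviour described above can be illustrated with a small simulation. This is not the mod's actual C# code. It is a sketch that assumes text is split on sentence-ending punctuation and grouped into chunks of up to two sentences, and the names `chunk_sentences` and `play_chunks` are made up for illustration.

```python
import re
from typing import List, Set, Tuple


def chunk_sentences(text: str, size: int = 2) -> List[str]:
    """Split text into sentences, then group them into chunks of up to `size`."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def play_chunks(chunks: List[str], cancelled: Set[int]) -> List[Tuple[str, str]]:
    """Simulate playback where the chunks at the given indices get cancelled."""
    log = []
    for index, chunk in enumerate(chunks):
        if index in cancelled:
            # Cancelling drops only this chunk; playback moves on to the next.
            log.append(("cancelled", chunk))
            continue
        log.append(("played", chunk))
    return log


dialogue = "The gate creaks open. A guard waves you in! Do you follow? You step inside."
chunks = chunk_sentences(dialogue)
for status, chunk in play_chunks(chunks, cancelled={0}):
    print(status, "->", chunk)
```

Here the first chunk is cancelled and the second still plays, which is the "fast-forward" effect: pressing cancel skips ahead rather than silencing the rest of the dialogue.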

Multilingual Support

In theory this should work with multiple languages, since both XTTS and Kokoro support them, but it is untested.

Linux/WINE Support

I believe some people have wanted a version of this mod for Linux. I have not tested it on Linux/WINE, but the Auralis and Kokoro implementations should theoretically work there.

Limitations/Broken Features

These new features only support one speaker at the moment: both voiced and narrator content will be spoken with the chosen voice. I may fully implement male/female/narrator voices as in the original mod, or maybe even character-specific voices. For now, this is a good initial release and works fine for my own needs. The Windows and Apple implementations should still work as intended.

Motivation and thoughts

Windows natural TTS voices are pretty good, but lack proper cadence and emotion. I also did not want to pay for an API, and preferred to run everything locally. XTTS is excellent in that regard, and seems to pick up on cues without being fed any additional information. Its sound quality is poorer, however, and it is a good bit slower, but still responsive enough for my needs. Kokoro is another TTS engine I heard about, and I decided to add it as an alternative. It's super fast, many times faster than real time, and the quality is excellent. The cadence and emotion aren't great, though, and seem rather similar to the Windows natural voices.
