espeak phonemizer implementation #55
Thanks for the contribution. espeak-ng currently seems to work; the implementation in use is the one from https://github.com/thewh1teagle/piper-rs. However, I wonder if you could help support languages other than English. Neither espeak-rs nor this repo has actually been able to support Chinese or Japanese. |
i've only tested with English, but i think it should work with other languages like zh/jp with a build config change. i will ping you when i finish implementing and testing :> |
Thanks so much for the interest! Hoping it can support CN with the latest Kokoro 1.0 model! |
i've added support for g2p for Chinese and Japanese. currently, you can run the example in the
would you like to use the same language code as in espeak (as in |
Hi, it looks like it's not exactly right; you can listen here: What are the differences between how your lib links to espeak and the piper-rs I linked previously? I am not sure what else is needed to confirm support for Chinese or Japanese. We might need to make Kokoro work as the final goal. |
is there any resource i can look up to see the correct phonemes? is the Google Translate voice sufficient for comparison with the link above?
sorry, i am not sure what the differences are, because i just quickly wrapped libespeak-ng in Rust specifically for the g2p functionality and then ran the model, so i am not sure of the implementation details of the other libraries (i will take a look through all of them, though)
i think if the g2p part works, it should work. |
Are you able to speak Japanese? Japanese could also be used to identify whether the result is OK or not. Alternatively, a known-correct Chinese sentence and phoneme pair can be used to check. I think Kokoro 1.0 should have some examples to check against. |
@heabeounMKTO I have fixed it via run suggest add the |
hello, for your error, i think you need to install autotools first. |
sorry, i only speak English and my native language, Khmer, but i will look up some examples online and fix it
i will check this too |
thewh1teagle/kokoro-onnx#99 (comment) The Python version, kokoro-onnx, supports Chinese; it seems we can use it as a reference (at least to compare the phoneme output). |
hello, i've updated lazy_phonemizer to match the phoneme output from kokoro-onnx. from what i gathered, my wrapper library is outputting more "details" on the syllables, with the extra 5's and 2's, so i just removed them and it matched with the
can you test with a Chinese voice to confirm that it sounds correct? |
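For readers following along, the digit removal described above can be sketched roughly as below. This is only an illustration, not the actual lazy_phonemizer code: the function name `strip_tone_digits` is hypothetical, and it assumes the extra 5's and 2's appear as plain ASCII digits inside the phoneme string.

```rust
/// Hypothetical helper (not taken from the PR): drop ASCII digits from a
/// phoneme string, on the assumption that they are the extra tone markers
/// (such as `2` or `5`) mentioned in the comment above.
/// Every other character, including IPA symbols and spaces, is kept as-is.
fn strip_tone_digits(phonemes: &str) -> String {
    phonemes
        .chars()
        .filter(|c| !c.is_ascii_digit())
        .collect()
}
```

Called on a made-up input such as `"ni2 xau2"`, this would return `"ni xau"`. Whether dropping the digits loses tone information that the model needs is a separate question that only a listening test can answer.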
If so, then the IPA should be right. Have you successfully used kokoro-onnx to generate some voices? Can you attach some Chinese / Japanese audio so I can have a listen? I can tell whether the voices are correct or not. Once it sounds correct, we can migrate to lazy_phoneme |
zh_test_zfxiaoxiao1.mp4
hello, sorry for the late reply, i am a bit busy, but here is the test audio for |
The voice is workable overall but sounds weird. |
zf_xiaoxiao_onnx_py.mp4
this is the Python version, |
Holy moly, kokoro-onnx is wrong. The Chinese is not right. Let me link an issue to it. |
But the good news is that the Rust version seems aligned with it. So once kokoro-onnx fixes the Chinese issue, we might have the right voice for Chinese and Japanese. |
for Chinese, i think the issue might be with tokenization/normalization too, and for Japanese, espeak-ng doesn't work well; it's an issue with espeak g2p itself.
some of the characters fall back to either Chinese or English phonemes. |
I'm not sure how the original Kokoro does it; it uses misaki, which was written by the author himself: https://github.com/hexgrad/misaki He seems to mainly use the espeak backend, with some modifications for Chinese and Japanese language support. Essentially, lazy_g2p should be aligned with misaki. |
ohh, i wasn't aware of this one, i think i'll have a look and change lazy_p accordingly! |
hello, I see the espeak section of the phonemizer is not yet implemented. Coincidentally, I was working on running Kokoro with Rust using onnxruntime without knowing of the existence of this repo. i've written bindings for espeak-ng for grapheme-to-phoneme generation here; i was wondering if it's at all useful for your project.