espeak phonemizer implementation #55

Open: wants to merge 1 commit into base: main


@heabeounMKTO heabeounMKTO commented Feb 8, 2025

hello, I see the espeak section of the phonemizer is not yet implemented. coincidentally, I was working on running kokoro with rust using onnxruntime without knowing this repo existed. i've written bindings for espeak-ng for grapheme-to-phoneme generation here, and i was wondering if they're at all useful for your project.

@lucasjinreal
Owner

Thanks for the contribution.

The current espeak-ng integration actually seems to work; it uses the one from: https://github.com/thewh1teagle/piper-rs

However, I wonder if you could help support languages other than English.

Neither espeak-rs nor this repo actually has the ability to support Chinese or Japanese.

@heabeounMKTO
Author

Thanks for the contribution.

The current espeak-ng integration actually seems to work; it uses the one from: https://github.com/thewh1teagle/piper-rs

However, I wonder if you could help support languages other than English.

Neither espeak-rs nor this repo actually has the ability to support Chinese or Japanese.

i've only tested with english, but i think it should work with other languages like zh/jp with a build config change, i will ping you when i finish implementing and testing :>

@lucasjinreal
Owner

thanks so much for the interest! Hoping it can support CN with the latest Kokoro 1.0 model!

@heabeounMKTO
Author

heabeounMKTO commented Feb 8, 2025

thanks so much for the interest! Hoping it can support CN with the latest Kokoro 1.0 model!

i've added support for g2p for chinese and japanese. currently, you can run the example in the lazy_phonememize repo and get phonemes for chinese like so:

❯ cargo run --example lp_cli --release -- --input-text "这是一篇懒惰的文字" --lang cmn
   Compiling lazy_phonememize v0.1.2 (/Users/aa/Documents/lazy_phonememize)
    Finished `release` profile [optimized] target(s) in 13.57s
     Running `target/release/examples/lp_cli --input-text '这是一篇懒惰的文字' --lang cmn`
[DEBUG] [LazyPhonemizer] `lazy_p buffer len 72
INPUT_TEXT: 这是一篇懒惰的文字
PHONEMIZED: ts.ˈo-5 s.ˈi.5 ji5phˈiɛ5n lˈa2n tˈuo5 tˈə1 wˈuəɜn tsˈi̪5

would you like to use the same language codes as espeak (as in cmn for chinese - mandarin), or would you like something else? for reference, here's the full list that's supported. please also confirm that the phonemes are correct, because i do not speak chinese 😅, thank you!
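One way to keep both options open is a small lookup from user-facing tags to espeak-ng voice names. This is a minimal sketch, not part of lazy_phonememize: `espeak_voice` is a hypothetical helper, and the set of accepted aliases (`zh`, `zh-cn`, `jp`) is an assumption for illustration. Only `cmn` and `ja` appear in this thread as codes passed to `--lang`.

```rust
/// Map a user-facing language tag to an espeak-ng voice name.
/// Hypothetical helper: the alias list is an assumption, not the
/// behaviour of lazy_phonememize or kokoros.
fn espeak_voice(tag: &str) -> Option<&'static str> {
    match tag.to_ascii_lowercase().as_str() {
        // English defaults to the US voice in this sketch.
        "en" | "en-us" => Some("en-us"),
        // Mandarin: accept common aliases, emit the espeak code.
        "zh" | "zh-cn" | "cmn" => Some("cmn"),
        // Japanese: "jp" is a frequent misspelling of "ja".
        "ja" | "jp" => Some("ja"),
        // Unknown tags are reported to the caller instead of guessed.
        _ => None,
    }
}
```

With a table like this, the CLI could accept either `--lang zh` or `--lang cmn` and still hand espeak the code it expects.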

@lucasjinreal
Owner

Hi, it looks like that's not exactly right. You can listen to it here:

https://ipa-reader.com/

How does the way your lib links to espeak differ from piper-rs, which I linked previously?

I am not sure what else is needed to confirm support for Chinese or Japanese. We may need to treat getting Kokoro to work as the final goal.

@heabeounMKTO
Author

Hi, it looks like that's not exactly right. You can listen to it here:
https://ipa-reader.com/

is there any resource i can look up to see the correct phonemes? is google translate's voice sufficient for comparison with the link above?

How does the way your lib links to espeak differ from piper-rs, which I linked previously?

sorry, for this i am not sure what the differences are, because i just quickly wrapped libespeak-ng in rust specifically for the g2p functionality and then ran the model, so i don't know the implementation details of the other libraries (i will take a look through all of them though).

I am not sure what else is needed to confirm support for Chinese or Japanese. We may need to treat getting Kokoro to work as the final goal.

i think if the g2p part works it should work.

@lucasjinreal
Owner

Are you able to speak Japanese? Japanese could also be used to check whether the result is OK or not.

But also, a known correct Chinese phoneme and sentence pair can be used to check.

I think Kokoro 1.0 should have some examples to check.

@shanzhengliu
Contributor

shanzhengliu commented Feb 9, 2025

@heabeounMKTO
it looks like your library import fails in the project?
image

I fixed it by running brew install automake

I suggest adding automake as a dependency in the README.

@heabeounMKTO
Author

@heabeounMKTO it looks like your library import fails in the project? image

hello, for your error, i think you need to install autotools first.
please make sure all the build dependencies are met; you can view them here.

@heabeounMKTO
Author

Are you able to speak Japanese? Japanese could also be used to check whether the result is OK or not.
But also, a known correct Chinese phoneme and sentence pair can be used to check.

sorry, i only speak english and my native language, khmer, but i will look up some examples online and fix it

I think Kokoro 1.0 should have some examples to check.

i will check this too

@lucasjinreal
Owner

thewh1teagle/kokoro-onnx#99 (comment)

The Python version, kokoro-onnx, supports Chinese, so it seems we can use it as a reference (or at least compare the phoneme output).

@heabeounMKTO
Author

heabeounMKTO commented Feb 10, 2025

thewh1teagle/kokoro-onnx#99 (comment)

The Python version, kokoro-onnx, supports Chinese, so it seems we can use it as a reference (or at least compare the phoneme output).

hello, i've updated lazy_phonemizer to match the phoneme output from kokoro-onnx. from what i gathered, my wrapper library was outputting extra "detail" on the syllables (the extra 5's and 2's), so i just removed those and it now matches the kokoro-onnx python version.

output from `lazy_phonemizer`
[DEBUG] [LazyPhonemizer] `lazy_p buffer len 72
INPUT_TEXT: 这是一篇懒惰的文字
PHONEMIZED: ts.ˈo s.ˈi. jiphˈiɛn lˈan tˈuo tˈə wˈuəɜn tsˈi

output from kokoro-onnx:

DEBUG [__init__.py:84] [DEBUG] phonemes ts. ˈo s. ˈi. jiphˈiɛn lˈan tˈuo tˈə wˈuəɜn tsˈi

can you test with a chinese voice to confirm that it sounds correct?
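The cleanup and comparison described in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual lazy_phonemizer change: `strip_tone_digits` and `same_ignoring_whitespace` are hypothetical helpers. Note that the two outputs above differ in spacing (`ts.ˈo` versus `ts. ˈo`), so the comparison ignores whitespace. The sketch removes only ASCII digits, whereas the updated output above also drops a hyphen and a diacritic that were present in the earlier run.

```rust
/// Remove ASCII digits from a phoneme string.
/// Assumption: the "extra 5's and 2's" are the only digits in the output.
fn strip_tone_digits(phonemes: &str) -> String {
    phonemes.chars().filter(|c| !c.is_ascii_digit()).collect()
}

/// Compare two phoneme strings while ignoring all whitespace,
/// so that "ts.ˈo" and "ts. ˈo" count as the same sequence.
fn same_ignoring_whitespace(a: &str, b: &str) -> bool {
    let left = a.chars().filter(|c| !c.is_whitespace());
    let right = b.chars().filter(|c| !c.is_whitespace());
    left.eq(right)
}
```

A check like `same_ignoring_whitespace(&strip_tone_digits(ours), reference)` only shows that the two tools agree with each other; it does not show that either phoneme sequence is correct Mandarin.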

@lucasjinreal
Owner

If so, then the IPA should be right.

Have u successfully generated some voices with kokoro-onnx? Can u attach some Chinese / Japanese samples so I can have a listen? I can tell whether the voices are correct or not.

Once it sounds correct, we can migrate to lazy_phoneme

@heabeounMKTO
Author

heabeounMKTO commented Feb 13, 2025

If so, then the IPA should be right.

Have u successfully generated some voices with kokoro-onnx? Can u attach some Chinese / Japanese samples so I can have a listen? I can tell whether the voices are correct or not.

Once it sounds correct, we can migrate to lazy_phoneme

zh_test_zfxiaoxiao1.mp4

hello, sorry for the late reply, i've been a bit busy, but here is the test audio for
"这是一个懒惰的测试". i am using the zf_xiaoxiao voice, from the rust implementation.

@lucasjinreal
Owner

The voice is workable overall but sounds weird.
Can u attach the one kokoro-onnx generated as well?

@heabeounMKTO
Author

The voice is workable overall but sounds weird. Can u attach the one kokoro-onnx generated as well?

zf_xiaoxiao_onnx_py.mp4

this is the python version.

@lucasjinreal
Owner

Holy moly, kokoro-onnx is wrong.

The Chinese is not right. Let me link an issue to it.

@lucasjinreal
Owner

But the good news is that the Rust output seems aligned with it. So once kokoro-onnx fixes the Chinese issue, we may have a correct voice for Chinese and Japanese.

@heabeounMKTO
Author

heabeounMKTO commented Feb 13, 2025

But the good news is that the Rust output seems aligned with it. So once kokoro-onnx fixes the Chinese issue, we may have a correct voice for Chinese and Japanese.

for chinese, i think the issue might be with tokenization/normalization too, and for japanese, espeak-ng doesn't work well; it's an issue with espeak g2p itself.
i will extend lazy_p to add support for better g2p for japanese.

❯ cargo run --example lp_cli --release -- --input-text "空を見上げる" --lang ja
   Compiling lazy_phonememize v0.1.1-rc (/Users/aa/Documents/lazy_phonememize)
    Finished `release` profile [optimized] target(s) in 19.68s
     Running `target/release/examples/lp_cli --input-text '空を見上げる' --lang ja`
[DEBUG] [LazyPhonemizer] `lazy_p buffer len 48
INPUT_TEXT: 空を見上げる
PHONEMIZED: (en)tʃˈaɪniːz(ja)lˈe̞tə ˈo̞ (en)tʃˈa

some of the characters fall back to either chinese or english phonemes.
do you have any suggestions for chinese g2p?
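The `(en)` and `(ja)` markers in the output above make this kind of fallback easy to detect automatically. This is a minimal sketch, assuming those parenthesised tags are language-switch flags emitted with the phonemes. `language_flags` is a hypothetical helper, not part of lazy_phonememize.

```rust
/// Collect language-switch flags such as "(en)" or "(ja)" from a
/// phoneme string, in the order they appear.
/// Hypothetical helper: only lowercase ASCII tags are recognised.
fn language_flags(phonemes: &str) -> Vec<String> {
    let mut flags = Vec::new();
    let mut rest = phonemes;
    while let Some(open) = rest.find('(') {
        // '(' is one byte, so open + 1 is always a char boundary.
        let after = &rest[open + 1..];
        match after.find(')') {
            Some(close) => {
                let inner = &after[..close];
                if !inner.is_empty() && inner.chars().all(|c| c.is_ascii_lowercase()) {
                    flags.push(inner.to_string());
                }
                rest = &after[close + 1..];
            }
            // Unbalanced parenthesis: stop scanning.
            None => break,
        }
    }
    flags
}
```

For the Japanese run above, any flag other than `ja` in the result would indicate that part of the input was phonemized with another language's rules.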

@lucasjinreal
Owner

I'm not sure how the original Kokoro does it. It uses misaki, which was written by the author himself.

https://github.com/hexgrad/misaki

He seems to use the espeak backend mainly, with some modifications for Chinese and Japanese support.

The essential step should actually be aligning lazy_g2p with misaki.

@heabeounMKTO
Author

I'm not sure how the original Kokoro does it. It uses misaki, which was written by the author himself.

https://github.com/hexgrad/misaki

He seems to use the espeak backend mainly, with some modifications for Chinese and Japanese support.

The essential step should actually be aligning lazy_g2p with misaki.

ohh, i wasn't aware of this one, i think i'll have a look and change lazy_p accordingly!
