Adding support for Windows SAPI5 implementation #40
Comments
Hi @king-dahmanus, thanks for your feedback! I would definitely be interested in adding SAPI5 support for Windows in order to make Larynx more accessible to everyone. I'll have to look into what it would take to implement a TTS engine interface. I've experimented with getting the voices much more responsive in my Glow Speak project, which runs a daemon and caches all of the WAV files it produces (it also uses eSpeak to turn text into phonemes). As you mentioned, though, there are weird artifacts for short phrases, especially single words. I believe this is largely a problem with the datasets I have; none of them feature single word utterances, and many of them have sentences split across multiple utterances (so no pauses at the beginning or end). Do you know of any public audio datasets that contain only complete sentences and single spoken words? If not, would you be interested in collaborating to create one? |
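The WAV caching idea mentioned above could look roughly like the sketch below. This is only an illustration, not Glow Speak's actual code: the `WavCache` class, the `synthesize` callback, and the SHA-256 file naming are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class WavCache:
    """Caches synthesized WAV bytes on disk, keyed by a hash of voice + text."""

    def __init__(self, cache_dir: Path, synthesize: Callable[[str, str], bytes]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical callback: (voice, text) -> WAV bytes.
        self.synthesize = synthesize

    def _path_for(self, voice: str, text: str) -> Path:
        # Hash voice and text together so different voices never collide.
        key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.wav"

    def get(self, voice: str, text: str) -> bytes:
        path = self._path_for(voice, text)
        if path.exists():
            # Cache hit: skip synthesis entirely.
            return path.read_bytes()
        wav = self.synthesize(voice, text)
        path.write_bytes(wav)
        return wav
```

With something like this, a repeated phrase is only synthesized once; later requests are served from disk, which helps most with the short, frequently repeated strings a screen reader tends to speak.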
Hi there. So, about the public datasets: I do not know of anything that
exists currently. As for creating one, how would I go about
collaborating with you to create the needed datasets? Thanks in advance.
|
Do you, or anyone you know, have a pleasant voice, a good microphone, and a lot of patience? 🙂 I've worked with several people to create text to speech datasets. I use an algorithm to select a (relatively) small set of phonetically diverse sentences from a public domain book or corpus. Here, I would also make sure that we have a diversity of single spoken words. |
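One way such a selection could work is a greedy coverage pass over the corpus. The sketch below is an assumption, not necessarily the algorithm described above: it uses letter bigrams within words as a crude stand-in for phoneme pairs (a real pipeline would first convert text to phonemes, for example with eSpeak), and the function names `units` and `select_sentences` are hypothetical.

```python
from typing import List, Set


def units(sentence: str) -> Set[str]:
    """Letter bigrams within each word; a rough stand-in for phoneme pairs."""
    result: Set[str] = set()
    for word in sentence.lower().split():
        word = "".join(ch for ch in word if ch.isalpha())
        result.update(word[i:i + 2] for i in range(len(word) - 1))
    return result


def select_sentences(corpus: List[str], max_sentences: int) -> List[str]:
    """Greedily pick sentences that add the most not-yet-covered units."""
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = list(corpus)
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda s: len(units(s) - covered))
        gain = units(best) - covered
        if not gain:
            # Nothing left adds new coverage, so stop early.
            break
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen
```

Single spoken words can be mixed into `corpus` as one-word "sentences" so that they compete for selection alongside full sentences.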
Well, I do have a teen voice, and a good quality microphone with some
background static noise, and there's nothing there to fix it. But anyway,
if you want, give me a txt file containing the words or sentences I should
speak, and I'll make recordings of them and clean them to the best of my
ability. Oh, also tell me the preferred format for the audio files, and I'll
make an archive with labeled file names for all the sentences and words
spoken in it. Thanks!
|
Hey, here's a multi-language dataset I found: Common Voice. It claims to
be the largest dataset of its kind. Check it out at
https://commonvoice.mozilla.org/en/datasets
|
The Common Voice datasets are excellent, but not ideal for a text to speech voice. For text to speech, you want a lot of high quality data from very few speakers (no noise, if possible). For speech to text, however, Common Voice is great -- lots of noisy data from many speakers. Let me look around a bit more before asking you to do any recording. A lot of the text to speech datasets are derived from LibriVox, and I'm hoping there will be a book there where the author reads out lists of items so we can get isolated spoken words. |
Right, good luck! I look forward to it, and as I said, if you need me to
record anything, please let me know!
|
Hey, what's new? Are you working on something yet, Michael? I want to tell you something: for the moment, we could set the dataset issue aside and concentrate on making this support SAPI5 on Windows. Also, the speed I'm talking about isn't about pronouncing words with the right intonation, but about being able to speak at very fast speech rates without producing weird artifacts, and about being responsive, with no lag or delay while speaking. Maybe this is already accomplished, since it's designed for the Raspberry Pi, but still. Thanks, and have a good time.
Hey there developers! I found this repo by exploring, and I'd like to make some requests.
Firstly: releasing a Windows SAPI5 version of the TTS engine, compatible with all the voices that are available, with the necessary encoders integrated to ensure fast and responsive synthesis. Details below.
I am a blind person who uses a screen reader to use the computer. Blind people like me require a responsive speech synthesizer so they can receive the requested information without any unnecessary delays, and quite a large share of them require very fast speech output that doesn't result in weird voice artifacts such as those produced by natural-sounding TTS voices. If I were ignorant of how much hard work it involves, I would ask you to make an NVDA add-on containing the synthesizer along with a way to download the voices, but perhaps a more mainstream, Windows-integrated option like SAPI5 would be a little easier?
Anyway, I know that this project is for Raspberry Pi/command-line usage, but the currently available voices attracted someone like me, who wants something more suitable for, say, daily usage. I look forward to your response. This is just a request from me; if it can't be done, it can't be done. So thanks, and have a good time.