Add F5 TTS pipeline #11958
Okay, all the required code now lives in two files, and I have swapped in existing diffusers primitives in the obvious places. Next I will work on integrating it into the diffusers class structure.
Attention!
It seems we can use the diffusers Attention class directly, but we need to add a new processor to support RoPE embeddings on selected heads, as in F5.
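To make the "RoPE on selected heads" idea concrete, here is a minimal pure-Python sketch. It is not the F5 or diffusers implementation: the function names, the `[head][position][dim]` layout, the interleaved pairing of dimensions, and the choice to rotate only the first `num_rope_heads` heads are all assumptions made for illustration. In a real processor the same logic would be applied to the query and key tensors before the attention scores are computed.

```python
import math


def rotate_pairs(vec, pos, base=10000.0):
    """Rotate consecutive dimension pairs of one head vector by position-dependent angles."""
    dim = len(vec)
    out = list(vec)
    for i in range(0, dim - 1, 2):
        # Pair k = i // 2 uses frequency base ** (-2k / dim) == base ** (-i / dim).
        angle = pos * base ** (-i / dim)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out


def rope_on_selected_heads(x, num_rope_heads):
    """x is indexed [head][position][dim]; only heads below num_rope_heads are rotated (assumption)."""
    return [
        [
            rotate_pairs(vec, pos) if h < num_rope_heads else list(vec)
            for pos, vec in enumerate(head)
        ]
        for h, head in enumerate(x)
    ]


# Two heads, two positions, head dimension 4; only head 0 receives RoPE.
x = [
    [[1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]],
    [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]],
]
y = rope_on_selected_heads(x, num_rope_heads=1)
```

Because each pair is rotated, the norm of every vector is preserved, and heads outside the selection pass through unchanged.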
Tokenization
F5 uses a character-level tokenizer for the text, so we might want to write a simple tokenizer class for it. It might be fine to keep it as a simple function for now, since it is very straightforward.
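As a rough illustration of how small the function-based option could be, here is a sketch of a character-to-index lookup. The names `build_vocab`, `encode`, and `pad_batch` are hypothetical, and reserving index 0 for unknown characters and using -1 for padding are assumptions, not conventions confirmed by this PR.

```python
def build_vocab(chars):
    """Assign each distinct character an index, starting at 1 (0 is kept for unknowns)."""
    vocab = {}
    for ch in chars:
        if ch not in vocab:
            vocab[ch] = len(vocab) + 1
    return vocab


def encode(text, vocab, unk_id=0):
    """Turn a string into a list of indices, one per character."""
    return [vocab.get(ch, unk_id) for ch in text]


def pad_batch(seqs, pad_id=-1):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]


vocab = build_vocab("abc")
ids = encode("cab?", vocab)
batch = pad_batch([encode("a", vocab), encode("ab", vocab)])
```

Wrapping these three functions in a class later would mostly be a matter of storing `vocab` as an attribute.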
Tests
The basic structure looks good now. Let's add some tests and then make it more diffusers-friendly. Adding tests would also force me to follow the structure more closely and help ensure the code is not buggy.
Flow matching/Schedulers
We will also need to use one of the schedulers from diffusers. I think they use only a simple Euler method, but the sway sampling step needs to be accounted for somehow. Since it is just a change in the discretisation schedule, it should be straightforward.
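A minimal sketch of why sway sampling only changes the discretisation: the Euler update stays the same and only the list of timesteps is warped. The warp used here, t + s * (cos(pi/2 * t) - 1 + t), and the default s = -1 are my recollection of the F5 formulation and should be treated as assumptions; the function names are hypothetical and this is not a diffusers scheduler.

```python
import math


def sway_timesteps(num_steps, sway_coef=-1.0):
    """Warp a uniform grid on [0, 1]; the endpoints 0 and 1 are preserved."""
    uniform = [i / num_steps for i in range(num_steps + 1)]
    return [t + sway_coef * (math.cos(math.pi / 2 * t) - 1 + t) for t in uniform]


def euler_integrate(velocity_fn, x0, timesteps):
    """Plain Euler integration of dx/dt = velocity_fn(x, t) over arbitrary timesteps."""
    x = x0
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        x = x + (t_next - t_cur) * velocity_fn(x, t_cur)
    return x


timesteps = sway_timesteps(4)
# A constant velocity field makes the result independent of the schedule.
result = euler_integrate(lambda x, t: 2.0, 0.0, timesteps)
```

With a negative coefficient the warped grid takes smaller steps near t = 0 and larger ones near t = 1, while `sway_coef=0.0` recovers the uniform schedule.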
Future work
Current status
To do
Got the same forward-pass outputs as the original F5! Next up: writing some tests.
Scheduler done!
@asomoza I was writing some tests for this and was confused about why in the common test [...] The same is true for some other tests too, which set the generator_device to cpu.
Also, any suggestions on how to add F5's character-level tokenisation? It is just a simple character-to-index lookup, but I am not sure whether to make a new tokeniser class for it or just save it as a dict and load it somehow.
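For the "save it as a dict and load it somehow" option, one possibility is a plain JSON file next to the other pipeline files. The sketch below only shows the round trip; the file name `vocab.json` and the helper names are hypothetical, and it does not reflect how diffusers pipelines actually register or load components.

```python
import json
import os
import tempfile


def save_vocab(vocab, path):
    """Write the character-to-index mapping as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False, indent=2)


def load_vocab(path):
    """Read the mapping back, making sure the indices are ints."""
    with open(path, encoding="utf-8") as f:
        return {ch: int(idx) for ch, idx in json.load(f).items()}


vocab = {"a": 1, "b": 2, " ": 3, "é": 4}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.json")
    save_vocab(vocab, path)
    restored = load_vocab(path)
```

Passing `ensure_ascii=False` keeps non-ASCII characters readable in the file, which matters for a character-level vocabulary.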
What does this PR do?
Add F5 TTS #10043