Seed parameter while generating #6

Open · bstivic opened this issue Sep 27, 2020 · 5 comments

bstivic commented Sep 27, 2020

Hi,

I have a problem generating audio from a seed audio file. As I understand it, when we provide a seed audio file, generation continues after the first 64 samples of the seed, and this should steer generation in a different direction than the default? I am trying to seed because I get almost identical or very similar generated audio (epoch 150, 30-minute dataset, 22050 Hz) with different training parameters every time.

Error that I get when trying to seed:

!python generate.py \
  --output_path ./generated/default/test_1_t075_s10_16000.wav \
  --checkpoint_path ./logdir/default/26.09.2020_12.35.35/model.ckpt-140 \
  --seed ./chunks/chunk_22050_mono_norm_chunk_109.wav \
  --dur 10 \
  --sample_rate 16000 \
  --temperature 0.75 \
  --num_seqs 100 \
  --config_file ./default.config.json

Traceback (most recent call last):
  File "generate.py", line 225, in <module>
    main()
  File "generate.py", line 221, in main
    args.sample_rate, args.temperature, args.seed, args.seed_offset)
  File "generate.py", line 188, in generate
    init_samples[:, :model.big_frame_size, :] = quantize(seed_audio, q_type, q_levels)
TypeError: 'tensorflow.python.framework.ops.EagerTensor' object does not support item assignment

One more question: could the sample rate differences be the problem? Does the sample rate have to be the same for training, generation and the seed audio?

Best regards,
Branimir

relativeflux (Member) commented

Ah yes, thanks for spotting this - the error is caused by trying to assign to a tensor, which is not possible since tensors are immutable in TensorFlow... It was originally a NumPy array, which is mutable. Apologies, I'll get this fixed.
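
Roughly, the fix would look something like this (a sketch only, with hypothetical shapes mirroring the traceback, not the final patch):

import numpy as np
import tensorflow as tf

# Sketch only: names (num_seqs, big_frame_size) and shapes mirror the
# traceback, not the actual repo code.
num_seqs, seq_len, big_frame_size = 100, 1024, 64

# A mutable NumPy buffer instead of an immutable EagerTensor:
init_samples = np.zeros((num_seqs, seq_len, 1), dtype=np.int32)

# Stand-in for quantize(seed_audio, q_type, q_levels); if quantize
# returns a tensor, unwrap it with .numpy() before assigning.
seed_quantized = np.random.randint(0, 256, size=(num_seqs, big_frame_size, 1))

init_samples[:, :big_frame_size, :] = seed_quantized  # item assignment now works
init_samples = tf.convert_to_tensor(init_samples)     # back to a tensor for the model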

Actually, I'm not sure what the result of having different sample rates during training and generation would be. With regard to the effect of seeding, I suspect it needs a larger chunk of samples to have much effect... I need to look at that feature again - sorry, I haven't had any time to work on it.

bstivic (Author) commented Sep 28, 2020

Thank you for the super-fast reply!
OK, I finally have some improvements after expanding the dataset to ~4 hours, and the results sound promising after just 35 epochs :). Do you have any idea how training could be done on just one ~4-minute song, i.e. generating results in the style of a single song?

It's possible that different sample rates can speed up or slow down the generated audio (something similar happened during training with MelGAN), so I switched everything to 16 kHz.
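
For example, resampling the seed file up front avoids the mismatch entirely (a rough sketch using librosa and soundfile, which are my own choice of tools, not part of the repo):

import librosa
import soundfile as sf

# Hypothetical helper: resample a seed wav to the training sample rate
# so training, generation and seed all agree (here 16 kHz).
def resample_seed(in_path, out_path, target_sr=16000):
    audio, sr = librosa.load(in_path, sr=None, mono=True)  # keep native rate
    if sr != target_sr:
        audio = librosa.resample(audio, orig_sr=sr, target_sr=target_sr)
    sf.write(out_path, audio, target_sr)

resample_seed("chunk_22050_mono_norm_chunk_109.wav", "chunk_16000.wav")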

relativeflux (Member) commented Sep 28, 2020

Thanks - yes, a 4hr dataset should yield something useful... 4 minutes is unfortunately unlikely to be practical; you're welcome to try it, but my guess is that the model will simply overfit, meaning it will basically just memorize your dataset.

Incidentally, in terms of getting a better training workflow, we will shortly be releasing a model tuner/optimizer with the code... It's based on Keras Tuner, and there is already a branch available, although it is experimental and buggy at the moment, with no documentation on the tuner yet. I hope to merge this within the next week or so; I'm testing the implementation on some large datasets now. Instead of blindly picking some hyperparameters and hoping for the best, the tuner will allow users to find the optimal hyperparameters for a dataset, then proceed to a full training session with those hyperparameters. It's still likely to be more of an art than a science (which is not a bad thing!), but better than stabbing in the dark!
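
For a flavour of the general Keras Tuner workflow (illustrative only - this is not our tuner, and the toy model and names like build_model are placeholders):

import keras_tuner as kt
import tensorflow as tf

# build_model maps a HyperParameters object to a compiled model; the
# tuner samples hyperparameters and trains trial models to compare.
def build_model(hp):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hp.Int("units", 32, 512, step=32), activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
        loss="mse",
    )
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss", max_trials=10)
# With real data in hand:
# tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=5)
# best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]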

DigestContent0 commented Oct 24, 2020

Do you have any idea how training could be done on just one ~4-minute song, i.e. generating results in the style of a single song?

An actual 4-minute-long file would not work and would only produce meaningless noise. Perhaps repeating the audio until it exceeds 25 minutes would work?
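
Something like the following could do the repetition (a rough sketch using soundfile and NumPy; the file names are made up):

import numpy as np
import soundfile as sf

# Tile a short track until it exceeds a target duration (25 minutes here),
# so the dataset preparation step has enough material to chunk.
def loop_to_duration(in_path, out_path, target_minutes=25):
    audio, sr = sf.read(in_path)
    target_len = int(target_minutes * 60 * sr)
    repeats = -(-target_len // len(audio))             # ceiling division
    reps = (repeats, 1) if audio.ndim > 1 else repeats # handle stereo vs mono
    sf.write(out_path, np.tile(audio, reps), sr)

loop_to_duration("one_song_4min.wav", "one_song_looped.wav")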

relativeflux (Member) commented Oct 24, 2020

An actual 4-minute-long file would not work and would only produce meaningless noise. Perhaps repeating the audio until it exceeds 25 minutes would work?

@DigestContent0 @bstivic Indeed that would work; perhaps a better solution would be some kind of data augmentation (although I still suspect you'd need more raw data than a single 4-min track). This is often used when working with images. I have recently been investigating this very issue, with a view to including a data augmentation script in a future release. I've been experimenting with audiomentations, which looks promising.

Dadabots claim that they got good results from 3200 chunks (overlapped). Having worked with datasets of a few hundred chunks, I can confirm that while you might be able to achieve good training accuracy, validation accuracy indicates classic overfitting after a few epochs (that's on the validate branch, which I am hoping to merge to master very shortly).

I've added a gist for using audiomentations on a directory of wav files.
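
The general shape is something along these lines (a rough sketch, not the actual gist; the augmentation parameters follow the audiomentations README examples, and the paths are made up):

from pathlib import Path
import soundfile as sf
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift

# Apply a random augmentation chain to every wav in a directory.
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
])

in_dir, out_dir = Path("./chunks"), Path("./chunks_augmented")
out_dir.mkdir(exist_ok=True)

for wav in in_dir.glob("*.wav"):
    samples, sr = sf.read(wav, dtype="float32")  # mono float32 expected
    augmented = augment(samples=samples, sample_rate=sr)
    sf.write(out_dir / wav.name, augmented, sr)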
