WaveFlow: A Compact Flow-based Model for Raw Audio

This is an unofficial PyTorch implementation of the paper "WaveFlow: A Compact Flow-based Model for Raw Audio".

This is a work in progress. Implementation details may not be faithful to the paper.

Requirements

PyTorch 1.1.0 or later (tested on 1.3.0), Python 3.6, and librosa

Preprocess the LJSpeech dataset:

python preprocessing.py --in_dir /path/to/ljspeech/data/root --out_dir ./ljspeech_data

Train a model with one of the following configurations:

python train.py --model_name waveflow_h8_r64 --n_height 8 --res_channels 64 --n_layer_per_cycle 1

python train.py --model_name waveflow_h64_r64 --n_height 64 --res_channels 64 --n_layer_per_cycle 5

python train.py --model_name waveflow_h32_r128 --n_height 32 --res_channels 128 --n_layer_per_cycle 3
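
The --n_height flag presumably corresponds to the height h in the WaveFlow paper, where the 1-D waveform is squeezed into a 2-D matrix with h rows so that adjacent samples fall in the same column. The following is a minimal sketch of that reshaping in plain Python, not code from this repository. The name squeeze_to_height is hypothetical, and the actual implementation may differ.

```python
def squeeze_to_height(samples, n_height):
    """Reshape a 1-D sequence into n_height rows in column-major order.

    Hypothetical helper for illustration; not part of this repository.
    Adjacent samples land in the same column, so row i holds
    samples[i], samples[i + n_height], samples[i + 2 * n_height], ...
    """
    if len(samples) % n_height != 0:
        raise ValueError("length must be divisible by n_height")
    # A strided slice picks every n_height-th sample, starting at the row index.
    return [list(samples[row::n_height]) for row in range(n_height)]


waveform = list(range(16))  # stand-in for 16 audio samples
matrix = squeeze_to_height(waveform, 8)
for row in matrix:
    print(row)
```

With a larger height, each row spans fewer time steps, which is the trade-off the three configurations above vary.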

To synthesize audio, specify --load_step and --num_samples, for example:

python synthesize.py --model_name waveflow_h8_r64 --n_height 8 --res_channels 64 --n_layer_per_cycle 1 --load_step 100000 --num_samples 5
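
The synthesis example repeats the architecture flags used for training. Assuming these need to match the trained model, one way to keep them in sync is to build both command lines from a single dictionary. This is only a sketch: build_command and MODEL_FLAGS are hypothetical names, and only the flags shown in this README are used.

```python
import shlex

# Architecture flags copied from the waveflow_h8_r64 example in this README.
MODEL_FLAGS = {
    "model_name": "waveflow_h8_r64",
    "n_height": 8,
    "res_channels": 64,
    "n_layer_per_cycle": 1,
}


def build_command(script, flags, extra=None):
    """Return a shell command string for `script` with the given flags.

    Hypothetical helper; it only formats a string and runs nothing.
    """
    merged = dict(flags)
    merged.update(extra or {})
    args = ["python", script]
    for name, value in merged.items():
        args += ["--" + name, str(value)]
    # Quote each argument so the string is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


train_cmd = build_command("train.py", MODEL_FLAGS)
synth_cmd = build_command(
    "synthesize.py", MODEL_FLAGS, {"load_step": 100000, "num_samples": 5}
)
print(train_cmd)
print(synth_cmd)
```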