This repository contains the official demo code for DART, accepted at the Audio Imagination Workshop, NeurIPS 2024.
DART disentangles speaker identity and accent representation in multispeaker TTS using a structured latent framework.
Train on L2-ARCTIC:
```
CUDA_VISIBLE_DEVICES=0 python train.py --dataset L2ARCTIC
```
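To resume training from a saved checkpoint, the Comprehensive Transformer TTS trainer this repo builds on accepts a `--restore_step` argument; assuming DART keeps that interface, resuming looks like:

```
CUDA_VISIBLE_DEVICES=0 python train.py --dataset L2ARCTIC --restore_step 704000
```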
Two synthesis scripts are provided:

- `synthesize_converted.py`: generates speech across combinations of speakers, accents, and sentences.
- `synthesize_stats_valset.py`: generates speech from a metadata `.txt` file.
Before inference, extract embeddings:
```
python extract_stats.py
```
This saves MLVAE-based embeddings for speakers and accents.
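As a rough illustration of how the saved statistics can be used, the sketch below loads hypothetical output files and pairs a speaker embedding with a mismatched accent embedding, which is the accent-conversion idea behind DART. The file names, dictionary keys, and the concatenation step are assumptions for illustration only; check `extract_stats.py` and the synthesis scripts for the actual format.

```python
# Minimal sketch: inspect the saved MLVAE embedding statistics and pair
# a speaker with a different accent. File names and keys are assumed
# for illustration; see extract_stats.py for the real output layout.
import numpy as np

speaker_embs = np.load("stats/speaker_embeddings.npy", allow_pickle=True).item()
accent_embs = np.load("stats/accent_embeddings.npy", allow_pickle=True).item()

spk = speaker_embs["ABA"]    # an L2-ARCTIC speaker (Arabic L1)
acc = accent_embs["Korean"]  # a different accent class
# Because speaker and accent are disentangled, the two vectors can be
# combined freely at synthesis time (shown here as a simple concatenation).
conditioning = np.concatenate([spk, acc])
print(conditioning.shape)
```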
Then run synthesis from a trained checkpoint:

```
CUDA_VISIBLE_DEVICES=0 python synthesize_converted.py --dataset L2ARCTIC --restore_step 704000
```
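`synthesize_stats_valset.py` can presumably be invoked the same way; assuming it shares the `--dataset` and `--restore_step` arguments:

```
CUDA_VISIBLE_DEVICES=0 python synthesize_stats_valset.py --dataset L2ARCTIC --restore_step 704000
```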
If you find this model useful, please cite our paper:
```
@inproceedings{melechovsky2024dart,
  title={DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech},
  author={Melechovsky, J. and Mehrish, A. and Sisman, B. and Herremans, D.},
  booktitle={Audio Imagination Workshop, NeurIPS},
  year={2024}
}
```
Based on Comprehensive Transformer TTS by Keon Lee et al.
Open an issue for questions or collaboration.