NDVQ

Official repository of the IEEE SLT 2024 paper "NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization"

Important

I apologize: I have not yet had time to clean up the code, so for now I am sharing part of the experimental, uncleaned codebase I have been using.

We introduce Normal Distribution-Based Vector Quantization (NDVQ), which applies a distribution-based approach to vector quantization in audio codecs. NDVQ’s adoption of normal distributions within the codebook improves robustness and generalization, leading to better reconstructed audio quality, especially at extremely low bandwidths. Our comparison with EnCodec shows that NDVQ performs better in audio compression and in downstream codec-based speech synthesis, supporting its potential as a more resilient alternative to traditional VQ methods. Although real-world audio includes speech, ambient sounds, and music, this work focuses solely on speech; other audio domains remain unexplored, which limits the current range of applications. Developing a universal audio compression model based on our method is a compelling direction for future research.
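To make the idea of a distribution-based codebook concrete, here is a minimal, standard-library-only sketch in which each codebook entry is a diagonal normal distribution instead of a fixed vector. It is an illustration under stated assumptions, not the code in this repository: the class `GaussianCodebook`, its methods, the nearest-mean lookup, and the sampling step are all hypothetical choices made for this example.

```python
import math
import random


class GaussianCodebook:
    """Toy codebook whose entries are diagonal normal distributions.

    Hypothetical illustration only; this is not the NDVQ implementation.
    """

    def __init__(self, num_entries, dim, seed=0):
        self.rng = random.Random(seed)
        # Each entry k has a mean vector and a per-dimension log-variance.
        self.mean = [[self.rng.gauss(0.0, 1.0) for _ in range(dim)]
                     for _ in range(num_entries)]
        self.log_var = [[-2.0] * dim for _ in range(num_entries)]

    def nearest(self, x):
        """Index of the entry whose mean is closest to x.

        Assumption: squared Euclidean distance to the means, as in plain VQ.
        """
        def sq_dist(k):
            return sum((a - b) ** 2 for a, b in zip(x, self.mean[k]))
        return min(range(len(self.mean)), key=sq_dist)

    def quantize(self, x, sample=True):
        """Return (index, vector) for input x.

        With sample=True the vector is drawn from the chosen distribution as
        mean + std * noise; with sample=False the mean is returned, which
        reduces to ordinary vector quantization.
        """
        k = self.nearest(x)
        if not sample:
            return k, list(self.mean[k])
        drawn = [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                 for m, lv in zip(self.mean[k], self.log_var[k])]
        return k, drawn


codebook = GaussianCodebook(num_entries=8, dim=4)
frame = [m + 0.01 for m in codebook.mean[3]]  # an input close to entry 3
index, deterministic = codebook.quantize(frame, sample=False)
_, sampled = codebook.quantize(frame, sample=True)
print(index, deterministic, sampled)
```

In this sketch, sampling around the mean exposes whatever consumes the quantized vector to a neighbourhood of each code rather than a single point, which is one way to picture the robustness argument above.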

NDVQ Architecture

Reconstruction Results at Various Bitrates

Install

Conda Install

conda create -n fairseq python=3.9 -y 
conda activate fairseq
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch -y
pip install packaging editdistance gpustat wandb einops soundfile librosa pandas

# install fairseq
git clone https://github.com/facebookresearch/fairseq.git
cd fairseq
git checkout 336c26a5
pip install --editable ./

# install apex
git clone https://github.com/NVIDIA/apex.git
cd apex
git checkout 9263bc8
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

# clone repo (go back to the directory that contains fairseq/ first)
cd ../..
git clone https://github.com/ZhikangNiu/NDVQ.git
mv NDVQ/src/ema_gaussion_codec fairseq/examples/

Docker

You can pull the image from Docker Hub with the following command:

docker pull zkniu/fairseq:torch1.12-cu113-fairseq

For more details about this image, see the Dockerfile in this repo.

Training

Data Preparation

See examples/ema_gaussion_codec/scripts/wav2vec_manifest.py
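The manifest format is not spelled out here. Assuming the script follows the convention of the upstream fairseq `wav2vec_manifest.py` (first line: the root directory; each following line: a relative audio path, a tab, and the number of audio frames), the sketch below shows what such a file looks like. The helper `write_manifest` is hypothetical and uses only the standard-library `wave` module, so it handles plain PCM WAV files only; check the repository script for the options it actually supports.

```python
import os
import wave


def write_manifest(root, dest_tsv, ext=".wav"):
    """Write a wav2vec-style TSV manifest and return the number of files listed.

    Hypothetical helper: line 1 is the absolute root directory, every other
    line is `relative/path.wav<TAB>num_frames`.
    """
    root = os.path.abspath(root)
    count = 0
    with open(dest_tsv, "w", encoding="utf-8") as out:
        print(root, file=out)
        # Sort directories and files so the manifest is reproducible.
        for dirpath, _, filenames in sorted(os.walk(root)):
            for name in sorted(filenames):
                if not name.lower().endswith(ext):
                    continue
                path = os.path.join(dirpath, name)
                with wave.open(path, "rb") as wav:
                    frames = wav.getnframes()
                print(f"{os.path.relpath(path, root)}\t{frames}", file=out)
                count += 1
    return count


# Example call (placeholder paths):
# write_manifest("/path/to/wavs", "train.tsv")
```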

Train

cd examples/ema_gaussion_codec/scripts/train
bash train.sh [CONFIG_NAME]

Inference & Evaluation

cd examples/ema_gaussion_codec/inference
bash compute_metrics.sh [ref_dir] [gen_dir] [bw]
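The `[bw]` argument presumably selects the bandwidth to evaluate; its units and accepted values are defined by the script, not by this note. As general background, the bitrate of a residual-VQ codec can be computed from its frame rate, number of codebooks, and codebook size. The sketch below uses the public EnCodec 24 kHz settings (75 frames per second, 1024-entry codebooks) purely as an example; the function name `rvq_bitrate_kbps` is hypothetical, and the NDVQ configurations in this repository may differ.

```python
import math


def rvq_bitrate_kbps(frame_rate_hz, num_codebooks, codebook_size):
    """Bitrate in kbps: frames per second x codebooks x bits per index."""
    bits_per_index = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_index / 1000.0


# Example configuration only (EnCodec 24 kHz): 75 Hz frames, 1024 entries.
for num_codebooks in (2, 4, 8):
    print(num_codebooks, rvq_bitrate_kbps(75, num_codebooks, 1024))
# prints:
# 2 1.5
# 4 3.0
# 8 6.0
```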

Acknowledgements

We borrowed a lot of code from encodec.
We borrowed a lot of code from descript-audio-codec.
Thanks to fairseq.

Citation

If you use the NDVQ codebase or refer to the paper, please cite it as:

@article{niu2024ndvq,
  title={NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization},
  author={Niu, Zhikang and Chen, Sanyuan and Zhou, Long and Ma, Ziyang and Chen, Xie and Liu, Shujie},
  journal={arXiv preprint arXiv:2409.12717},
  year={2024}
}

License

Our code is released under the MIT License.
