
Official repository of the IEEE SLT 2024 paper "NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization"


NDVQ


Important

I sincerely apologize: I haven't had time recently to clean up the code, so I am sharing the experimental, uncleaned codebase I have been using.

We introduce Normal Distribution Vector Quantization (NDVQ), a distribution-based approach to vector quantization for neural audio codecs. Representing the codebook entries as normal distributions enhances robustness and generalization, leading to better reconstructed audio quality, especially at extremely low bandwidths. Our comparison with EnCodec demonstrates NDVQ’s superior performance in both audio compression and downstream codec-based speech synthesis, confirming its potential as a more resilient alternative to traditional VQ methods. Although real-world audio spans speech, ambient sounds, and music, this work focuses solely on speech; other audio domains remain unexplored, which limits the current range of applications. Developing a universal audio compression model based on our method is a compelling avenue for future research.
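The paragraph above describes codewords as normal distributions rather than fixed vectors. Below is a minimal sketch of that idea under stated assumptions, not the repository's implementation. It assumes each codeword is a diagonal Gaussian (a mean vector and a standard-deviation vector), that an input is assigned to the codeword with the highest log-likelihood, that inference returns that codeword's mean, and that training draws a reparameterized sample. The names `nd_quantize` and `gaussian_log_likelihood` are hypothetical. The assignment rule and the sampling step are also assumptions; see the paper and the `ema_gaussion_codec` code for the actual method.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical codeword layout: (mean vector, standard-deviation vector) of a
# diagonal normal distribution. Illustration only, not NDVQ's actual code.
Codeword = Tuple[List[float], List[float]]


def gaussian_log_likelihood(x: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> float:
    """Log-density of x under N(mean, diag(std**2))."""
    total = 0.0
    for xi, mu, sigma in zip(x, mean, std):
        var = sigma * sigma
        total += -0.5 * math.log(2.0 * math.pi * var) - (xi - mu) ** 2 / (2.0 * var)
    return total


def nd_quantize(
    x: Sequence[float],
    codebook: Sequence[Codeword],
    training: bool = False,
    rng: Optional[random.Random] = None,
) -> Tuple[int, List[float]]:
    """Assign x to the most likely Gaussian codeword (assumed rule).

    Inference returns the codeword mean. Training (an assumption here) returns
    a reparameterized sample mean + std * eps, which in an autodiff framework
    would let gradients reach both the mean and the standard deviation.
    """
    scores = [gaussian_log_likelihood(x, mean, std) for mean, std in codebook]
    index = max(range(len(codebook)), key=lambda k: scores[k])
    mean, std = codebook[index]
    if training:
        rng = rng or random.Random()
        return index, [mu + sigma * rng.gauss(0.0, 1.0) for mu, sigma in zip(mean, std)]
    return index, list(mean)


if __name__ == "__main__":
    codebook = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 3.0], [0.5, 0.5])]
    print(nd_quantize([2.8, 3.1], codebook))   # (1, [3.0, 3.0])
    print(nd_quantize([0.1, -0.2], codebook))  # (0, [0.0, 0.0])
```

Because the score includes the variance term, this assignment is not plain Euclidean nearest-neighbour search. The point `[1.5, 1.5]` is equidistant from both means above, yet it is assigned to the wider distribution (index 0).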

NDVQ Architecture

(Figure: NDVQ architecture diagram)

Reconstruction Results at Various Bitrates

(Figure: reconstruction results at various bitrates)

Install

Conda Install

conda create -n fairseq python=3.9 -y
conda activate fairseq
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch -y
pip install packaging editdistance gpustat wandb einops soundfile librosa pandas

# install fairseq (pinned commit)
git clone https://github.com/facebookresearch/fairseq.git
cd fairseq
git checkout 336c26a5
pip install --editable ./
cd ..

# install apex (pinned commit) with the C++ and CUDA extensions
git clone https://github.com/NVIDIA/apex.git
cd apex
git checkout 9263bc8
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..

# clone this repo and move the NDVQ example into fairseq/examples
git clone https://github.com/ZhikangNiu/NDVQ.git
mv NDVQ/src/ema_gaussion_codec fairseq/examples
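After installation, a quick sanity check can catch a missing package or a misplaced example folder before training. The following is a hypothetical helper, not part of this repo. The module names are assumptions based on the packages installed above, and it assumes fairseq was cloned into `./fairseq`. It uses only the standard library (`importlib.util.find_spec`, `pathlib`).

```python
import importlib.util
from pathlib import Path
from typing import Iterable, List

# Top-level module names assumed to correspond to the packages installed above.
REQUIRED_MODULES = ("torch", "torchaudio", "fairseq", "apex", "einops", "soundfile", "librosa")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def example_installed(fairseq_root: str = "fairseq") -> bool:
    """Check that ema_gaussion_codec was moved into <fairseq_root>/examples."""
    return (Path(fairseq_root) / "examples" / "ema_gaussion_codec").is_dir()


if __name__ == "__main__":
    print("missing modules:", missing_modules() or "none")
    print("ema_gaussion_codec in fairseq/examples:", example_installed())
```

`find_spec` only locates modules without importing them, so the check stays fast. It does not verify that apex's CUDA extensions actually compiled.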

Docker

You can pull the prebuilt image from Docker Hub:

docker pull zkniu/fairseq:torch1.12-cu113-fairseq

For details about this image, see the Dockerfile in this repository.

Training

Data Preparation

See examples/ema_gaussion_codec/scripts/wav2vec_manifest.py
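The script name suggests it is adapted from fairseq's `examples/wav2vec/wav2vec_manifest.py`. That script writes `train.tsv`/`valid.tsv` files. The first line of each file is the audio root directory, and every following line is `relative_path<TAB>num_frames`, with frame counts read via `soundfile.info(path).frames`. Assuming the NDVQ copy keeps this format, the sketch below builds and parses such manifests from precomputed frame counts. `build_manifests`, `read_manifest`, and their default split parameters are hypothetical; check the script in this repo for its actual arguments.

```python
import random
from typing import Dict, List, Tuple


def build_manifests(
    root: str, frames: Dict[str, int], valid_percent: float = 0.01, seed: int = 42
) -> Tuple[str, str]:
    """Randomly split {relative_path: num_frames} into train/valid manifest text."""
    rng = random.Random(seed)
    train_lines: List[str] = [root]  # first line: audio root directory
    valid_lines: List[str] = [root]
    for rel_path in sorted(frames):
        line = f"{rel_path}\t{frames[rel_path]}"
        target = valid_lines if rng.random() < valid_percent else train_lines
        target.append(line)
    return "\n".join(train_lines) + "\n", "\n".join(valid_lines) + "\n"


def read_manifest(text: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Parse manifest text into (root, [(relative_path, num_frames), ...])."""
    lines = text.splitlines()
    root = lines[0]
    entries = []
    for line in lines[1:]:
        if not line:
            continue
        rel_path, num_frames = line.split("\t")
        entries.append((rel_path, int(num_frames)))
    return root, entries
```

For example, `build_manifests("/path/to/wavs", {"spk1/a.wav": 16000})` with `valid_percent=0.0` places every file in the training manifest.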

Train

cd examples/ema_gaussion_codec/scripts/train
bash train.sh [CONFIG_NAME]

Inference & Evaluation

cd examples/ema_gaussion_codec/inference
bash compute_metrics.sh [ref_dir] [gen_dir] [bw]
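The `bw` argument is a target bandwidth. Assuming it is given in kbps and that the codec follows EnCodec's 24 kHz residual-VQ setup (a hop of 320 samples, so 75 frames per second, and 1024-entry codebooks, i.e. 10 bits per codebook per frame), each codebook costs 0.75 kbps. The number of codebooks is then `bw / 0.75`. These defaults are EnCodec's published settings and are not confirmed for NDVQ. The function names are hypothetical.

```python
import math


def bitrate_kbps(n_codebooks: int, codebook_size: int = 1024,
                 sample_rate: int = 24000, hop_length: int = 320) -> float:
    """Bitrate = codebooks * bits per code * frames per second (in kbps)."""
    frame_rate = sample_rate / hop_length            # 75 frames/s with defaults
    return n_codebooks * math.log2(codebook_size) * frame_rate / 1000.0


def codebooks_for_bandwidth(bw_kbps: float, codebook_size: int = 1024,
                            sample_rate: int = 24000, hop_length: int = 320) -> int:
    """Number of residual codebooks needed to hit bw_kbps exactly."""
    per_codebook = bitrate_kbps(1, codebook_size, sample_rate, hop_length)
    n = bw_kbps / per_codebook
    if abs(n - round(n)) > 1e-9:
        raise ValueError(f"{bw_kbps} kbps is not a multiple of {per_codebook} kbps")
    return int(round(n))


if __name__ == "__main__":
    for bw in (1.5, 3.0, 6.0, 12.0, 24.0):
        print(bw, "kbps ->", codebooks_for_bandwidth(bw), "codebooks")
```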

Acknowledgements

  1. We borrowed a lot of code from EnCodec.
  2. We borrowed a lot of code from descript-audio-codec.
  3. Thanks to fairseq.

Citation

If you use the NDVQ codebase or refer to the paper, please cite:

@article{niu2024ndvq,
  title={NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization},
  author={Niu, Zhikang and Chen, Sanyuan and Zhou, Long and Ma, Ziyang and Chen, Xie and Liu, Shujie},
  journal={arXiv preprint arXiv:2409.12717},
  year={2024}
}

License

Our code is released under the MIT License.
