Light-SERNet

This is the TensorFlow 2.x implementation of our paper "Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition", accepted at ICASSP 2022.

In this paper, we propose an efficient and lightweight fully convolutional neural network (FCNN) for speech emotion recognition in systems with limited hardware resources. In the proposed FCNN model, various feature maps are extracted via three parallel paths with different filter sizes. This helps the deep convolution blocks extract high-level features while ensuring sufficient separability. The extracted features are used to classify the emotion of the input speech segment. Although our model is smaller than the state-of-the-art models, it achieves higher performance on the IEMOCAP and EMO-DB datasets.

Training and Testing

Demo

Demo on EMO-DB dataset:

Run

1. Clone Repository

$ git clone https://github.com/AryaAftab/LIGHT-SERNET.git
$ cd LIGHT-SERNET/

2. Requirements

TensorFlow >= 2.3.0
NumPy >= 1.19.2
Tqdm >= 4.50.2
Matplotlib >= 3.3.1
Scikit-learn >= 0.23.2

$ pip install -r requirements.txt
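
After installing, the environment can be checked against the minimum versions listed above. The following is a minimal sketch, not part of the repository: the helper names (`version_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and the version comparison only looks at the first three numeric components.

```python
import re
from importlib import metadata

# Minimum versions copied from the requirements list (PyPI distribution names).
MINIMUMS = {
    "tensorflow": "2.3.0",
    "numpy": "1.19.2",
    "tqdm": "4.50.2",
    "matplotlib": "3.3.1",
    "scikit-learn": "0.23.2",
}


def version_tuple(version):
    """Turn '2.3.0' or '2.3.0rc1' into (2, 3, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```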

3. Data:

Download the EMO-DB and IEMOCAP (requires permission to access) datasets.
Extract them into the data folder.

4. Set hyperparameters and training config:

To set the hyperparameters and the training config, you only need to change the constants in hyperparameters.py.

5. Start training:

Use the following command to train the model with the desired dataset, cost function, and input length (in seconds).

Note 1: The input is automatically cut or padded to the desired size and stored in the data folder.
Note 2: The best model is saved in the result folder.
Note 3: The results for the confusion matrix are saved in the result folder.

$ python train.py -dn {dataset_name} \
                  -id {input durations} \
                  -at {audio_type} \
                  -ln {cost function name} \
                  -v {verbose for training bar} \
                  -it {type of input(mfcc, spectrogram, mel_spectrogram)}
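
Note 1 above says the input is cut or padded to the length given by -id. A minimal sketch of that idea follows. It assumes a mono waveform held in a 1-D NumPy array, a 16 kHz sample rate, and zero-padding at the end; the function name `fix_length` is hypothetical, and the repository's own preprocessing may segment or pad differently.

```python
import numpy as np


def fix_length(waveform, duration_s, sample_rate=16000):
    """Cut or zero-pad a 1-D waveform to exactly duration_s seconds."""
    target = int(round(duration_s * sample_rate))
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.shape[0] >= target:
        # Too long: keep only the first `target` samples.
        return waveform[:target]
    # Too short: append zeros until the target length is reached.
    padding = target - waveform.shape[0]
    return np.pad(waveform, (0, padding), mode="constant")


short = fix_length(np.ones(16000), duration_s=3)   # 1 s of audio, padded to 3 s
long = fix_length(np.ones(80000), duration_s=3)    # 5 s of audio, cut to 3 s
print(short.shape, long.shape)  # (48000,) (48000,)
```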

Example:

EMO-DB Dataset:

python train.py -dn "EMO-DB" \
                -id 3 \
                -at "all" \
                -ln "focal" \
                -v 1 \
                -it "mfcc"

IEMOCAP Dataset:

python train.py -dn "IEMOCAP" \
                -id 7 \
                -at "impro" \
                -ln "cross_entropy" \
                -v 1 \
                -it "mfcc"

Note: To run all experiments, just run run.sh

sh run.sh
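
For custom sweeps, the same command line can also be assembled from Python. The sketch below only builds and prints the argument lists for the two examples above, using the flags documented in this README; it does not reproduce the contents of run.sh, and `build_train_command` is a hypothetical helper.

```python
import shlex


def build_train_command(dataset_name, input_duration, audio_type, loss_name,
                        verbose=1, input_type="mfcc"):
    """Return the train.py invocation as an argument list (nothing is executed)."""
    return [
        "python", "train.py",
        "-dn", dataset_name,
        "-id", str(input_duration),
        "-at", audio_type,
        "-ln", loss_name,
        "-v", str(verbose),
        "-it", input_type,
    ]


# The two configurations shown in the examples above.
configs = [
    ("EMO-DB", 3, "all", "focal"),
    ("IEMOCAP", 7, "impro", "cross_entropy"),
]
commands = [build_train_command(*config) for config in configs]

for command in commands:
    print(shlex.join(command))
    # To actually launch training: subprocess.run(command, check=True)
```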

Citation

If you find our code useful for your research, please consider citing:

@article{aftab2021light,
  title={Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition},
  author={Aftab, Arya and Morsali, Alireza and Ghaemmaghami, Shahrokh and Champagne, Benoit},
  journal={arXiv preprint arXiv:2110.03435},
  year={2021}
}

Single File Inference

The inference_tests folder contains:

a trained model (named EMO-DB_3.0s_Segmented_cross_entropy_float32) with its confusion matrices and performance report. The model was trained on the EMO-DB dataset;
some audio samples named after their labels.

Content of LIGHT-SERNET/inference_tests folder

Except for the four files named

EMO-DB_3.0s_Segmented_cross_entropy_float32.tflite
EMO-DB_3.0s_Segmented_cross_entropy_Report.txt
EMO-DB_3.0s_Segmented_cross_entropy_TotalConfusionMatrixNormalized.pdf
EMO-DB_3.0s_Segmented_cross_entropy_TotalConfusionMatrix.pdf

the names of the other files in the folder encode the following information:

The label of the audio file
Whether the file belongs to the EMO-DB dataset (tagged dataset)
Whether the file is a phrase from the dataset that I recorded from a German speaker (tagged rec)
Whether the file is NOT a phrase from the dataset and was recorded by me from a German speaker (tagged external)

! ! ! PLEASE BE CAREFUL: audio files tagged rec and external were recorded by me with non-professional equipment from two German speakers who are NOT ACTRESSES, so their classification may be less reliable.
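
The naming convention can be read back programmatically. This sketch assumes every sample is named `<label>_<tag>.wav`, which is inferred only from the single example name happiness_dataset.wav used in this README; `parse_sample_name` is a hypothetical helper, and files that follow a different pattern are returned with no tag.

```python
from pathlib import Path

# The three origin tags described above.
ORIGINS = {"dataset", "rec", "external"}


def parse_sample_name(filename):
    """Split '<label>_<tag>.wav' into (label, tag); tag is None if not recognised."""
    stem = Path(filename).stem
    label, _, origin = stem.rpartition("_")
    if label and origin in ORIGINS:
        return label, origin
    return stem, None


print(parse_sample_name("happiness_dataset.wav"))  # ('happiness', 'dataset')
```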

Run Inference Experiments

Colab Notebook:

The main functions used for the experiments, taken from the original source and slightly modified for my purposes, are in the file inference_data_processing.py.

Install the requirements
Run the file inference_single_file.py with

$ python inference_single_file.py -id 3 -at "all" -ln "cross_entropy" -it "mfcc" -fn "happiness_dataset.wav"

where "happiness_dataset.wav" is the name of the file you want to classify, stored in the inference_tests folder.

CAREFUL: the file must be a .wav file sampled at 16 kHz, otherwise the program will not work.
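
The 16 kHz requirement can be verified before running inference. The following is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files; `is_16khz_wav` is a hypothetical helper and is not part of inference_single_file.py.

```python
import wave


def is_16khz_wav(path):
    """Return True if `path` is a readable PCM WAV file sampled at 16 kHz."""
    try:
        with wave.open(str(path), "rb") as wav_file:
            return wav_file.getframerate() == 16000
    except (wave.Error, EOFError, OSError):
        # Not a WAV file, truncated, or not readable at all.
        return False


print(is_16khz_wav("inference_tests/happiness_dataset.wav"))
```

A file that fails this check would need to be resampled to 16 kHz with an audio tool of your choice before it is passed to the script.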

The used model

The model used for inference is also stored in the inference_tests folder under the name EMO-DB_3.0s_Segmented_cross_entropy_float32.tflite.

To change the model, retrain the neural network and then update the model_path variable in inference_single_file.py.

! ! ! PLEASE BE CAREFUL: if you train a new model, the order of the classes will change, so you will have to reassign the corresponding classes in the dictionary.
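
To illustrate why the dictionary matters, here is a minimal sketch of turning model output scores into a label. The class order in `INDEX_TO_LABEL` is purely hypothetical (it lists the seven EMO-DB emotions alphabetically, with neutral last) and must be replaced with the order produced by your own training run; `decode_prediction` is likewise a hypothetical helper, not the code in inference_single_file.py.

```python
# HYPOTHETICAL class order: replace with the order of your own trained model.
INDEX_TO_LABEL = {
    0: "anger",
    1: "boredom",
    2: "disgust",
    3: "fear",
    4: "happiness",
    5: "sadness",
    6: "neutral",
}


def decode_prediction(scores, index_to_label=INDEX_TO_LABEL):
    """Return (label, score) for the highest-scoring class."""
    if len(scores) != len(index_to_label):
        raise ValueError("number of scores does not match number of classes")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return index_to_label[best], scores[best]


print(decode_prediction([0.05, 0.05, 0.1, 0.1, 0.6, 0.05, 0.05]))  # ('happiness', 0.6)
```

If the dictionary does not match the order used during training, the scores are still correct but every prediction is reported under the wrong name.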
