Hi,
I'm planning to adopt the AudioVisual SlowFast model, and the only related code I could find is create_audio_visual_slowfast.py.
Since I'm quite new to this (I was using GluonCV with MXNet before), could you please guide me on how to use your code to fine-tune the model on my own dataset? I currently have videos with labels, and I'm willing to extract the audio from the videos myself.
Thank you.