GeWu-Lab/Ref-AVS

Latest commit: 82eb688 "Update README.md" by yaotingwangofficial, Dec 4, 2024 (17 commits in total)

Name	Last commit message	Last commit date
assets	Add files via upload	Jul 14, 2024
configs	Add files via upload	Jul 15, 2024
data	init	Jul 14, 2024
datasets	init	Jul 14, 2024
logs	init	Jul 14, 2024
models	Add files via upload	Jul 15, 2024
scripts	init	Jul 14, 2024
utils	init	Jul 14, 2024
LICENSE	Initial commit	Jul 2, 2024
README.md	Update README.md	Dec 4, 2024
__init__.py	init	Jul 14, 2024
requirements.txt	init	Jul 14, 2024
run.sh	init	Jul 14, 2024
run_refavs.py	init	Jul 14, 2024


Ref-AVS

Project Page | Dataset Download

The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024

>>> Introduction

In this paper, we propose a pixel-level segmentation task called Referring Audio-Visual Segmentation (Ref-AVS), which requires the network to densely predict, for every pixel, whether it belongs to the object described by a given expression containing multimodal cues, including dynamic audio-visual information.
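To make "densely predict, for every pixel" concrete: for each frame, the output is a binary mask with the same height and width as the frame. The minimal sketch below assumes the model emits one logit per pixel, binarizes it at a sigmoid threshold of 0.5, and scores the mask against ground truth with IoU (the Jaccard index, a standard segmentation measure). The function names are hypothetical and are not part of this repository.

```python
import math
from typing import List

Mask = List[List[int]]


def logits_to_mask(logits: List[List[float]], threshold: float = 0.5) -> Mask:
    """Binarize per-pixel logits: 1 means the pixel belongs to the referred object."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    # sigmoid(v) > threshold  <=>  v > log(threshold / (1 - threshold)),
    # so we compare logits directly and never overflow math.exp.
    cutoff = math.log(threshold / (1.0 - threshold))
    return [[1 if v > cutoff else 0 for v in row] for row in logits]


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Jaccard index (intersection over union) of two equally sized binary masks."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            inter += p & g
            union += p | g
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


if __name__ == "__main__":
    logits = [[2.0, -1.0], [0.5, -3.0]]
    gt = [[1, 0], [0, 0]]
    pred = logits_to_mask(logits)
    print(pred, mask_iou(pred, gt))  # [[1, 0], [1, 0]] 0.5
```

Comparing logits against a precomputed cutoff is equivalent to thresholding the sigmoid, but it stays numerically safe for very large or very small logits.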

The top-left part of Fig. 1 highlights the differences between Ref-AVS and previous tasks.
Fig. 2 shows the proposed baseline model for processing multimodal cues.
Fig. 3 shows statistics of the Ref-AVS dataset.

>>> Run

Run the training & evaluation:

cd Ref_AVS
sh run.sh  # update the path settings first; see configs/config.py for details.
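Because run.sh depends on the paths set in configs/config.py, a wrong path may only surface after the run has started. A quick pre-flight check is sketched below; the dictionary keys and values are hypothetical placeholders, not the actual variable names in configs/config.py, so substitute the paths you edited there.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical path settings: replace the keys and values with the entries
# you actually edited in configs/config.py.
PATHS: Dict[str, str] = {
    "data_root": "./data",
    "log_dir": "./logs",
}


def missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the keys whose configured path does not exist on disk."""
    return [key for key, p in paths.items() if not Path(p).expanduser().exists()]


if __name__ == "__main__":
    missing = missing_paths(PATHS)
    if missing:
        print("Fix these paths before running run.sh:", ", ".join(missing))
    else:
        print("All configured paths exist.")
```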

You can download the checkpoint here.

Core dependencies:

transformers==4.30.2
towhee==1.1.3
towhee-models==1.1.3  # Towhee is used to extract VGGish audio features.
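To confirm that your environment matches these pins before training, you can query installed versions with the standard library. This is a minimal sketch: it looks packages up by the names written above, and if a package is reported missing although pip lists it, try the exact distribution name that pip shows.

```python
from importlib import metadata
from typing import Dict, List

# Pins from the "Core dependencies" list above.
PINNED: Dict[str, str] = {
    "transformers": "4.30.2",
    "towhee": "1.1.3",
    "towhee-models": "1.1.3",
}


def version_problems(pins: Dict[str, str]) -> List[str]:
    """Describe every pinned package that is missing or installed at another version."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    for line in version_problems(PINNED) or ["All core dependencies match."]:
        print(line)
```

An exact string comparison is used on purpose: the pins above are exact versions, so any difference is worth reporting.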

>>> FAQ

(1) Alternative Audio Feature Extraction

If Towhee is hard to set up, consider using the following code on Google Colab instead: link.
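Whichever route you take (Towhee or the Colab code), it helps to check that the extracted features have the shape VGGish normally produces. With the reference VGGish preprocessing (16 kHz audio, 25 ms STFT windows with a 10 ms hop, and non-overlapping 96-frame log-mel patches of 0.96 s), each patch yields one 128-dimensional embedding. The sketch below reproduces that frame arithmetic; whether the Ref-AVS data loader expects exactly this count or pads/truncates to a fixed clip length is an assumption to verify in the datasets code.

```python
from typing import Tuple

SAMPLE_RATE = 16_000   # VGGish works on 16 kHz audio
STFT_WINDOW = 400      # 25 ms window at 16 kHz
STFT_HOP = 160         # 10 ms hop at 16 kHz
PATCH_FRAMES = 96      # 96 log-mel frames = one 0.96 s example
EMBEDDING_DIM = 128    # size of each VGGish embedding


def num_frames(length: int, window: int, hop: int) -> int:
    """Number of full windows of size `window` that fit in `length` with step `hop`."""
    if length < window:
        return 0
    return 1 + (length - window) // hop


def expected_vggish_shape(duration_s: float) -> Tuple[int, int]:
    """Expected (num_embeddings, 128) feature shape for a clip of `duration_s` seconds."""
    samples = round(duration_s * SAMPLE_RATE)
    mel_frames = num_frames(samples, STFT_WINDOW, STFT_HOP)
    patches = num_frames(mel_frames, PATCH_FRAMES, PATCH_FRAMES)
    return (patches, EMBEDDING_DIM)


if __name__ == "__main__":
    for seconds in (5.0, 10.0):
        print(seconds, expected_vggish_shape(seconds))
```

For example, a 10 s clip gives 998 log-mel frames and therefore a (10, 128) feature array, i.e. roughly one embedding per second of audio.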

Citation

If you find this work useful, please consider citing it:

@inproceedings{wang2024refavs,
  title={Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes},
  author={Wang, Yaoting and Sun, Peiwen and Zhou, Dongzhan and Li, Guangyao and Zhang, Honggang and Hu, Di},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}

@inproceedings{wang2024prompting,
  title={Prompting segmentation with sound is generalizable audio-visual source localizer},
  author={Wang, Yaoting and Liu, Weisong and Li, Guangyao and Ding, Jian and Hu, Di and Li, Xi},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={6},
  pages={5669--5677},
  year={2024}
}
