This repository contains the code for the paper Vladan Stojnić, Yannis Kalantidis, Jiří Matas, Giorgos Tolias, "LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation", In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
A demo of our method is available on Hugging Face Spaces.
Set up the conda environment:
# Create conda environment
conda create -n lposs python=3.9
conda activate lposs
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
Install MMCV and MMSegmentation:
pip install -U openmim
mim install mmengine
mim install "mmcv-full==1.6.0"
mim install "mmsegmentation==0.27.0"
Install additional requirements:
conda install -c pytorch -c nvidia faiss-gpu=1.10.0
pip install kornia cupy-cuda11x
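After installation, a quick way to confirm that the packages above are importable is a small check script like the one below. It is a minimal sketch and not part of the repository. It uses only `importlib` from the standard library, so it does not import the heavy packages themselves. The mapping from import names to distribution names is an assumption: conda-installed packages such as `faiss-gpu` may not expose pip metadata, in which case the version is reported as unknown.

```python
import importlib.util
from importlib import metadata

# Import name -> distribution name used to look up the installed version.
# This mapping is an assumption; conda packages may register different names.
REQUIRED = {
    "torch": "torch",
    "torchvision": "torchvision",
    "mmcv": "mmcv-full",
    "mmseg": "mmsegmentation",
    "faiss": "faiss-gpu",
    "kornia": "kornia",
    "cupy": "cupy-cuda11x",
}


def check_environment(required=REQUIRED):
    """Return (found, missing).

    found maps each importable module to its version string, or None when
    no package metadata is available; missing lists modules that cannot be found.
    """
    found, missing = {}, []
    for module, dist in required.items():
        # find_spec locates a top-level module without importing it
        if importlib.util.find_spec(module) is None:
            missing.append(module)
            continue
        try:
            found[module] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found[module] = None
    return found, missing


if __name__ == "__main__":
    found, missing = check_environment()
    for module, version in found.items():
        print(f"{module:12s} {version or 'unknown version'}")
    if missing:
        print("missing:", ", ".join(missing))
```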
We use 8 benchmark datasets: PASCAL VOC20, PASCAL Context59, COCO-Object, PASCAL VOC, PASCAL Context, COCO-Stuff, Cityscapes, and ADE20k.
To run the evaluation, download and set up the PASCAL VOC, PASCAL Context, COCO-Stuff164k, Cityscapes, and ADE20k datasets following the MMSegmentation data preparation guide.
The COCO-Object dataset uses only the object classes of COCO-Stuff164k, with labels derived from its instance segmentation annotations. Run the following command to convert the instance segmentation annotations to semantic segmentation annotations:
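Before running the evaluation, it can help to verify that the datasets ended up where they are expected. The sketch below assumes the default `data/` layout from the MMSegmentation (0.x) data preparation guide. The exact paths used by this repository's configs are not stated here and may differ. The script only checks that the listed directories exist.

```python
from pathlib import Path

# Expected sub-directories, assuming the default MMSegmentation data layout.
# The repository's config files may point to different locations.
EXPECTED_DIRS = [
    "VOCdevkit/VOC2012/JPEGImages",
    "VOCdevkit/VOC2012/SegmentationClass",
    "VOCdevkit/VOC2010/JPEGImages",
    "VOCdevkit/VOC2010/SegmentationClassContext",
    "coco_stuff164k/images/val2017",
    "coco_stuff164k/annotations/val2017",
    "cityscapes/leftImg8bit/val",
    "cityscapes/gtFine/val",
    "ade/ADEChallengeData2016/images/validation",
    "ade/ADEChallengeData2016/annotations/validation",
]


def missing_dirs(data_root="data", expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("missing:\n  " + "\n  ".join(missing))
    else:
        print("all dataset directories found")
```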
python tools/convert_coco.py data/coco_stuff164k/ -o data/coco_stuff164k/
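The conversion merges per-instance masks into a single per-pixel class map. The sketch below illustrates that idea only and is not the logic of `tools/convert_coco.py`. Several details are assumptions: masks are given as 2D lists of 0/1, the background label is `0`, and the overlap rule paints larger instances first so that smaller objects lying on top of them remain visible.

```python
def instances_to_semantic(instances, height, width, background=0):
    """Merge instance masks into one semantic label map.

    instances: list of (mask, class_id), where mask is a height x width
    list of 0/1 values (hypothetical input format).
    Returns a height x width list of class ids.
    """
    label = [[background] * width for _ in range(height)]
    # Paint larger instances first so smaller, overlapping ones end up on top.
    by_area = sorted(instances, key=lambda inst: sum(map(sum, inst[0])), reverse=True)
    for mask, class_id in by_area:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    label[y][x] = class_id
    return label


if __name__ == "__main__":
    table = [[1, 1], [1, 0]]  # area 3, class 5
    cup = [[0, 1], [0, 0]]    # area 1, class 7, overlapping the table
    print(instances_to_semantic([(cup, 7), (table, 5)], 2, 2))
```

Painting by decreasing area is one common way to resolve overlaps. Other choices, such as keeping the first instance or marking overlaps as ignored, are equally possible.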
The provided code can be run using the following commands:
LPOSS:
torchrun main_eval.py lposs.yaml --dataset {voc, coco_object, context, context59, coco_stuff, voc20, ade20k, cityscapes} [--measure_boundary]
LPOSS+:
torchrun main_eval.py lposs_plus.yaml --dataset {voc, coco_object, context, context59, coco_stuff, voc20, ade20k, cityscapes} [--measure_boundary]
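To evaluate on all eight benchmarks in one go, the commands above can be generated in a loop. The helper below is a minimal sketch that is not part of the repository. It builds exactly the `torchrun main_eval.py <config> --dataset <name> [--measure_boundary]` invocations listed above and by default only prints them. Setting `dry_run=False` runs them one after another and stops at the first failure.

```python
import shlex
import subprocess

DATASETS = ["voc", "coco_object", "context", "context59",
            "coco_stuff", "voc20", "ade20k", "cityscapes"]


def build_commands(config="lposs.yaml", datasets=DATASETS, measure_boundary=False):
    """Return one torchrun argument list per dataset."""
    cmds = []
    for ds in datasets:
        cmd = ["torchrun", "main_eval.py", config, "--dataset", ds]
        if measure_boundary:
            cmd.append("--measure_boundary")
        cmds.append(cmd)
    return cmds


def run_all(cmds, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands("lposs_plus.yaml"), dry_run=True)
```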
@InProceedings{stojnic2025_lposs,
author = {Stojni\'c, Vladan and Kalantidis, Yannis and Matas, Ji\v{r}\'i and Tolias, Giorgos},
title = {LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2025}
}
This repository is based on "CLIP-DINOiser: Teaching CLIP a few DINO tricks for Open-Vocabulary Semantic Segmentation". Thanks to the authors!