This repository offers the official implementation of InteractReID in PyTorch.
We introduce a novel interactive person retrieval framework for Sketch ReID. Inspired by CLIP's powerful cross-modal semantic alignment capability, we perform Task-oriented Knowledge Adaptation to transfer knowledge from pre-trained CLIP to the downstream Sketch ReID task. Then, to enable interactive sketch person retrieval with the user's text feedback, we leverage CLIP's joint vision-text embedding space to learn a pseudo-word token that accurately captures the sketch's semantics, achieving explicit sketch-text compositionality for optimal composed semantic mining.
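The sketch-text composition idea can be sketched roughly as follows. This is an illustrative PyTorch snippet, not the paper's exact design: the module structure, feature dimensions, and insertion position are all assumptions made for demonstration.

```python
import torch
import torch.nn as nn

class SketchTokenLearner(nn.Module):
    """Illustrative module: maps a sketch's global image feature to a
    single pseudo-word token living in the text token embedding space.
    Dimensions (512) are assumptions; they would follow the CLIP variant used."""
    def __init__(self, feat_dim=512, token_dim=512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, token_dim),
        )

    def forward(self, sketch_feat):
        # sketch_feat: (B, feat_dim) global feature from the image encoder
        return self.proj(sketch_feat)  # (B, token_dim) pseudo-word token

def compose(pseudo_token, text_tokens, insert_pos=1):
    # Splice the pseudo-word token into the text token sequence, so the
    # text encoder effectively reads "a <sketch> wearing a red jacket ...".
    # text_tokens: (B, L, token_dim); returns (B, L + 1, token_dim).
    return torch.cat(
        [text_tokens[:, :insert_pos],
         pseudo_token.unsqueeze(1),
         text_tokens[:, insert_pos:]],
        dim=1,
    )

learner = SketchTokenLearner()
sketch_feat = torch.randn(2, 512)       # dummy sketch features
text_tokens = torch.randn(2, 10, 512)   # dummy feedback-text token embeddings
composed = compose(learner(sketch_feat), text_tokens)
print(composed.shape)  # torch.Size([2, 11, 512])
```

The composed sequence can then be fed through the text encoder to obtain a single query embedding that fuses sketch content with the user's textual feedback.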
- All experiments are conducted on Nvidia RTX 4090 (24GB) GPUs.
- Python = 3.8
- The required packages are listed in `requirements.txt`. You can install them using:

```
pip install -r requirements.txt
```
- Download the CUHK-PEDES dataset from here, the ICFG-PEDES dataset from here, and the RSTPReid dataset from here. Tri-PEDES is a combination of CUHK-PEDES, ICFG-PEDES, and RSTPReid.
- Download the annotation json files from here.
- Download the pretrained CLIP checkpoint from here and save it in path `checkpoint/`.
- CUHK-PEDES

Organize them in your dataset folder as follows:

```
|-- dataset/
|   |-- CUHK-PEDES/
|       |-- imgs
|           |-- cam_a
|           |-- cam_b
|           |-- ...
|       |-- train_reid.json
|       |-- test_reid.json
|       |-- val_reid.json
|-- others/
```

- ICFG-PEDES
Organize them in your dataset folder as follows:
```
|-- dataset/
|   |-- ICFG-PEDES/
|       |-- imgs
|           |-- test
|           |-- train
|       |-- train_reid.json
|       |-- test_reid.json
|       |-- val_reid.json
|-- others/
```

- RSTPReid
Organize them in your dataset folder as follows:
```
|-- dataset/
|   |-- RSTPReid/
|       |-- imgs
|       |-- train_reid.json
|       |-- test_reid.json
|       |-- val_reid.json
|-- others/
```
In `config/TriPEDES_pretrain.yaml`, set the dataset path and the CLIP checkpoint path.
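A hypothetical fragment of what this configuration might look like is shown below; the actual key names may differ in the released file, so check `config/TriPEDES_pretrain.yaml` itself for the real field names.

```yaml
# Illustrative only -- key names are assumptions, not the file's real schema.
dataset_root: /path/to/dataset
clip_checkpoint: checkpoint/your_clip_model.pt
```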
You can start the fine-tuning process of Task-oriented Knowledge Adaptation by using the following command:
```
bash adaptation.sh
```

After knowledge adaptation for CLIP, you can train the vision-to-text converting network by using the following command:
```
bash tokenlearning.sh
```

If you need to test your trained model directly, you can use the following command:
```
bash model_test.sh
```

If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:
```bibtex
@inproceedings{InteractReID,
  author    = {Wu, Xinyi and Chen, Cuiqun and Zeng, Hui and Cai, Zhiping and Du, Bo and Ye, Mang},
  title     = {Interactive Sketch-based Person Re-Identification with Text Feedback},
  year      = {2025},
  booktitle = {International Conference on Multimedia and Expo (ICME)},
  numpages  = {9},
}
```
This code is distributed under the MIT License.
