1 Inria    2 Valeo.ai    3 Kyutai
TL;DR: CLIP projects the visual embeddings to the shared latent space using a linear projection layer. ProLIP simply fine-tunes this layer with a zero-shot regularization loss. ProLIP is a strong alternative to linear probing, prompt tuning, and CLIP-adapters, and is robust to the learning rate. It also significantly outperforms prompt tuning on test-time adaptation.
- Test-time adaptation code.
- Installation
- Running ProLIP
- Saving Pre-projection Features
- Few-shot Classification with Few-shot Validation
- ProLIP sensitivity to hyperparameters
- Few-shot classification without validation set
- Cross-dataset Generalization
- Domain Generalization
- Base-to-new Generalization
- Full FT and Last-layer FT
- Complementarity to other methods
- Regularized Linear Adapter
- ProLIP Text
- Average Accuracy
- Test-time Prolip
- Acknowledgement
- Citation
Create a conda environment and install dependencies:
git clone https://github.com/astra-vision/ProLIP.git
cd ProLIP
conda create -n prolip python=3.8
conda activate prolip
# Install the matching versions of torch and torchvision
conda install pytorch torchvision cudatoolkit
Follow DATASET.md to install the datasets.
Before running ProLIP, make sure to change root_path in the configuration files at configs/experiments to the path of the datasets.
Start by saving the pre-projection features, i.e. the features to which CLIP's linear projection is applied.
bash scripts/save_features.sh
You can change the backbone in save_features.yaml to save the features of a different backbone.
From now on, you can directly train the projection layer using these features, requiring up to 2 seconds per training.
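For intuition, the core of ProLIP fits in a few lines of PyTorch. The sketch below is illustrative and is not the repository code: it assumes cached pre-projection features, integer labels, CLIP's pretrained visual projection matrix used as initialization, and pre-computed L2-normalized text embeddings of the class prompts; the regularization term simply pulls the fine-tuned projection back toward its zero-shot initialization. The optimizer, number of iterations, and logit scale are placeholders.

```python
# Illustrative ProLIP-style training on cached pre-projection features
# (a sketch, not the repository code; names and defaults are assumptions).
import torch
import torch.nn.functional as F

def train_projection(feats, labels, W0, text_emb, lr=1e-4, lam=1.0, iters=300, scale=100.0):
    # feats:    [num_samples, d_pre]   cached pre-projection visual features
    # labels:   [num_samples]          integer class labels
    # W0:       [d_pre, d_emb]         CLIP's pretrained visual projection (zero-shot weights)
    # text_emb: [num_classes, d_emb]   L2-normalized text embeddings of the class prompts
    W = torch.nn.Parameter(W0.clone())              # start from the zero-shot projection
    optim = torch.optim.AdamW([W], lr=lr)
    for _ in range(iters):
        img_emb = F.normalize(feats @ W, dim=-1)    # project and normalize the cached features
        logits = scale * img_emb @ text_emb.t()     # cosine-similarity logits
        loss = F.cross_entropy(logits, labels) + lam * (W - W0).pow(2).sum()  # zero-shot regularization
        optim.zero_grad()
        loss.backward()
        optim.step()
    return W.detach()
```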
To obtain the results of few-shot classification with a few-shot validation set for hyperparameter selection (which corresponds to the setting of LP++), please run:
bash scripts/few_shot_few_val.sh
These experiments correspond to Table 1, Table 20 and Table 21 of the paper.
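Conceptually, this setting sweeps the learning rate and the regularization weight λ and keeps the pair that performs best on a small few-shot validation split. The sketch below illustrates such a selection loop; it reuses the hypothetical train_projection from the sketch above, and accuracy is a hypothetical helper, not a repository API.

```python
# Illustrative (lr, lambda) selection on a few-shot validation split
# (a sketch; train_projection is the hypothetical function from the sketch above).
import torch.nn.functional as F

def accuracy(W, feats, labels, text_emb, scale=100.0):
    img_emb = F.normalize(feats @ W, dim=-1)
    preds = (scale * img_emb @ text_emb.t()).argmax(dim=-1)
    return (preds == labels).float().mean().item()

def select_hparams(train_feats, train_labels, val_feats, val_labels, W0, text_emb,
                   lrs=(1e-5, 1e-4, 1e-3, 1e-2), lams=(0.0, 1e-2, 1e-1, 1.0)):
    best_cfg, best_acc = None, -1.0
    for lr in lrs:
        for lam in lams:
            W = train_projection(train_feats, train_labels, W0, text_emb, lr=lr, lam=lam)
            acc = accuracy(W, val_feats, val_labels, text_emb)
            if acc > best_acc:
                best_cfg, best_acc = (lr, lam), acc
    return best_cfg
```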
To obtain the results of few-shot classification for different combinations of hyperparameters, i.e. learning rate (lr) and regularization weight λ, please run:
bash scripts/few_shot_no_val_lr1e-6_lambda_1.sh
bash scripts/few_shot_no_val_lr1e-6_lambda_1e-1.sh
bash scripts/few_shot_no_val_lr1e-6_lambda_1e-2.sh
bash scripts/few_shot_no_val_lr1e-6_lambda_0.sh
bash scripts/few_shot_no_val_lr1e-5_lambda_1.sh
bash scripts/few_shot_no_val_lr1e-5_lambda_1e-1.sh
bash scripts/few_shot_no_val_lr1e-5_lambda_1e-2.sh
bash scripts/few_shot_no_val_lr1e-5_lambda_0.sh
bash scripts/few_shot_no_val_lr1e-4_lambda_1.sh
bash scripts/few_shot_no_val_lr1e-4_lambda_1e-1.sh
bash scripts/few_shot_no_val_lr1e-4_lambda_1e-2.sh
bash scripts/few_shot_no_val_lr1e-4_lambda_0.sh
bash scripts/few_shot_no_val_lr1e-3_lambda_1.sh
bash scripts/few_shot_no_val_lr1e-3_lambda_1e-1.sh
bash scripts/few_shot_no_val_lr1e-3_lambda_1e-2.sh
bash scripts/few_shot_no_val_lr1e-3_lambda_0.sh
bash scripts/few_shot_no_val_lr1e-2_lambda_1.sh
bash scripts/few_shot_no_val_lr1e-2_lambda_1e-1.sh
bash scripts/few_shot_no_val_lr1e-2_lambda_1e-2.sh
bash scripts/few_shot_no_val_lr1e-2_lambda_0.sh
Note that each of these lines corresponds to a specific combination of lr and λ. For example, bash scripts/few_shot_no_val_lr1e-3_lambda_1e-1.sh runs the training for lr = 0.001 and λ = 0.1.
Using these commands, you can obtain the results of Figure 3 and Table 10 of the paper.
The results show two interesting observations: 1) using the zero-shot regularization makes the performance stable across learning rates, while 2) training without it (λ = 0) is much more sensitive to the choice of learning rate. Thus, ProLIP can be used with fixed hyperparameters and no validation set, which is the setting described next.
To obtain the results of few-shot classification without a validation set, for different learning rates (lr) and with λ = 1/N (N being the number of shots), please run:
bash scripts/few_shot_no_val_lr1e-5_lambda_1_N.sh
bash scripts/few_shot_no_val_lr1e-4_lambda_1_N.sh
bash scripts/few_shot_no_val_lr1e-3_lambda_1_N.sh
bash scripts/few_shot_no_val_lr1e-2_lambda_1_N.sh
To obtain the results of the same validation-free setting using λ = 1/N², please run:
bash scripts/few_shot_no_val_lr1e-5_lambda_1_N2.sh
bash scripts/few_shot_no_val_lr1e-4_lambda_1_N2.sh
bash scripts/few_shot_no_val_lr1e-3_lambda_1_N2.sh
bash scripts/few_shot_no_val_lr1e-2_lambda_1_N2.sh
Using these commands, you can obtain the results of Table 2 and Table 11 of the paper.
To obtain the results of cross-dataset generalization, i.e. training on 4-shot ImageNet in the validation-free setting and testing on the other 10 datasets, first train using:
bash scripts/train_4_shot_imagenet.sh
(Note: there is no need to run the script above if you have already run bash scripts/few_shot_no_val_lr1e-5_lambda_1_N.sh and have set save_checkpoints to True in configs/experiments/few_shot_no_val_lr1e-5_lambda_1_N.yaml.)
Then test on the other datasets:
bash scripts/cross_dataset.sh
Using these commands, you can obtain the results of Table 3 of the paper.
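For intuition, cross-dataset evaluation simply reuses the projection fine-tuned on ImageNet and swaps in each target dataset's cached pre-projection test features and class-prompt text embeddings. The sketch below is illustrative; the data layout is an assumption, not the repository format.

```python
# Illustrative cross-dataset evaluation with a projection W fine-tuned on ImageNet
# (a sketch; the data layout is an assumption, not the repository format).
import torch.nn.functional as F

def eval_cross_dataset(W, datasets, scale=100.0):
    # datasets: {name: (test_feats, test_labels, text_emb)} per target dataset
    results = {}
    for name, (feats, labels, text_emb) in datasets.items():
        img_emb = F.normalize(feats @ W, dim=-1)
        preds = (scale * img_emb @ text_emb.t()).argmax(dim=-1)
        results[name] = (preds == labels).float().mean().item()
    return results
```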
To obtain the results of domain generalization, i.e. training on 4-shot ImageNet and testing on ImageNet-A, ImageNet-R, ImageNet-V2 and ImageNet-Sketch, first save the features corresponding to each backbone:
bash scripts/save_features_IN_RN50.sh
bash scripts/save_features_IN_RN101.sh
bash scripts/save_features_IN_VITB16.sh
bash scripts/save_features_IN_VITB32.sh
(Note: there is no need to run bash scripts/save_features_IN_RN50.sh if you have already run bash scripts/save_features.sh.)
Then train the models using:
bash scripts/train_4_shot_imagenet.sh
bash scripts/DG_RN101.sh
bash scripts/DG_VITB16.sh
bash scripts/DG_VITB32.sh
(Note: there is no need to run bash scripts/train_4_shot_imagenet.sh if you have already run bash scripts/few_shot_no_val_lr1e-5_lambda_1_N.sh and have set save_checkpoints to True in configs/experiments/few_shot_no_val_lr1e-5_lambda_1_N.yaml.)
This will automatically output the results on the out-of-distribution variants. Using these commands, you can obtain the results of Table 4 of the paper.
To obtain the results of base-to-new generalization for 4-shot training on RN50, first train on base classes in the validation-free setting:
bash scripts/base_to_new_RN50.sh
This will train ProLIP on the base classes and test on the corresponding test sets.
To evaluate the trained models on new classes, please run:
bash scripts/new_classes_eval_RN50.sh
To obtain the results of base-to-new generalization for 16-shot training on ViT-B/16, first save the features if this has not already been done:
bash scripts/save_features_16_shot_VITB16.sh
Then train on base classes in the validation-free setting:
bash scripts/base_to_new_VITB16.sh
Finally, evaluate the trained models on new classes:
bash scripts/new_classes_eval_VITB16.sh
Using these commands, you can obtain the results of Table 5, Table 14 and Table 15 of the paper.
To obtain the results of Full fine-tuning of the vision encoder, please run:
bash scripts/full_ft.sh
To obtain the results of Last-layer fine-tuning, please run:
bash scripts/last_layer_ft_lr1e-5.sh
bash scripts/last_layer_ft_lr1e-4.sh
Using these commands, you can obtain the results of Table 6 of the paper.
To obtain the results of combining the logits of ProLIP and those of Tip-Adapter-F
in the validation-free setting, please run:
bash scripts/prolip_tip.sh
To obtain the results of combining the logits of ProLIP and those of TaskRes
in the validation-free setting, please run:
bash scripts/prolip_taskres.sh
Using these commands, you can obtain the results of Table 7 of the paper.
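The combination operates at the logit level: each method produces class logits on the same test set, and the two are fused. The sketch below assumes a simple (optionally weighted) sum; the exact fusion used in the scripts may differ.

```python
# Illustrative logit-level combination of ProLIP with another adapter-style method
# (a sketch; the mixing weight and the exact fusion rule are assumptions).
def combine_logits(prolip_logits, other_logits, weight=1.0):
    # both tensors are assumed to be [num_test, num_classes] on the same test set
    return prolip_logits + weight * other_logits

# e.g. preds = combine_logits(prolip_logits, tip_logits).argmax(dim=-1)
```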
To obtain the results of CLIP-adapter with different learning rates (lr) and residual ratios (α), please run the following commands (a short sketch of the role of α is given after the list):
bash scripts/CA_lr1e-5_alpha_0.sh
bash scripts/CA_lr1e-4_alpha_0.sh
bash scripts/CA_lr1e-3_alpha_0.sh
bash scripts/CA_lr1e-2_alpha_0.sh
bash scripts/CA_lr1e-5_alpha_1e-1.sh
bash scripts/CA_lr1e-4_alpha_1e-1.sh
bash scripts/CA_lr1e-3_alpha_1e-1.sh
bash scripts/CA_lr1e-2_alpha_1e-1.sh
bash scripts/CA_lr1e-5_alpha_3e-1.sh
bash scripts/CA_lr1e-4_alpha_3e-1.sh
bash scripts/CA_lr1e-3_alpha_3e-1.sh
bash scripts/CA_lr1e-2_alpha_3e-1.sh
bash scripts/CA_lr1e-5_alpha_5e-1.sh
bash scripts/CA_lr1e-4_alpha_5e-1.sh
bash scripts/CA_lr1e-3_alpha_5e-1.sh
bash scripts/CA_lr1e-2_alpha_5e-1.sh
bash scripts/CA_lr1e-5_alpha_7e-1.sh
bash scripts/CA_lr1e-4_alpha_7e-1.sh
bash scripts/CA_lr1e-3_alpha_7e-1.sh
bash scripts/CA_lr1e-2_alpha_7e-1.sh
bash scripts/CA_lr1e-5_alpha_9e-1.sh
bash scripts/CA_lr1e-4_alpha_9e-1.sh
bash scripts/CA_lr1e-3_alpha_9e-1.sh
bash scripts/CA_lr1e-2_alpha_9e-1.sh
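As a reminder of what the residual ratio α controls: CLIP-adapter passes the projected image embedding through a small bottleneck MLP and blends the output with the original embedding as α · adapter(f) + (1 − α) · f. The module below is a sketch of this design; the hidden size and default α are assumptions, not the exact configuration used in the scripts.

```python
# Illustrative CLIP-adapter-style module: a bottleneck MLP on the projected image
# embedding, blended with the original embedding via the residual ratio alpha
# (a sketch; hidden size and default alpha are assumptions).
import torch.nn as nn
import torch.nn.functional as F

class ClipAdapter(nn.Module):
    def __init__(self, dim, reduction=4, alpha=0.2):
        super().__init__()
        self.alpha = alpha
        self.adapter = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.ReLU(inplace=True),
        )

    def forward(self, img_emb):
        adapted = self.adapter(img_emb)
        blended = self.alpha * adapted + (1.0 - self.alpha) * img_emb
        return F.normalize(blended, dim=-1)
```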
To obtain the results of our proposed regularized linear adapter in the validation-free setting using λ = 1/N, please run:
bash scripts/RLA_lr1e-5_lambda_1_N.sh
bash scripts/RLA_lr1e-4_lambda_1_N.sh
bash scripts/RLA_lr1e-3_lambda_1_N.sh
bash scripts/RLA_lr1e-2_lambda_1_N.sh
Using these commands, you can obtain the results of Figure 4 and Table 16 of the paper.
The results help rethink CLIP-adapter from the perspective of ProLIP, and show that the regularized linear adapter 1) outperforms the classical non-linear adapter, 2) alleviates the need for architecture design and hyperparameter selection, and 3) exhibits stable performance across different learning rates.
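A hedged sketch of what such a regularized linear adapter can look like: a single linear map on the projected embedding, initialized to the identity (so training starts from the zero-shot model) and penalized for drifting away from that initialization, mirroring ProLIP's regularization. The exact formulation in the paper may differ in details.

```python
# Illustrative regularized linear adapter loss (a sketch; the exact formulation
# in the paper may differ). A is a square matrix initialized to the identity.
import torch
import torch.nn.functional as F

def rla_loss(A, img_emb, labels, text_emb, lam, scale=100.0):
    A0 = torch.eye(A.shape[0], device=A.device)        # identity, i.e. zero-shot behaviour
    adapted = F.normalize(img_emb @ A, dim=-1)
    logits = scale * adapted @ text_emb.t()
    return F.cross_entropy(logits, labels) + lam * (A - A0).pow(2).sum()
```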
To obtain the results of ProLIP applied to the text embedding projection layer in the validation-free setting, for different learning rates and with λ = 1/N, please run:
bash scripts/prolip_text_lr1e-5_lambda_1_N.sh
bash scripts/prolip_text_lr1e-4_lambda_1_N.sh
bash scripts/prolip_text_lr1e-3_lambda_1_N.sh
bash scripts/prolip_text_lr1e-2_lambda_1_N.sh
Using these commands, you can obtain the results of Table 8 and Table 13 of the paper.
Running any of the trainings above will output .txt files containing the average accuracy over 10 seeds, per dataset and per value of N (i.e. the number of shots). To compute the average accuracy over the 11 datasets, please run:
python results/mean_std.py --acc_path <path_to_the_accuracy_values>
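For reference, the aggregation amounts to averaging the seed-averaged per-dataset accuracies over the 11 datasets. The snippet below is only a sketch of that kind of aggregation, not the repository's mean_std.py, which reads the .txt files produced by the trainings.

```python
# Minimal sketch of averaging seed-averaged per-dataset accuracies over datasets
# (not the repository's mean_std.py; the values below are hypothetical).
import statistics

per_dataset_acc = {"imagenet": 62.1, "caltech101": 91.0, "oxford_pets": 87.3}  # hypothetical
mean_acc = statistics.mean(per_dataset_acc.values())
print(f"average accuracy over datasets: {mean_acc:.2f}")
```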
🚨 Todo: I will add the code soon.
This work was partially funded by French project SIGHT (ANR-20-CE23-0016). It was performed using HPC resources from GENCI-IDRIS (Grants AD011014477R1, AD011012808R3). The authors thank Clément Weinreich for insightful discussion.
This repository is built on top of LP++
, Tip-adapter
, and CLIP
. Thanks to the authors for making their work open-source!
@misc{fahes2025clipsvisualembeddingprojector,
title={CLIP's Visual Embedding Projector is a Few-shot Cornucopia},
author={Fahes, Mohammad and Vu, Tuan-Hung and Bursuc, Andrei and P\'erez, Patrick and de Charette, Raoul},
year={2025},
eprint={2410.05270},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.05270},
}