BMIP: Bi-directional Modality Interaction Prompt Learning for VLM [IJCAI 2025]

Song-Lin Lv, Yu-Yang Chen, Zhi Zhou, Ming Yang, Lan-Zhe Guo

Official implementation of the paper "BMIP: Bi-directional Modality Interaction Prompt Learning for VLM".



Highlights

[main figure]

Abstract: Vision-language models (VLMs) have exhibited remarkable generalization capabilities, and prompt learning for VLMs has attracted great attention for its ability to adapt pre-trained VLMs to specific downstream tasks. However, existing studies mainly focus on single-modal prompts or uni-directional modality interaction, overlooking the powerful alignment effects that arise from interaction between the vision and language modalities. To this end, we propose a novel prompt learning method called Bi-directional Modality Interaction Prompt (BMIP), which dynamically weights bi-modal information by learning from the attention layer, enhancing trainability and inter-modal consistency compared to simple information-aggregation methods. To evaluate the effectiveness of prompt learning methods, we propose a more realistic evaluation paradigm called open-world generalization, complementing the widely adopted cross-dataset transfer and domain generalization tasks. Comprehensive experiments on various datasets reveal that BMIP not only outperforms current state-of-the-art methods across all three evaluation paradigms but is also flexible enough to be combined with other prompt-based methods for consistent performance enhancement.
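
To make the core idea concrete, below is a minimal, illustrative sketch of bi-directional prompt interaction: two learnable prompt streams (text and vision) exchange information through cross-modal projections, and learned gates fed by attention-layer summaries decide how much each modality contributes. This is not the official implementation; all module and variable names are assumptions for illustration, and the actual architecture is in this repository's trainer code.

```python
# Illustrative sketch only -- hypothetical names, not the official BMIP modules.
import torch
import torch.nn as nn

class BiModalPromptInteraction(nn.Module):
    def __init__(self, n_prompts: int, d_text: int, d_vision: int):
        super().__init__()
        # Learnable prompt tokens for each modality.
        self.text_prompts = nn.Parameter(torch.randn(n_prompts, d_text) * 0.02)
        self.vision_prompts = nn.Parameter(torch.randn(n_prompts, d_vision) * 0.02)
        # Cross-modal projections: text -> vision space and vision -> text space.
        self.t2v = nn.Linear(d_text, d_vision)
        self.v2t = nn.Linear(d_vision, d_text)
        # Gates that map per-modality attention summaries to mixing weights in (0, 1).
        self.gate_t = nn.Sequential(nn.Linear(d_text, 1), nn.Sigmoid())
        self.gate_v = nn.Sequential(nn.Linear(d_vision, 1), nn.Sigmoid())

    def forward(self, text_attn_feat: torch.Tensor, vision_attn_feat: torch.Tensor):
        # text_attn_feat: (d_text,) summary of the text branch's attention layer
        # vision_attn_feat: (d_vision,) summary of the vision branch's attention layer
        a_t = self.gate_t(text_attn_feat)    # how much vision info flows into text
        a_v = self.gate_v(vision_attn_feat)  # how much text info flows into vision
        new_text = (1 - a_t) * self.text_prompts + a_t * self.v2t(self.vision_prompts)
        new_vision = (1 - a_v) * self.vision_prompts + a_v * self.t2v(self.text_prompts)
        return new_text, new_vision

# Usage with assumed CLIP-like dimensions (512-d text, 768-d vision):
m = BiModalPromptInteraction(n_prompts=4, d_text=512, d_vision=768)
t, v = m(torch.randn(512), torch.randn(768))
print(t.shape, v.shape)  # torch.Size([4, 512]) torch.Size([4, 768])
```

The key contrast with uni-directional designs: information flows in both directions, and the mixing weights are learned from attention statistics rather than fixed, which is what the paper means by dynamic weighting of bi-modal information.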

Main Contributions

  1. Novel Bi-directional Modality Interaction Technique
    • Enhances cross-modality alignment and paves the way for further exploration of information aggregation in other multi-modal models.
  2. Evaluation Paradigm: Open-World Generalization
    • Facilitates more realistic evaluations and promotes related research.
  3. Flexible Integration with Other Methods
    • BMIP is flexible enough to combine with other prompt learning methods, consistently boosting their performance.
  4. State-of-the-Art Performance
    • BMIP achieves state-of-the-art performance across all tasks.

☑️ Supported Methods

| Method | Paper | Configs | Training Scripts |
|---|---|---|---|
| BMIP | IJCAI 2025 | link | link |
| MaPLe | CVPR 2023 | link | link |
| CoOp | IJCV 2022 | link | link |
| Co-CoOp | CVPR 2022 | link | link |
| Deep Vision Prompting | - | link | link |
| Deep Language Prompting | - | link | link |
| Independent V-L Prompting | - | link | link |

Results

BMIP in comparison with existing methods

Results reported below show accuracy for base and novel classes across 11 recognition datasets, averaged over 3 seeds.

| Name | Base Acc. | Novel Acc. | HM | Epochs |
|---|---|---|---|---|
| CLIP | 69.34 | 74.22 | 71.70 | - |
| CoOp | 82.69 | 63.22 | 71.66 | 200 |
| CoCoOp | 80.47 | 71.69 | 75.83 | 10 |
| MaPLe | 82.28 | 75.14 | 78.55 | 5 |
| BMIP | 83.47 | 76.69 | 79.94 | 10 |
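
The HM column is the harmonic mean of base and novel accuracy, the standard trade-off metric in base-to-novel generalization. A quick check of the table (plain Python, no dependencies):

```python
# HM = harmonic mean of base and novel accuracy: 2*B*N / (B + N).
def harmonic_mean(base: float, novel: float) -> float:
    return 2 * base * novel / (base + novel)

for name, b, n in [("CLIP", 69.34, 74.22), ("CoOp", 82.69, 63.22),
                   ("CoCoOp", 80.47, 71.69), ("MaPLe", 82.28, 75.14),
                   ("BMIP", 83.47, 76.69)]:
    print(f"{name}: HM = {harmonic_mean(b, n):.2f}")
# CLIP 71.70, CoOp 71.66, CoCoOp 75.83, MaPLe 78.55, BMIP 79.94
```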

Installation

For installation and other package requirements, please follow the instructions detailed in INSTALL.md.

Data preparation

Please follow the instructions at DATASETS.md to prepare all datasets.

Training and Evaluation

Please refer to RUN.md for detailed instructions on training, evaluating, and reproducing the results using our pre-trained models.


Citation

If you use our work, please consider citing:

@misc{lv2025bmipbidirectionalmodalityinteraction,
      title={BMIP: Bi-directional Modality Interaction Prompt Learning for VLM}, 
      author={Song-Lin Lv and Yu-Yang Chen and Zhi Zhou and Ming Yang and Lan-Zhe Guo},
      year={2025},
      eprint={2501.07769},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2501.07769}, 
}

Contact

If you have any questions, please create an issue on this repository or contact us at [email protected].

Acknowledgements

Our code is based on the Co-CoOp and CoOp repositories. We thank the authors for releasing their code. If you use our model and code, please consider citing these works as well.
