A simple and efficient framework for change detection and captioning tasks.
Duowang Zhu1, Xiaohu Huang2, Haiyan Huang1, Hao Zhou3, and Zhenfeng Shao1*
1 Wuhan University 2 The University of Hong Kong 3 Bytedance
- Unified Framework: Supports multiple change detection and captioning tasks.
- Highly Efficient: Uses ~6–13% of the parameters and ~8–34% of the FLOPs compared to SOTA.
- SOTA Performance: Achieves SOTA performance without complex structures, offering an alternative to 2D models.
- [2025.03.25] We have released all the training code for Change3D!
- [2025.02.27] Change3D has been accepted by CVPR 2025! 🎉🎉
We present Change3D, a unified video-based framework for change detection and captioning. Unlike traditional methods that use separate image encoders and multiple change extractors, Change3D treats bi-temporal images as a short video with learnable perception frames. A video encoder enables direct interaction and difference detection, simplifying the architecture. Our approach supports various tasks, including binary change detection (BCD), semantic change detection (SCD), building damage assessment (BDA), and change captioning (CC). Evaluated on eight benchmarks, Change3D outperforms SOTA methods while using only ~6%–13% of the parameters and ~8%–34% of the FLOPs.
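The core idea can be sketched in a few lines of PyTorch: stack the two temporal images together with a learnable perception frame into a short "video" and pass it through a 3D encoder, so temporal interaction happens inside the backbone rather than in a separate change extractor. This is a minimal illustrative sketch, not the repository's actual modules (the class and parameter names below, and the single 3D conv standing in for the X3D backbone, are all hypothetical):

```python
import torch
import torch.nn as nn

class TinyVideoEncoder(nn.Module):
    """Illustrative stand-in for Change3D's video encoder (hypothetical)."""

    def __init__(self, channels=3, dim=16):
        super().__init__()
        # Learnable perception frame inserted between the T1 and T2 frames.
        self.perception = nn.Parameter(torch.zeros(1, channels, 1, 256, 256))
        # A single 3D conv stands in for the X3D video backbone.
        self.conv3d = nn.Conv3d(channels, dim, kernel_size=(3, 3, 3),
                                padding=(0, 1, 1))

    def forward(self, t1, t2):
        b = t1.size(0)
        frames = torch.stack([t1, t2], dim=2)          # (B, C, 2, H, W)
        p = self.perception.expand(b, -1, -1, -1, -1)  # broadcast to batch
        # Assemble the 3-frame video: T1, perception frame, T2.
        video = torch.cat([frames[:, :, :1], p, frames[:, :, 1:]], dim=2)
        feat = self.conv3d(video)   # temporal kernel of 3 collapses T to 1
        return feat.squeeze(2)      # (B, dim, H, W) change-aware feature

enc = TinyVideoEncoder()
t1 = torch.randn(2, 3, 256, 256)
t2 = torch.randn(2, 3, 256, 256)
out = enc(t1, t2)
print(out.shape)  # torch.Size([2, 16, 256, 256])
```

Because the perception frame sits between the two images, the 3D convolution's temporal receptive field covers both time steps at once, which is what removes the need for a separate difference module.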
Figure 1. Overall architectures of Change3D for Binary Change Detection, Semantic Change Detection, Building Damage Assessment, and Change Captioning.
We conduct extensive experiments on eight public datasets: LEVIR-CD, WHU-CD, CLCD, HRSCD, SECOND, xBD, LEVIR-CC, and DUBAI-CC.
conda create -n Change3D python=3.11.0
conda activate Change3D
pip install -r requirements.txt
Download the X3D-L weight and put it into the root directory.
- For BCD: Download the LEVIR-CD, WHU-CD, and CLCD datasets. Organize each dataset into the following structure and crop every image into 256x256 patches.
├─Train
│  ├─t1      jpg/png (input image of T1)
│  ├─t2      jpg/png (input image of T2)
│  └─label   jpg/png (binary change mask)
├─Val
│  ├─t1
│  ├─t2
│  └─label
└─Test
   ├─t1
   ├─t2
   └─label
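Cropping into non-overlapping 256x256 patches can be done with a short helper like the one below. This is an illustrative sketch (the repository may provide its own preprocessing script); it assumes images load as H x W x C NumPy arrays and silently drops any border remainder smaller than a patch:

```python
import numpy as np

def crop_into_patches(img, patch=256):
    """Split an H x W x C array into non-overlapping patch x patch tiles.
    Illustrative helper; border pixels that do not fill a full tile
    are discarded."""
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tiles.append(img[y:y + patch, x:x + patch])
    return tiles

# e.g. a 1024x1024 image yields a 4x4 grid of 16 patches
tiles = crop_into_patches(np.zeros((1024, 1024, 3), dtype=np.uint8))
print(len(tiles))  # 16
```

The same helper would apply to the t1/t2 images and their masks, as long as every file in a sample is cropped with identical offsets so patches stay aligned across folders.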
- For SCD: Download the HRSCD and SECOND datasets. Organize each dataset into the following structure and crop every image into 256x256 patches.
├─Train
│  ├─t1        jpg/png (input image of T1)
│  ├─t2        jpg/png (input image of T2)
│  ├─label1    jpg/png (semantic mask of T1)
│  ├─label2    jpg/png (semantic mask of T2)
│  └─change    jpg/png (binary change mask)
├─...
└─Test
   ├─t1
   ├─t2
   ├─label1
   ├─label2
   └─change
- For BDA: Download the xBD dataset. Organize it into the following structure and crop every image into 256x256 patches.
├─Train
│  ├─t1        jpg/png (input image of T1)
│  ├─t2        jpg/png (input image of T2)
│  ├─label1    jpg/png (damage localization mask)
│  └─label2    jpg/png (damage level mask)
├─...
└─Test
   ├─t1
   ├─t2
   ├─label1
   └─label2
- For CC: Download the LEVIR-CC and DUBAI-CC datasets, then follow the preprocessing practice introduced in RSICCformer.
Training binary change detection with the LEVIR-CD dataset as an example:
python ./scripts/train_BCD.py --dataset LEVIR-CD \
    --file_root path/to/LEVIR-CD \
    --pretrained path/to/X3D_L.pyth \
    --save_dir ./exp \
    --gpu_id 0
Note: The above training script runs evaluation automatically after training.
This repository is mainly built upon pytorchvideo and RSICCformer. Thanks to those well-organized codebases.
If you have any issues while using the project, please feel free to contact me: [email protected].
Change3D is released under the CC BY-NC-SA 4.0 license.
If you find our work useful, please consider citing our paper:
@inproceedings{zhu2025change3d,
title={Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective},
author={Zhu, Duowang and Huang, Xiaohu and Huang, Haiyan and Zhou, Hao and Shao, Zhenfeng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}