The official code of Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective.

whuhxb/Change3D

 
 

Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective

A simple and efficient framework for change detection and captioning tasks.

Duowang Zhu¹, Xiaohu Huang², Haiyan Huang¹, Hao Zhou³, and Zhenfeng Shao¹*

¹ Wuhan University   ² The University of Hong Kong   ³ Bytedance

Visualization

✨ Highlights

  • Unified Framework: Supports multiple change detection and captioning tasks.
  • Highly Efficient: Uses only ~6–13% of the parameters and ~8–34% of the FLOPs of prior SOTA methods.
  • SOTA Performance: Achieves SOTA performance without complex structures, offering an alternative to 2D models.

📰 News

  • [2025.03.25] We have released all the training code for Change3D!

  • [2025.02.27] Change3D has been accepted to CVPR 2025! 🎉🎉

📄 Abstract

We present Change3D, a unified video-based framework for change detection and captioning. Unlike traditional methods that use separate image encoders and multiple change extractors, Change3D treats bi-temporal images as a short video with learnable perception frames. A video encoder enables direct interaction and difference detection, simplifying the architecture. Our approach supports various tasks, including binary change detection (BCD), semantic change detection (SCD), building damage assessment (BDA), and change captioning (CC). Evaluated on eight benchmarks, Change3D outperforms SOTA methods while using only ~6%–13% of the parameters and ~8%–34% of the FLOPs.

🎮 Framework

Figure 1. Overall architectures of Change3D for Binary Change Detection, Semantic Change Detection, Building Damage Assessment, and Change Captioning.

📝 Performance

We conduct extensive experiments on eight public datasets: LEVIR-CD, WHU-CD, CLCD, HRSCD, SECOND, xBD, LEVIR-CC, and DUBAI-CC.

[Result figures: result_of_BCD, result_of_SCD, result_of_BDA, result_of_CC]

🎯 How to Use

Installation

conda create -n Change3D python=3.11.0
conda activate Change3D
pip install -r requirements.txt

Pretrained Weight

Download the X3D-L pretrained weights and place them in the repository root directory.

Data Preparation

  • For BCD: Download the LEVIR-CD, WHU-CD, and CLCD datasets. Arrange each dataset in the following structure and crop each image into 256x256 patches.
    ├─Train
        ├─t1          jpg/png (input image of T1)
        ├─t2          jpg/png (input image of T2)
        └─label       jpg/png (binary change mask)
    ├─Val
        ├─t1 
        ├─t2
        └─label
    ├─Test
        ├─t1
        ├─t2
        └─label
  • For SCD: Download the HRSCD and SECOND datasets. Arrange each dataset in the following structure and crop each image into 256x256 patches.
    ├─Train
        ├─t1          jpg/png  (input image of T1)
        ├─t2          jpg/png  (input image of T2)
        ├─label1      jpg/png  (semantic mask of T1)
        ├─label2      jpg/png  (semantic mask of T2)
        └─change      jpg/png  (binary change mask)
    ...

    ├─Test
        ├─t1
        ├─t2
        ├─label1
        ├─label2
        └─change
  • For BDA: Download the xBD dataset. Arrange the dataset in the following structure and crop each image into 256x256 patches.
    ├─Train
        ├─t1          jpg/png  (input image of T1)
        ├─t2          jpg/png  (input image of T2)
        ├─label1      jpg/png  (damage localization mask)
        └─label2      jpg/png  (damage level mask)
    ...

    ├─Test
        ├─t1
        ├─t2
        ├─label1
        └─label2
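
The cropping and folder layout described above can be sketched in Python. This is a minimal illustration, not the repository's own preprocessing script: the function names `tile_boxes`, `missing_dirs`, and `crop_folder` are hypothetical, dropping partial tiles at the image borders is an assumption, and `crop_folder` assumes Pillow is installed.

```python
from pathlib import Path

PATCH = 256  # patch size stated in the data preparation notes


def tile_boxes(width, height, patch=PATCH):
    """Return (left, upper, right, lower) boxes for non-overlapping patches.

    Assumption: partial tiles at the right/bottom edges are dropped.
    """
    return [
        (x, y, x + patch, y + patch)
        for y in range(0, height - patch + 1, patch)
        for x in range(0, width - patch + 1, patch)
    ]


def missing_dirs(root, subdirs=("t1", "t2", "label"),
                 splits=("Train", "Val", "Test")):
    """List expected split/subfolder pairs that do not exist under root."""
    root = Path(root)
    return [
        f"{split}/{sub}"
        for split in splits
        for sub in subdirs
        if not (root / split / sub).is_dir()
    ]


def crop_folder(src_dir, dst_dir, patch=PATCH):
    """Crop every jpg/png in src_dir into patches saved under dst_dir.

    Hypothetical helper; requires Pillow (an assumption, check requirements.txt).
    """
    from PIL import Image  # lazy import so the helpers above need no extra packages

    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(src_dir.iterdir()):
        if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        with Image.open(path) as img:
            # img.size is (width, height), matching tile_boxes' argument order
            for i, box in enumerate(tile_boxes(*img.size, patch)):
                img.crop(box).save(dst_dir / f"{path.stem}_{i:04d}{path.suffix}")
```

For example, a 1024x1024 image yields 16 patches. For the SCD layout, `missing_dirs(root, subdirs=("t1", "t2", "label1", "label2", "change"))` checks the five subfolders listed above; apply `crop_folder` with the same settings to every subfolder of a split so that image and mask patches stay aligned. Saving masks as png avoids JPEG compression artifacts in the labels.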

🎮 Train & Evaluate the Models

Training binary change detection with LEVIR-CD dataset as an example:

python ./scripts/train_BCD.py --dataset LEVIR-CD \
                              --file_root path/to/LEVIR-CD \
                              --pretrained path/to/X3D_L.pyth \
                              --save_dir ./exp \
                              --gpu_id 0

Note: The above train script completes the evaluation automatically.
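
To train on several BCD datasets in sequence, the same command can be assembled from Python. The sketch below is only an illustration: `build_train_command` is a hypothetical helper, the flags are the ones shown in the command above, and it is an assumption that `WHU-CD` and `CLCD` are accepted values for `--dataset`. The paths are the same placeholders as in the command above.

```python
import shlex
import sys


def build_train_command(dataset, file_root, pretrained="path/to/X3D_L.pyth",
                        save_dir="./exp", gpu_id=0):
    """Assemble the train_BCD.py invocation as an argument list."""
    return [
        sys.executable, "./scripts/train_BCD.py",
        "--dataset", dataset,
        "--file_root", str(file_root),
        "--pretrained", str(pretrained),
        "--save_dir", str(save_dir),
        "--gpu_id", str(gpu_id),
    ]


if __name__ == "__main__":
    # Assumption: these dataset names match what train_BCD.py expects.
    for name in ("LEVIR-CD", "WHU-CD", "CLCD"):
        cmd = build_train_command(name, f"path/to/{name}")
        print(shlex.join(cmd))  # show the command that would be run
        # To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a dataset path contains spaces.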

❤️ Acknowledgements

This repository is mainly built upon pytorchvideo and RSICCformer. Thanks to the authors of these well-organized codebases.

📧 Contact

If you have any issues while using the project, please feel free to contact me: [email protected].

📜 License

Change3D is released under the CC BY-NC-SA 4.0 license.

📚 Citation

If you find our work useful, please consider citing our paper:

@inproceedings{zhu2025change3d,
  title={Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective},
  author={Zhu, Duowang and Huang, Xiaohu and Huang, Haiyan and Zhou, Hao and Shao, Zhenfeng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}
