MAG-ViT: Multi-Attention Grid Vision Transformer for High-Fidelity Super-Resolution in Remote Sensing
Official PyTorch implementation of the paper "MAG-ViT: Multi-Attention Grid Vision Transformer for High-Fidelity Super-Resolution in Remote Sensing".
Remote sensing applications need high-resolution imagery, but hardware and acquisition constraints often limit image quality. While Vision Transformers (ViTs) have advanced Remote Sensing Image Super-Resolution (RSISR), they suffer from high computational cost and limited contextual understanding. MAG-ViT addresses these challenges by combining local and global self-attention with linear complexity. At the heart of MAG-ViT is the HaloMBConv module, which integrates halo-based attention and mobile bottleneck convolutions to enhance spatial detail while reducing redundant computation. The model uses a dual-attention strategy: fixed windows for local features and grid windows for broader context, strengthened by residual connections. Experiments on the UCMerced and AID datasets show that MAG-ViT achieves improvements of up to 1.1 dB in PSNR and 0.03 in SSIM over state-of-the-art methods, while offering faster inference than diffusion-based models, making it well suited to practical remote sensing tasks.
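To make the difference between fixed windows and grid windows concrete, here is a minimal NumPy sketch of the two partitioning schemes as they are commonly defined for grid-attention models. The function names (`window_partition`, `grid_partition`) and the toy shapes are assumptions for illustration only; this is not the code used in this repository.

```python
import numpy as np


def window_partition(x, p):
    """Split an (H, W, C) map into non-overlapping p x p windows.

    Each group holds p*p spatially adjacent pixels (local context).
    """
    h, w, c = x.shape
    x = x.reshape(h // p, p, w // p, p, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, c)


def grid_partition(x, g):
    """Split an (H, W, C) map into groups sampled on a g x g grid.

    Each group holds g*g pixels spaced H//g (and W//g) apart, so a single
    group spans the whole image (broader context).
    """
    h, w, c = x.shape
    x = x.reshape(g, h // g, g, w // g, c)
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, c)


# Toy 4x4 single-channel map whose values are the flattened pixel indices.
feature_map = np.arange(16).reshape(4, 4, 1)
local_groups = window_partition(feature_map, 2)
global_groups = grid_partition(feature_map, 2)
print(local_groups[0].ravel())   # pixels of the top-left 2x2 window
print(global_groups[0].ravel())  # pixels sampled with stride 2
```

Attention restricted to either kind of group costs a fixed amount per group, which is why the number of operations grows linearly with the number of pixels when the group size is held constant.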
- Python 3.6+
- PyTorch>=1.6
- torchvision>=0.7.0
- einops
- matplotlib
- opencv-python (imported as cv2)
- scipy
- tqdm
- scikit
Clone or download this repository and install the requirements listed above:
cd codes
Download the UCMerced and AID datasets from the following links:
- UCMerced Dataset: Baidu Drive (Password: terr) | Google Drive
- AID Dataset: Baidu Drive (Password: id1n) | Google Drive
The datasets are already split into train, validation, and test sets.
The original images serve as the high-resolution (HR) references, and the corresponding low-resolution (LR) images are generated by bicubic downsampling.
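The provided downloads already contain the LR images. If you need to regenerate them, or want to prepare your own imagery in the same layout, the sketch below shows one way to do it with Pillow. The helper names (`lr_size`, `make_lr_images`) and the list of accepted file extensions are assumptions, and the output may not match the provided LR images exactly, because bicubic implementations differ between libraries.

```python
from pathlib import Path

SCALES = (2, 3, 4)
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}  # assumed


def lr_size(hr_width, hr_height, scale):
    """Return (cropped HR size, LR size) with HR dimensions divisible by scale."""
    w = hr_width - hr_width % scale
    h = hr_height - hr_height % scale
    return (w, h), (w // scale, h // scale)


def make_lr_images(split_dir, scales=SCALES):
    """Write LR_x{scale}/ folders next to HR/ inside one split directory."""
    from PIL import Image  # Pillow, imported lazily so lr_size has no dependency

    split_dir = Path(split_dir)
    for hr_path in sorted((split_dir / "HR").iterdir()):
        if hr_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        with Image.open(hr_path) as img:
            hr = img.convert("RGB")
        for scale in scales:
            (w, h), lr_wh = lr_size(hr.width, hr.height, scale)
            # Crop from the top-left corner, then downsample bicubically.
            lr = hr.crop((0, 0, w, h)).resize(lr_wh, Image.BICUBIC)
            out_dir = split_dir / f"LR_x{scale}"
            out_dir.mkdir(exist_ok=True)
            lr.save(out_dir / hr_path.name)
```

For example, `make_lr_images("/path/to/dataset/train")` would read `train/HR/` and fill `train/LR_x2/`, `train/LR_x3/`, and `train/LR_x4/`.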
Important:
When preparing the datasets, make sure the folder structure matches the expected format used in the code.
The datasets should be organized as follows:
For AID:
- /data/Image_restoration/Datasets/AID-dataset/
  - train/
    - HR/
    - LR_x2/
    - LR_x3/
    - LR_x4/
  - val/
    - HR/
    - LR_x2/
    - LR_x3/
    - LR_x4/
  - test/
    - HR/
    - LR_x2/
    - LR_x3/
    - LR_x4/
For UCMerced:
- /data/Image_restoration/Datasets/UCMerced-dataset/
  - train/
    - HR/
    - LR_x2/
    - LR_x3/
    - LR_x4/
  - val/
    - HR/
    - LR_x2/
    - LR_x3/
    - LR_x4/
  - test/
    - HR/
    - LR_x2/
    - LR_x3/
    - LR_x4/
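Before training, it can help to confirm that a dataset root matches the layout above. The following is a small standard-library sketch; the function name `missing_dirs` is hypothetical and not part of this repository.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")
SUBDIRS = ("HR", "LR_x2", "LR_x3", "LR_x4")


def missing_dirs(root):
    """Return the expected sub-folders (relative paths) that do not exist under root."""
    root = Path(root)
    return [
        (Path(split) / sub).as_posix()
        for split in SPLITS
        for sub in SUBDIRS
        if not (root / split / sub).is_dir()
    ]
```

Calling `missing_dirs("/data/Image_restoration/Datasets/UCMerced-dataset")` returns an empty list when all twelve folders are present, and otherwise lists entries such as `val/LR_x3` that still need to be created.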
# x4
python demo_train.py --model=MAGVIT --dataset=UCMerced --scale=4 --patch_size=192 --loss 1*L1 --lr 1e-4 --ext=img --epochs 2500 --batch_size 8 --n_GPUs 1 --save=MAGVITx4_UCMerced
# x3
python demo_train.py --model=MAGVIT --dataset=UCMerced --scale=3 --patch_size=144 --loss 1*L1 --lr 1e-4 --ext=img --epochs 2500 --batch_size 8 --save=MAGVITx3_UCMerced
# x2
python demo_train.py --model=MAGVIT --dataset=UCMerced --scale=2 --patch_size=96 --loss 1*L1 --lr 1e-4 --ext=img --epochs 2500 --batch_size 8 --save=MAGVITx2_UCMerced
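The three commands differ only in `--scale`, `--patch_size`, and `--save`, and in each case the patch size is 48 times the scale (192, 144, 96). If `--patch_size` refers to the HR patch, as in many EDSR-style code bases, this would correspond to a 48x48 LR patch at every scale, but that reading is an assumption. The sketch below builds the same commands programmatically; `train_command` and `run_all` are hypothetical helpers, and passing a dataset other than `UCMerced` is likewise an assumption.

```python
import subprocess
import sys


def train_command(scale, dataset="UCMerced"):
    """Build the argument list for one training run at the given scale."""
    return [
        sys.executable, "demo_train.py",
        "--model=MAGVIT",
        f"--dataset={dataset}",
        f"--scale={scale}",
        f"--patch_size={48 * scale}",  # 192, 144, 96 for x4, x3, x2
        "--loss", "1*L1",
        "--lr", "1e-4",
        "--ext=img",
        "--epochs", "2500",
        "--batch_size", "8",
        f"--save=MAGVITx{scale}_{dataset}",
    ]


def run_all(scales=(4, 3, 2)):
    """Run the trainings one after another, stopping at the first failure."""
    for scale in scales:
        subprocess.run(train_command(scale), check=True)
```

Because the arguments are passed as a list rather than through a shell, the `*` in `1*L1` is never subject to filename expansion.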
The train/val data paths are set in data/__init__.py.
Pre-trained TransENet models for the UCMerced and AID datasets are available here:
Baidu Drive (Password: w7ct) | Google Drive
Before running the test, you need to manually set the input and output paths inside the demo_deploy.py file:
args.dir_data = '/path/to/your/LR_x1' # Path to the low-resolution input images
args.dir_out = '/path/to/save/output' # Path where the output results will be saved
# x4
python demo_deploy.py --model=MAGVIT --scale=4
# x3
python demo_deploy.py --model=MAGVIT --scale=3
# x2
python demo_deploy.py --model=MAGVIT --scale=2
The output images generated by the trained models on the UCMerced and AID datasets can be downloaded here:
- UCMerced Results: Google Drive - UCMerced_Results
- AID Results: Google Drive - AID_Results
These folders contain the visual results obtained after running the testing phase using the pre-trained models.
To reproduce the evaluation results (PSNR, SSIM, and LPIPS metrics) on the UCMerced and AID datasets:
- Download the predicted output images from the results links above.
- Open and run the notebook evaluation.ipynb.
- In the notebook, set the paths to:
  - Ground-truth (HR) images
  - Predicted output images
- The notebook will automatically calculate and print the average PSNR, SSIM, and LPIPS scores.
Note: Make sure you install the required libraries before running the evaluation:
pip install basicsr lpips
The evaluation code uses metrics from BasicSR for accurate computation.
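For a quick sanity check outside the notebook, PSNR can be computed directly from its definition, 10·log10(255² / MSE) for 8-bit images. The sketch below is a plain NumPy illustration, not the BasicSR implementation; the function name `psnr` and its `crop_border` argument are assumptions, and the notebook's exact settings (border cropping, Y-channel conversion) may differ, so the numbers need not match the reported ones.

```python
import numpy as np


def psnr(hr, sr, crop_border=0):
    """PSNR in dB between two 8-bit images given as (H, W, C) arrays."""
    hr = np.asarray(hr, dtype=np.float64)
    sr = np.asarray(sr, dtype=np.float64)
    if hr.shape != sr.shape:
        raise ValueError(f"shape mismatch: {hr.shape} vs {sr.shape}")
    if crop_border > 0:
        # Ignore a border of crop_border pixels on every side.
        hr = hr[crop_border:-crop_border, crop_border:-crop_border, ...]
        sr = sr[crop_border:-crop_border, crop_border:-crop_border, ...]
    mse = np.mean((hr - sr) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(255.0 ** 2 / mse))
```

Averaging `psnr(hr, sr)` over all image pairs of a test set gives a dataset-level score comparable in spirit to the one printed by the notebook.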
If you find this code useful for your research, please cite our paper:
@article{ali2024magvit,
title = {MAG-ViT: Multi-Attention Grid Vision Transformer for High-Fidelity Super-Resolution in Remote Sensing},
journal = {Under Review / Preprint},
year = {2026},
}
This code is built on TransENet (PyTorch) and BasicSR. We thank the authors for sharing their code.