MAG-ViT: Multi-Attention Grid Vision Transformer for High-Fidelity Super-Resolution in Remote Sensing

Official PyTorch implementation of the paper "MAG-ViT: Multi-Attention Grid Vision Transformer for High-Fidelity Super-Resolution in Remote Sensing".

Remote sensing applications need high-resolution imagery, but hardware and acquisition constraints often limit image quality. While Vision Transformers (ViTs) have advanced remote sensing image super-resolution (RSISR), they suffer from high computational cost and limited contextual understanding. MAG-ViT addresses these challenges by efficiently combining local and global self-attention with linear complexity. At the heart of MAG-ViT is the HaloMBConv module, which integrates halo-based attention and mobile bottleneck convolutions to enhance spatial details while reducing redundant computation. The model uses a dual-attention strategy, with fixed windows for local features and grid windows for broader context, strengthened by residual connections. Experiments on the UCMerced and AID datasets show that MAG-ViT achieves improvements of up to 1.1 dB in PSNR and 0.03 in SSIM over state-of-the-art methods, while offering faster inference than diffusion-based models, making it well suited to practical remote sensing tasks.
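The two attention layouts and the halo idea can be illustrated with plain tensor reshapes. The sketch below is an illustration under stated assumptions, not the MAG-ViT code: it follows the common window/grid partition pattern (as popularized by MaxViT) and a HaloNet-style halo extraction, using NumPy on a single (H, W, C) feature map. The function names `window_partition`, `grid_partition` and `halo_blocks` are hypothetical. Because attention would be computed only inside groups of fixed size p×p (or g×g), the total cost grows linearly with the number of pixels H·W rather than quadratically.

```python
import numpy as np


def window_partition(x: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping p x p windows.

    Returns (num_windows, p*p, C): each row holds one contiguous local window.
    """
    h, w, c = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    x = x.reshape(h // p, p, w // p, p, c)
    # Bring the two window-index axes together, then flatten each window.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, c)


def grid_partition(x: np.ndarray, g: int) -> np.ndarray:
    """Split an (H, W, C) map into groups of g x g tokens on a sparse grid.

    Tokens in the same group are spaced H//g rows and W//g columns apart,
    so each group spans the whole image at a fixed group size.
    """
    h, w, c = x.shape
    assert h % g == 0 and w % g == 0, "H and W must be divisible by g"
    x = x.reshape(g, h // g, g, w // g, c)
    # Group index = (row offset, col offset); members = (row step, col step).
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, c)


def halo_blocks(x: np.ndarray, p: int, halo: int) -> np.ndarray:
    """For each p x p query block, gather its (p + 2*halo)^2 neighbourhood.

    Keys/values for a block then include a `halo`-pixel border, so adjacent
    windows can exchange information. Image borders are zero-padded.
    """
    h, w, c = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    padded = np.pad(x, ((halo, halo), (halo, halo), (0, 0)))
    size = p + 2 * halo
    blocks = [
        padded[i:i + size, j:j + size].reshape(size * size, c)
        for i in range(0, h, p)  # same row-major block order as window_partition
        for j in range(0, w, p)
    ]
    return np.stack(blocks)
```

On a 4×4 map whose pixel values are 0..15 in row-major order, window 0 (p=2) holds pixels 0, 1, 4, 5, while grid group 0 (g=2) holds 0, 2, 8, 10: the first is a local patch, the second samples the whole image.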

Requirements

Python 3.6+
PyTorch>=1.6
torchvision>=0.7.0
einops
matplotlib
opencv-python (cv2)
scipy
tqdm
scikit
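A quick way to confirm the environment is to check that each requirement is importable. This is a minimal sketch using only the standard library; the import names are assumptions mapped from the list above (`torch` for PyTorch, `cv2` for OpenCV, and `skimage` for "scikit", which may instead mean scikit-learn in your setup).

```python
import importlib.util

# Import names assumed for the listed requirements; "skimage" is a guess for "scikit".
REQUIRED_MODULES = ["torch", "torchvision", "einops", "matplotlib",
                    "cv2", "scipy", "tqdm", "skimage"]


def missing_modules(names):
    """Return the subset of top-level module `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```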

Installation

Clone or download this repository, install the requirements listed above, then change into the code directory:

cd codes

Dataset Preparation

Download the UCMerced and AID datasets from the following links:

UCMerced Dataset:
Baidu Drive (Password: terr)
Google Drive
AID Dataset:
Baidu Drive (Password: id1n)
Google Drive

The datasets are already split into train, validation, and test sets.
The original images serve as the high-resolution (HR) references, and the corresponding low-resolution (LR) images are generated by bicubic downsampling.
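If you need to regenerate LR images (for example, to check the provided ones), the sketch below shows one way to do bicubic downsampling with Pillow. It is an assumption, not the authors' script: it first crops the HR image to a multiple of the scale (the common "modcrop" convention), and Pillow's bicubic filter does not exactly match MATLAB's `imresize`, so metrics on regenerated inputs may differ slightly from those on the provided LR images. The function names are hypothetical.

```python
from PIL import Image


def modcrop(img: Image.Image, scale: int) -> Image.Image:
    """Crop width and height down to the nearest multiple of `scale`."""
    w, h = img.size
    return img.crop((0, 0, w - w % scale, h - h % scale))


def bicubic_downsample(hr: Image.Image, scale: int) -> Image.Image:
    """Return an LR image 1/scale the size of the mod-cropped HR image."""
    hr = modcrop(hr, scale)
    w, h = hr.size
    return hr.resize((w // scale, h // scale), resample=Image.BICUBIC)


def make_lr_file(hr_path: str, lr_path: str, scale: int) -> None:
    """Read an HR image from disk and write its bicubic LR counterpart."""
    with Image.open(hr_path) as hr:
        bicubic_downsample(hr.convert("RGB"), scale).save(lr_path)
```

For a 256×256 input, ×4 gives a 64×64 LR image, while ×3 first crops to 255×255 and gives 85×85.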

Important:
When preparing the datasets, make sure the folder structure matches the expected format used in the code.
The datasets should be organized as follows:

For AID:

/data/Image_restoration/Datasets/AID-dataset/
- train/
  - HR/
  - LR_x2/
  - LR_x3/
  - LR_x4/
- val/
  - HR/
  - LR_x2/
  - LR_x3/
  - LR_x4/
- test/
  - HR/
  - LR_x2/
  - LR_x3/
  - LR_x4/

For UCMerced:

/data/Image_restoration/Datasets/UCMerced-dataset/
- train/
  - HR/
  - LR_x2/
  - LR_x3/
  - LR_x4/
- val/
  - HR/
  - LR_x2/
  - LR_x3/
  - LR_x4/
- test/
  - HR/
  - LR_x2/
  - LR_x3/
  - LR_x4/
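Before training, it can help to verify that a dataset root matches the layout above. The sketch below uses only the standard library; the function name `check_layout` is hypothetical, and it checks only that the expected folders exist, not their contents.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")
SUBDIRS = ("HR", "LR_x2", "LR_x3", "LR_x4")


def check_layout(root):
    """Return the expected split/sub-folder paths that are missing under `root`."""
    root = Path(root)
    return [
        str(Path(split) / sub)
        for split in SPLITS
        for sub in SUBDIRS
        if not (root / split / sub).is_dir()
    ]


if __name__ == "__main__":
    root = "/data/Image_restoration/Datasets/UCMerced-dataset"
    missing = check_layout(root)
    print("Layout OK" if not missing else "Missing: " + ", ".join(missing))
```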

Training

# x4
python demo_train.py --model=MAGVIT --dataset=UCMerced --scale=4 --patch_size=192 --loss 1*L1 --lr 1e-4 --ext=img --epochs 2500 --batch_size 8 --n_GPUs 1 --save=MAGVITx4_UCMerced
# x3
python demo_train.py --model=MAGVIT --dataset=UCMerced --scale=3 --patch_size=144 --loss 1*L1 --lr 1e-4 --ext=img --epochs 2500 --batch_size 8 --save=MAGVITx3_UCMerced
# x2
python demo_train.py --model=MAGVIT --dataset=UCMerced --scale=2 --patch_size=96 --loss 1*L1 --lr 1e-4 --ext=img --epochs 2500 --batch_size 8 --save=MAGVITx2_UCMerced
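The three commands differ essentially only in `--scale`, `--patch_size` and `--save`. In each case `--patch_size` equals 48 × scale (192, 144, 96), which suggests it is the HR patch size with a fixed 48×48 LR patch; this reading, and the interpretation of `1*L1` as an EDSR-style `weight*type` loss string (terms joined by `+`), are assumptions about `demo_train.py` based on the TransENet lineage, not confirmed here. The hypothetical helpers below rebuild the same argument lists and parse such a loss string.

```python
# Hypothetical helpers mirroring the UCMerced training commands above.
LR_PATCH = 48  # 192/4 = 144/3 = 96/2 = 48


def train_args(scale, dataset="UCMerced", epochs=2500, batch_size=8):
    """Build the argv list for demo_train.py at a given scale."""
    return [
        "python", "demo_train.py",
        "--model=MAGVIT",
        f"--dataset={dataset}",
        f"--scale={scale}",
        f"--patch_size={LR_PATCH * scale}",  # assumed to be the HR patch size
        "--loss", "1*L1",
        "--lr", "1e-4",
        "--ext=img",
        "--epochs", str(epochs),
        "--batch_size", str(batch_size),
        f"--save=MAGVITx{scale}_{dataset}",
    ]


def parse_loss(spec):
    """Parse an EDSR-style loss spec such as '1*L1' or '1*L1+0.1*MSE'."""
    terms = []
    for term in spec.split("+"):
        weight, loss_type = term.split("*")
        terms.append((float(weight), loss_type))
    return terms
```

Passing the list to `subprocess.run(train_args(4), check=True)` avoids any shell interpretation of the `*` in `1*L1`.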

The train/val data paths are set in data/__init__.py.

Testing

Pre-trained models for the UCMerced and AID datasets are available here:
Baidu Drive (Password: w7ct) | Google Drive

Before running the test, you need to manually set the input and output paths inside the demo_deploy.py file:

args.dir_data = '/path/to/your/LR_x1'  # Path to the low-resolution input images
args.dir_out = '/path/to/save/output'  # Path where the output results will be saved

# x4
python demo_deploy.py --model=MAGVIT --scale=4
# x3
python demo_deploy.py --model=MAGVIT --scale=3
# x2
python demo_deploy.py --model=MAGVIT --scale=2

Results

The output images generated by the trained models on the UCMerced and AID datasets can be downloaded here:

UCMerced Results:
Google Drive - UCMerced_Results
AID Results:
Google Drive - AID_Results

These folders contain the visual results obtained after running the testing phase using the pre-trained models.

Evaluation

To reproduce the evaluation results (PSNR, SSIM, and LPIPS) on the UCMerced and AID datasets:

- Download the predicted output images from the results links:
  - UCMerced Results
  - AID Results
- Open the notebook evaluation.ipynb and set the paths to:
  - Ground-truth (HR) images
  - Predicted output images
- Run the notebook. It calculates and prints the average PSNR, SSIM, and LPIPS scores.
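For reference, PSNR can be computed with the standalone NumPy sketch below. It is not the BasicSR code the notebook calls: it assumes 8-bit RGB arrays of equal shape, and whether the notebook crops borders or evaluates on the Y channel is not stated here, so `crop_border` and `y_channel` are exposed as parameters. The Y conversion uses the ITU-R BT.601 coefficients that BasicSR also uses, except that BasicSR expects BGR channel order while this sketch expects RGB.

```python
import numpy as np


def rgb_to_y(img):
    """8-bit RGB (H, W, 3) -> BT.601 luma in [16, 235], as float64."""
    img = img.astype(np.float64)
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1]
                   + 24.966 * img[..., 2]) / 255.0


def psnr(sr, hr, crop_border=0, y_channel=False, max_val=255.0):
    """PSNR in dB between two 8-bit images of identical shape."""
    if sr.shape != hr.shape:
        raise ValueError(f"shape mismatch: {sr.shape} vs {hr.shape}")
    if y_channel:
        sr, hr = rgb_to_y(sr), rgb_to_y(hr)
    sr = sr.astype(np.float64)
    hr = hr.astype(np.float64)
    if crop_border > 0:
        # Ignore a border of `crop_border` pixels on every side.
        sr = sr[crop_border:-crop_border, crop_border:-crop_border]
        hr = hr[crop_border:-crop_border, crop_border:-crop_border]
    mse = np.mean((sr - hr) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

A uniform error of 10 grey levels gives an MSE of 100 and therefore 10·log10(255²/100) ≈ 28.13 dB.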

Note: Make sure you install the required libraries before running the evaluation:
pip install basicsr lpips

The evaluation code uses the metric implementations from BasicSR.

Citation

If you find this code useful for your research, please cite our paper:

@article{ali2024magvit,
  title     = {MAG-ViT: Multi-Attention Grid Vision Transformer for High-Fidelity Super-Resolution in Remote Sensing},
  journal   = {Under Review / Preprint},
  year      = {2026},
}

Acknowledgements

This code is built on TransENet (PyTorch) and BasicSR. We thank the authors for sharing their code.
