- 2025-01-22: This repo is released (Private).
Abstract: In recent years, Transformers have driven significant progress in food recognition. However, most existing approaches still face two critical challenges in lightweight food recognition: (1) quadratic complexity and redundant feature representations caused by interactions with irrelevant tokens; (2) static recognition with single-scale representations, which overlooks the unstructured, non-fixed nature of food images and the need for multi-scale features. To address these issues, we propose Fraesormer, an adaptive and efficient sparse Transformer architecture with two core designs: Adaptive Top-k Sparse Partial Attention (ATK-SPA) and a Hierarchical Scale-Sensitive Feature Gating Network (HSSFGN). ATK-SPA uses a learnable Gated Dynamic Top-K Operator (GDTKO) to retain critical attention scores, filtering out low query-key matches that hinder feature aggregation. It also introduces a partial channel mechanism to reduce redundancy and promote expert information flow, enabling local-global collaborative modeling. HSSFGN employs a gating mechanism to achieve multi-scale feature representation, enhancing contextual semantic information. Extensive experiments show that Fraesormer outperforms state-of-the-art methods.
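Since the code is not yet public, the two core ideas above can be illustrated with a minimal PyTorch sketch: top-k sparse attention (each query keeps only its k strongest attention scores) and a gated feed-forward block. Class names, the `k_ratio` hyperparameter, and all dimensions below are illustrative assumptions; this is not the official ATK-SPA/HSSFGN implementation and omits the partial-channel and multi-scale components.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKSparseAttention(nn.Module):
    """Top-k sparse self-attention: each query keeps only its k strongest
    key scores and masks the rest before the softmax (illustrative sketch,
    not the paper's ATK-SPA module)."""

    def __init__(self, dim, num_heads=4, k_ratio=0.5):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.k_ratio = k_ratio                      # fraction of tokens each query attends to
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                           # x: (B, N, C)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)        # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, heads, N, N)

        # Keep only the top-k scores per query; everything below the k-th
        # largest score is set to -inf so it receives zero attention weight.
        topk = max(1, int(self.k_ratio * N))
        kth_score = attn.topk(topk, dim=-1).values[..., -1:]
        attn = attn.masked_fill(attn < kth_score, float("-inf"))

        out = attn.softmax(dim=-1) @ v              # (B, heads, N, head_dim)
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


class GatedFFN(nn.Module):
    """Gated feed-forward block: one branch elementwise-modulates the other
    (illustrative of the gating idea only, without the multi-scale design)."""

    def __init__(self, dim, expand=2):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim * expand * 2)
        self.fc2 = nn.Linear(dim * expand, dim)

    def forward(self, x):
        a, b = self.fc1(x).chunk(2, dim=-1)
        return self.fc2(F.gelu(a) * b)              # gate: activated branch scales the other


if __name__ == "__main__":
    x = torch.randn(1, 196, 64)                     # a 14x14 token grid with 64 channels
    print(TopKSparseAttention(64)(x).shape)         # torch.Size([1, 196, 64])
    print(GatedFFN(64)(x).shape)                    # torch.Size([1, 196, 64])
```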
- Python 3.8
- PyTorch 1.11.0+cu113
conda create -n fraesormer python=3.8
conda activate fraesormer
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
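A quick way to confirm the environment matches the versions above (the expected version strings in the comments assume the conda install command from this section):

```python
import torch
import torchvision

print(torch.__version__)            # expected: 1.11.0 (CUDA 11.3 build)
print(torchvision.__version__)      # expected: 0.12.0
print(torch.cuda.is_available())    # True if the CUDA 11.3 build can see a GPU
```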
The training and testing sets used in the paper can be downloaded as follows:
| Datasets | Content | Link |
| --- | --- | --- |
| ETHZ Food-101 | ETHZ Food-101 contains 101 categories with a total of 101,000 images | baidu |
| Vireo Food-172 | Vireo Food-172 consists of 172 categories with 67,288 images | baidu |
| UEC Food-256 | UEC Food-256 includes 256 categories with 31,395 images | baidu |
| SuShi-50 | SuShi-50 contains 50 categories with 3,963 images | baidu |
Download the training and testing datasets, put them into the corresponding folders of datasets/, and structure the data as follows:
/path/to/datasets/
  ETHZ Food-101/
    train/
      class1/
        img1.jpeg
      class2/
        img2.jpeg
    validation/
      class1/
        img3.jpeg
      class2/
        img4.jpeg
  Vireo Food-172/
    train/
      class1/
        img1.jpeg
      class2/
        img2.jpeg
    validation/
      class1/
        img3.jpeg
      class2/
        img4.jpeg
  UEC Food-256/
  SuShi-50/
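As a convenience, a small helper script (not part of this repo; the split names and image extensions are assumptions based on the layout above) can be used to sanity-check a dataset folder before training:

```python
from pathlib import Path


def summarize(dataset_root):
    """Print class and image counts for an ImageFolder-style dataset
    laid out as <root>/{train,validation}/<class>/<image>."""
    root = Path(dataset_root)
    for split in ("train", "validation"):
        classes = [d for d in (root / split).iterdir() if d.is_dir()]
        n_images = sum(len(list(c.glob("*.jpeg")) + list(c.glob("*.jpg")))
                       for c in classes)
        print(f"{split}: {len(classes)} classes, {n_images} images")


summarize("/path/to/datasets/ETHZ Food-101")
```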
To train Fraesormer models, run the corresponding command below:

ETHZ Food-101:
Fraesormer-Tiny
python main.py --model Fraesormer-Tiny --data-set ETHZ_Food-101 --data-path $PATH_TO_ETHZ_Food_101 --output_dir $PATH_Result_ETHZ_Food_101
Fraesormer-Base
python main.py --model Fraesormer-Base --data-set ETHZ_Food-101 --data-path $PATH_TO_ETHZ_Food_101 --output_dir $PATH_Result_ETHZ_Food_101
Fraesormer-Large
python main.py --model Fraesormer-Large --data-set ETHZ_Food-101 --data-path $PATH_TO_ETHZ_Food_101 --output_dir $PATH_Result_ETHZ_Food_101

Vireo Food-172:
Fraesormer-Tiny
python main.py --model Fraesormer-Tiny --data-set Vireo_Food-172 --data-path $PATH_TO_Vireo_Food_172 --output_dir $PATH_Result_Vireo_Food_172
Fraesormer-Base
python main.py --model Fraesormer-Base --data-set Vireo_Food-172 --data-path $PATH_TO_Vireo_Food_172 --output_dir $PATH_Result_Vireo_Food_172
Fraesormer-Large
python main.py --model Fraesormer-Large --data-set Vireo_Food-172 --data-path $PATH_TO_Vireo_Food_172 --output_dir $PATH_Result_Vireo_Food_172

UEC Food-256:
Fraesormer-Tiny
python main.py --model Fraesormer-Tiny --data-set UEC_Food-256 --data-path $PATH_TO_UEC_Food_256 --output_dir $PATH_Result_UEC_Food_256
Fraesormer-Base
python main.py --model Fraesormer-Base --data-set UEC_Food-256 --data-path $PATH_TO_UEC_Food_256 --output_dir $PATH_Result_UEC_Food_256
Fraesormer-Large
python main.py --model Fraesormer-Large --data-set UEC_Food-256 --data-path $PATH_TO_UEC_Food_256 --output_dir $PATH_Result_UEC_Food_256

SuShi-50:
Fraesormer-Tiny
python main.py --model Fraesormer-Tiny --data-set SuShi-50 --data-path $PATH_TO_SuShi_50 --output_dir $PATH_Result_SuShi_50
Fraesormer-Base
python main.py --model Fraesormer-Base --data-set SuShi-50 --data-path $PATH_TO_SuShi_50 --output_dir $PATH_Result_SuShi_50
Fraesormer-Large
python main.py --model Fraesormer-Large --data-set SuShi-50 --data-path $PATH_TO_SuShi_50 --output_dir $PATH_Result_SuShi_50
Run the following command to evaluate a pre-trained Fraesormer-Tiny on the UEC Food-256 validation set with a single GPU:
python main.py --eval --model Fraesormer-Tiny --resume ./Fraesormer-Tiny.pth --data-set UEC_Food-256 --data-path $PATH_TO_UEC_Food_256
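If you want to inspect a downloaded checkpoint before running evaluation, something like the snippet below works with any `.pth` file saved by `torch.save`; the key names shown in the comments are only typical examples, since the exact checkpoint layout is not documented here:

```python
import torch

# Load the checkpoint on CPU and list its top-level keys.
ckpt = torch.load("./Fraesormer-Tiny.pth", map_location="cpu")
print(list(ckpt.keys()))            # e.g. ['model', 'optimizer', 'epoch', ...] (layout may differ)

# If the weights sit under a 'model' key, print a few parameter names and shapes.
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))
```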
We achieve state-of-the-art performance. Detailed results can be found in the paper.
Quantitative Comparisons (click to expand)
- Results in Table 1 (main paper)
- Results in Figure 1 (main paper)
If you find the code helpful in your research or work, please cite the following paper(s).
@inproceedings{zou2025fraesormer,
  title={Fraesormer: Learning Adaptive Sparse Transformer for Efficient Food Recognition},
  author={Zou, Shun and Zou, Yi and Zhang, Mingya and Luo, Shipeng and Chen, Zhihao and Gao, Guangwei},
  booktitle={IEEE International Conference on Multimedia and Expo (ICME)},
  year={2025}
}
We sincerely appreciate SHViT, Swin Transformer, LeViT, pytorch-image-models, EfficientViT and PyTorch for their wonderful implementations.