Repository for the paper M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment.
M3-AGIQA integrates multimodal large language models (MLLMs) to evaluate AI-generated images across multiple aspects, distilling image quality captioning capabilities from online MLLMs into local models.
Performance comparison on the quality aspect of the AIGCIQA2023 dataset:
Disclaimer: The datasets uploaded are from external papers and are not owned by the repository owner. They are hosted on Hugging Face or Google Drive for easier access.
AGIQA-3k, AIGCIQA2023, AIGCIQA-20k. They can also be downloaded from Google Drive.
For the AIGCIQA-20k dataset, you may download the original dataset and then place the descriptors in `aigciqa-20k.7z` alongside it.
They are also available on Hugging Face: AGIQA-3k, AIGCIQA2023 (metadata only), AIGCIQA-20k.
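If you prefer to fetch the Hugging Face copies programmatically, here is a minimal sketch using `huggingface_hub`; the repository id and local directory below are placeholders, not the actual ids behind the links above:

```python
from huggingface_hub import snapshot_download

# Placeholder dataset id and target directory; substitute the ids from the links above.
snapshot_download(repo_id="<hf-username>/<dataset-name>", repo_type="dataset",
                  local_dir="./data/<dataset-name>")
```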
Fine-tuned adapters:
- AGIQA-3k: quality, correspondence
- AIGCIQA2023: quality, correspondence, authenticity
- AIGCIQA-20k: quality
Checkpoints can be loaded as follows (using `strawhat/minicpm2.5-aigciqa-20k-ft` as an example):
```python
import torch
from transformers import AutoModel
from peft import PeftModel

# Load the base model, then attach the fine-tuned LoRA adapter.
model = AutoModel.from_pretrained("openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True, torch_dtype=torch.float16).eval()
model = PeftModel.from_pretrained(model, "strawhat/minicpm2.5-aigciqa-20k-ft", trust_remote_code=True, torch_dtype=torch.float16).eval()
```
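Once the adapter is attached, the model can be queried like the base MiniCPM-Llama3-V-2.5; below is a minimal sketch assuming a CUDA device, with a placeholder image path and prompt (the prompts actually used by M3-AGIQA are defined in the repository's data files):

```python
import torch
from PIL import Image
from transformers import AutoTokenizer

# The tokenizer of the base model; the LoRA adapter does not change it.
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True)

model = model.to(device="cuda")  # fp16 inference on GPU

image = Image.open("example.png").convert("RGB")  # placeholder image path
msgs = [{"role": "user", "content": "Describe the quality of this AI-generated image."}]  # placeholder prompt

# model.chat is the standard MiniCPM-Llama3-V-2.5 inference entry point.
with torch.no_grad():
    answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=False)
print(answer)
```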
Train & predict:
An xlstm conda environment is required to run the training script; note that the xLSTM library needs to run on the first GPU (cuda:0).
```bash
conda env create -f ./environment_xlstm.yml
conda activate xlstm

# training, needs at least 20GB of VRAM
python main.py --config ./cfg/minicpm-xlstm-agiqa-3k.yaml --run_name testrun

# predicting
# scores will be stored in ./predictions/<model_name>/<run_name>.json
python main.py --config ./cfg/minicpm-xlstm-agiqa-3k.yaml --stage predict --ckpt_path <best_checkpoint_path> --run_name testrun_predictions
```
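As a quick sanity check, the predicted scores can be compared against the ground-truth MOS with the usual IQA correlation metrics. The sketch below assumes the predictions JSON maps image names to scores and that `data.csv` has `name` and `mos` columns; adjust it to the repository's actual layout:

```python
import json
import pandas as pd
from scipy.stats import pearsonr, spearmanr

# Assumed file layout; adjust paths and column names to your run.
with open("./predictions/<model_name>/testrun_predictions.json") as f:
    preds = json.load(f)
gt = pd.read_csv("./my_dataset/data.csv").set_index("name")["mos"]

names = [n for n in gt.index if n in preds]
pred_scores = [float(preds[n]) for n in names]
mos_scores = [float(gt[n]) for n in names]

print("SRCC:", spearmanr(pred_scores, mos_scores).correlation)
print("PLCC:", pearsonr(pred_scores, mos_scores)[0])
```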
w/ training:
- Prepare your dataset for training with the following structure (a rough sketch of the conversation-file layout is given after this list):
```
|--my_dataset
   |--images      # all the images
   |--data.csv    # images and their MOS scores
   |--train.json  # JSON file with 2 conversations; answers can be empty since no fine-tuning is needed. Check examples in ./data_processed
   |--val.json
```
- Call `inference.py` with the proper parameters `FINETUNED_CKPT` (`strawhat/minicpm2.5-aigciqa-20k-ft` is recommended for better cross-dataset performance), `eval_data_json`, and `OUTPUT_FILE` to get the responses from the fine-tuned MLLM;
- Create your own configuration file in `./cfg` (`strawhat/minicpm2.5-aigciqa-20k-ft` would require `model_max_length` set to `768` and `max_slice_nums` set to `4`);
- Launch the training and predicting commands as the previous section Train & predict describes.
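The authoritative conversation schema is given by the examples in `./data_processed`; purely as an illustration of the two-round layout, the sketch below writes one `train.json` entry with hypothetical field names (verify them against the shipped examples before use):

```python
import json

# Hypothetical entry with two conversation rounds; assistant answers may stay
# empty here because no fine-tuning is performed at this stage.
entry = {
    "image": "images/example_0001.png",
    "conversations": [
        {"role": "user", "content": "<image>\nDescribe the quality of this AI-generated image."},
        {"role": "assistant", "content": ""},
        {"role": "user", "content": "How would you rate the overall quality of this image?"},
        {"role": "assistant", "content": ""},
    ],
}

with open("my_dataset/train.json", "w") as f:
    json.dump([entry], f, indent=2)
```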
w/o training:
- Prepare and call `inference.py` as in steps 1 & 2 of the w/ training section; you may only need a `test.json` representing your whole dataset, since there is no training process.
- Run the predicting command to predict the scores.
Our experiments showed that with a fine-tuned MLLM and additional training the results are promising. If you prefer to fine-tune your own MLLM from scratch, the following additional steps need to be taken into account:
- Produce intermediate image quality descriptions for your preferred aspect (quality, correspondence, authenticity, or any other) manually or via an online MLLM API. In our case, you may try the Gemini Flash API from Google by running `./analyzers/gemini_image_analyzer.py` with an `./analyzers/api.key` file containing your Gemini API keys (a minimal API-call sketch is shown after this list);
- Prepare the fine-tuning environment according to MiniCPM;
- Copy the scripts in `./finetune` of this project into the MiniCPM environment, apply your modifications, and then run `./finetune_lora.sh` to fine-tune the local MLLM (40GB of VRAM is recommended);
- Do the rest of the training & predicting steps.
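For reference, here is a minimal sketch of requesting a quality description from Gemini with the `google-generativeai` package; the model name, prompt, and image path are placeholders rather than what `./analyzers/gemini_image_analyzer.py` actually uses:

```python
import google.generativeai as genai
from PIL import Image

# Read the API key from a local file, analogous to ./analyzers/api.key.
with open("./analyzers/api.key") as f:
    genai.configure(api_key=f.read().strip())

model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name
image = Image.open("images/example_0001.png")      # placeholder image path

# Placeholder prompt; the aspect-specific prompts live in the analyzer script.
prompt = "Describe the perceptual quality of this AI-generated image in detail."
response = model.generate_content([prompt, image])
print(response.text)
```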
If you find our work useful, please cite it as follows:
```bibtex
@article{cui2025m3agiqa,
  title={M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment},
  author={Chuan Cui and Kejiang Chen and Zhihua Wei and Wen Shen and Weiming Zhang and Nenghai Yu},
  journal={arXiv preprint arXiv:2502.13763},
  year={2025}
}
```