Repository for the paper M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment.
M3-AGIQA integrates multimodal large language models (MLLMs) to evaluate AI-generated images across multiple aspects, distilling image quality captioning capabilities from online MLLMs into local models.
Performance comparison on the quality aspect of the AIGCIQA2023 dataset:
Disclaimer: The datasets uploaded are from external papers and are not owned by the repository owner. They are hosted on Hugging Face or Google Drive for easier access.
AGIQA-3k, AIGCIQA2023, AIGCIQA-20k. They can also be downloaded from Google Drive.
For the AIGCIQA-20k dataset, you may download the original dataset and then place the descriptors in `aigciqa-20k.7z` alongside it.
They are also available on Hugging Face: AGIQA-3k, AIGCIQA2023 (metadata only), AIGCIQA-20k.
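If you prefer to fetch the Hugging Face copies programmatically, here is a minimal sketch using `huggingface_hub`; the repository id and local directory below are placeholders, not the actual ids behind the links above:

```python
from huggingface_hub import snapshot_download

# Placeholder dataset id and target directory; substitute the ids from the links above.
snapshot_download(repo_id="<hf-username>/<dataset-name>", repo_type="dataset",
                  local_dir="./data/<dataset-name>")
```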
Fine-tuned adapters:
- AGIQA-3k: quality, correspondence
- AIGCIQA2023: quality, correspondence, authenticity
- AIGCIQA-20k: quality
Checkpoints can be loaded as follows (using `strawhat/minicpm2.5-aigciqa-20k-ft` as an example):
```python
import torch
from transformers import AutoModel
from peft import PeftModel

# Load the base model, then attach the fine-tuned LoRA adapter.
model = AutoModel.from_pretrained("openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True, torch_dtype=torch.float16).eval()
model = PeftModel.from_pretrained(model, "strawhat/minicpm2.5-aigciqa-20k-ft", trust_remote_code=True, torch_dtype=torch.float16).eval()
```
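Once the adapter is attached, the model can be queried like the base MiniCPM-Llama3-V-2.5; below is a minimal sketch assuming a CUDA device, with a placeholder image path and prompt (the prompts actually used by M3-AGIQA are defined in the repository's data files):

```python
import torch
from PIL import Image
from transformers import AutoTokenizer

# The tokenizer of the base model; the LoRA adapter does not change it.
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True)

model = model.to(device="cuda")  # fp16 inference on GPU

image = Image.open("example.png").convert("RGB")  # placeholder image path
msgs = [{"role": "user", "content": "Describe the quality of this AI-generated image."}]  # placeholder prompt

# model.chat is the standard MiniCPM-Llama3-V-2.5 inference entry point.
with torch.no_grad():
    answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=False)
print(answer)
```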
Train & predict:
An xlstm conda environment is required to run the training script; note that the xLSTM library needs to run on the first GPU (cuda:0).
```bash
conda env create -f ./environment_xlstm.yml
conda activate xlstm

# training, needs at least 20GB of VRAM
python main.py --config ./cfg/minicpm-xlstm-agiqa-3k.yaml --run_name testrun

# predicting
# scores will be stored in ./predictions/<model_name>/<run_name>.json
python main.py --config ./cfg/minicpm-xlstm-agiqa-3k.yaml --stage predict --ckpt_path <best_checkpoint_path> --run_name testrun_predictions
```
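As a quick sanity check, the predicted scores can be compared against the ground-truth MOS with the usual IQA correlation metrics. The sketch below assumes the predictions JSON maps image names to scores and that `data.csv` has `name` and `mos` columns; adjust it to the repository's actual layout:

```python
import json
import pandas as pd
from scipy.stats import pearsonr, spearmanr

# Assumed file layout; adjust paths and column names to your run.
with open("./predictions/<model_name>/testrun_predictions.json") as f:
    preds = json.load(f)
gt = pd.read_csv("./my_dataset/data.csv").set_index("name")["mos"]

names = [n for n in gt.index if n in preds]
pred_scores = [float(preds[n]) for n in names]
mos_scores = [float(gt[n]) for n in names]

print("SRCC:", spearmanr(pred_scores, mos_scores).correlation)
print("PLCC:", pearsonr(pred_scores, mos_scores)[0])
```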
w/ training:
- Prepare your dataset for training with the following structure (a rough sketch of the conversation-file layout is given after this list):
```
|--my_dataset
   |--images      # all the images
   |--data.csv    # images and their MOS scores
   |--train.json  # JSON file with 2 conversations; answers can be empty since no fine-tuning is needed. Check examples in ./data_processed
   |--val.json
```
- Call `inference.py` with the proper parameters `FINETUNED_CKPT` (`strawhat/minicpm2.5-aigciqa-20k-ft` is recommended for better cross-dataset performance), `eval_data_json`, and `OUTPUT_FILE` to get the responses from the fine-tuned MLLM;
- Create your own configuration file in `./cfg` (`strawhat/minicpm2.5-aigciqa-20k-ft` would require `model_max_length` set to `768` and `max_slice_nums` set to `4`);
- Launch the training and predicting commands as the previous section Train & predict describes.
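The authoritative conversation schema is given by the examples in `./data_processed`; purely as an illustration of the two-round layout, the sketch below writes one `train.json` entry with hypothetical field names (verify them against the shipped examples before use):

```python
import json

# Hypothetical entry with two conversation rounds; assistant answers may stay
# empty here because no fine-tuning is performed at this stage.
entry = {
    "image": "images/example_0001.png",
    "conversations": [
        {"role": "user", "content": "<image>\nDescribe the quality of this AI-generated image."},
        {"role": "assistant", "content": ""},
        {"role": "user", "content": "How would you rate the overall quality of this image?"},
        {"role": "assistant", "content": ""},
    ],
}

with open("my_dataset/train.json", "w") as f:
    json.dump([entry], f, indent=2)
```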
w/o training:
- Prepare and call `inference.py` as in steps 1 & 2 of the w/ training section; you may only need a `test.json` representing your whole dataset, since there is no training process.
- Run the predicting command to predict the scores.
Our experiments showed that with a fine-tuned MLLM and additional training the results are promising. If you prefer to fine-tune your own MLLM from scratch, the following additional steps need to be taken into account:
- Produce intermediate image quality descriptions for your preferred aspect (quality, correspondence, authenticity, or any other) manually or via an online MLLM API. In our case, you may try the Gemini Flash API from Google by running `./analyzers/gemini_image_analyzer.py` with an `./analyzers/api.key` file containing your Gemini API keys (a minimal API-call sketch is shown after this list);
- Prepare the fine-tuning environment according to MiniCPM;
- Copy the scripts in `./finetune` of this project into the MiniCPM environment, apply your modifications, and then run `./finetune_lora.sh` to fine-tune the local MLLM (40GB of VRAM is recommended);
- Do the rest of the training & predicting steps.
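For reference, here is a minimal sketch of requesting a quality description from Gemini with the `google-generativeai` package; the model name, prompt, and image path are placeholders rather than what `./analyzers/gemini_image_analyzer.py` actually uses:

```python
import google.generativeai as genai
from PIL import Image

# Read the API key from a local file, analogous to ./analyzers/api.key.
with open("./analyzers/api.key") as f:
    genai.configure(api_key=f.read().strip())

model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name
image = Image.open("images/example_0001.png")      # placeholder image path

# Placeholder prompt; the aspect-specific prompts live in the analyzer script.
prompt = "Describe the perceptual quality of this AI-generated image in detail."
response = model.generate_content([prompt, image])
print(response.text)
```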
If you find our work useful, please cite it as follows:
```bibtex
@article{cui2025m3agiqa,
  title={M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment},
  author={Chuan Cui and Kejiang Chen and Zhihua Wei and Wen Shen and Weiming Zhang and Nenghai Yu},
  journal={arXiv preprint arXiv:2502.13763},
  year={2025}
}
```