PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings

To address the limitation of existing datasets that do not sufficiently provide information about the relationship between persuasion effectiveness and the persuadee’s psychological characteristics (e.g., personality and values), we construct and release the Personalized Visual Persuasion (PVP) dataset.

The PVP dataset consists of 28,454 persuasive images related to 596 messages across 20 topics inspired by U.S. government departments and agencies. These messages are designed to influence viewer behaviors (e.g., “Do not smoke”). A distinctive feature of our dataset is the incorporation of nine persuasion strategies based on theoretical frameworks (e.g., gain frame: depicting a positive consequence of the target behavior). To enable images to reflect these strategies, we employed a novel method of generating images using DALLE and additionally sourced images through Google Image Search.

Recognizing that the persuadee’s psychological characteristics play a crucial role in persuasion effectiveness, we collected persuasiveness scores for the images, along with annotators’ demographics, habits, Big 5 personality traits, and values.

In addition, we are releasing a persuasive image generator and an automated evaluator, which establish benchmark baselines and pave the way for future research in AI-driven persuasion.

📢 Updates

🆕 2025-05-28: Official Version Released!

This is now the official repository of the project!
For additional information, experiments, or earlier versions, please refer to the original personal repository here.

2025-01-24: Major Update!

We are excited to announce significant advancements in our personalized visual persuasion research.

🚀 Key Highlights of This Update

📂 PVP Dataset Release
- Includes 28,454 persuasive images covering 596 messages and 9 persuasion strategies.
- Evaluated by 2,521 human annotators with detailed demographic and psychological profiles, including personality traits and values.
🛠️ PVP Dataset Construction Framework
- A comprehensive framework for data collection, processing, and organization.
- Designed to facilitate future research in personalized visual persuasion.
⚙️ Generator & Evaluator Tools
- Persuasive Image Generator: Generates personalized persuasive content based on user profiles.
- Automated Evaluator: Assesses the persuasiveness of images using established benchmarks, paving the way for AI-driven persuasion advancements.

Stay tuned for the official release of the PVP dataset!
Meanwhile, check out our latest paper to explore our findings and contributions.

Datasets

The upper figure shows two examples from the dataset.

A distinctive feature of our dataset is the utilization of nine persuasion strategies based on theoretical frameworks (e.g., gain frame: depicting a positive consequence of the target behavior). To ensure that these strategies are effectively reflected in the images, we employed a novel image generation method using DALLE and additionally sourced images via Google Image Search.

Furthermore, recognizing that the characteristics of viewers play a crucial role in persuasion effectiveness, we collected persuasiveness scores for the images, along with annotators’ demographic information, habits, personality traits (Big 5 personality traits, Goldberg, 2013), and values (Schwartz, 2012; Graham et al., 2013).

CSV Column Descriptions

The csv_files directory contains train_paper, eval_paper, and test_paper, which are the files used in this study to train, evaluate, and test the model.

Annotator_ID: Globally unique identifier for the annotator
Gender: Annotator’s self-reported gender (1 = Male, 2 = Female)
Age: Annotator’s age (years)
PVQ21, Big5, MFQ: Psychological traits (Schwartz Theory of Basic Values, Big 5 personality traits, and Moral Foundations) of the annotator
Habit: Whether the annotator typically performs the behavior expressed by the message (Yes = usually does; No = does not). For example, for the message 'Do not litter while walking', a 'Yes' habit indicates that the annotator usually does not litter while walking.
Topic: Topic category the image belongs to
Message_ID: Unique identifier for the message.
Message: The persuasive message conveyed by the image
Annotator_ID_by_Message: Per-message annotator index. For example, if a message lists annotator IDs 0–3, four different annotators labeled that message.
Image_Num: Index of the image within the message
Strategy: Persuasive strategy used
Method: Image generation method (dalle, google)
Pos_Neg: Framing of the message (pos = positive framing; neg = negative framing)
Premise: Premise used to instantiate the persuasive strategy
Query: Text prompt used to generate the image for dalle method and keywords to search the image for google method
Description: Text description of the image
Score: Persuasiveness score assigned by the annotator (0–10)

Image Folder Structure

Each image folder is named exactly as the CSV value in Message. Inside that folder, images are saved using Image_Num as the filename.

Path convention: image_files/<Message>/<Image_Num>.jpg

image_files/
├─ Do_not_listen_to_music_while_studying/
│  ├─ 1.jpg
│  └─ 2.jpg
└─ Get_a_health_check_up_regularly/
   ├─ 1.jpg
   └─ 2.jpg

Use the pair (Message, Image_Num) from the CSV to locate the corresponding image file.

💾 Installation

Prepare the Environment File

We are currently using OpenAI’s API for data construction.

Please create a .env file in the root directory of the repository to securely store the required API keys.

Sample .env file containing private information:

OPENAI_API_KEY='your-api-key-here'"

OPENAI_API_KEY is the OpenAI API key. You can get it from the OpenAI.

Setup

Python Environment

Please setup your python environment and install the required dependencies as:

# Clone the repository
git clone https://github.com/holi-lab/PVP_Personalized_Visual_Persuasion.git
cd PVP_Personalized_Visual_Persuasion

# Create a new conda environment
conda create --name PVP-DEV python=3.12
conda activate PVP-DEV
pip install -r requirements.txt

📊 Data Construction

1. DALL·E Generation

We generated a single image for each query using GPT-4o.
For more details, please refer to the README in ./data_construction/dalle.

2. Google Image Search

We used Serper API to collect 40 images per query from Google Image Search.
To filter out images containing excessive text, we applied easyOCR, which analyzes the text within images.
Images with more than 20 characters detected with an OCR confidence score of 95% or higher were excluded.
- The filtered images can be found in ./data_construction/ocr_image.
After the filtering process, the remaining images underwent validation, and the top 3 images that passed were included in the dataset. (Refer to the Validation section below.)

3. Validation

We used GPT-4o to validate whether the collected images effectively convey the intended message.

Initially, we observed that GPT-4o tended to assign overly high scores, even when images did not fully align with the intended premise.
To improve the evaluation process, we implemented a two-step validation approach:

Image Interpretation:
- GPT analyzes the image to understand its meaning.
Premise Alignment:
- The interpretation is evaluated against the intended message using tailored prompts for different persuasion strategies.

Additionally, we applied an automated content filtering protocol to remove offensive material. This was followed by a manual review to eliminate any violent, offensive, or inappropriate content.

The validation process details can be found in ./data_construction/gpt_eval.

🚀 Evaluator & Generator

You can access the Evaluator and Generator in ./model.

Generator: Responsible for generating personalized persuasive images. In our model, instead of directly creating images, it generates descriptions that can be used by an image generation model.
Evaluator: Assesses the persuasiveness of the generated images tailored to target viewers.

For more detailed information, please refer to the README file inside the respective folder.

🌲 File Structure

This section provides an overview of the core directories and organizational structure of the our paper's source code. For detailed descriptions of each directory, please refer to the README file within the respective directory.

PVP_Personalized_Visual_Persuasion
├── README.md
├── data_analysis
├── data_contruction
│   ├── bootstrap
│   ├── dalle
│   │   └── dalle_generate_image.py
│   ├── gpt_eval
│   │   └── gpteval_image.py
│   └── ocr_image
│       └── ocr_example.py
├── eval_annotation
│   ├── filtering.py
│   ├── reduce_duplicated_image.py
│   ├── score.py
│   └── survey_result.py
└── model
    ├── README.md
    ├── data_preprocessing
    │   ├── data
    │   ├── prompt_csv_to_json.py
    │   └── prompt_generator.py
    ├── evaluator
    │   ├── config
    │   │   ├── big5
    │   │   ├── mfq30
    │   │   ├── none
    │   │   └── pvq21
    │   │       ├── prompt1_pvq21_indomain_1epoch.yaml
    │   │       ├── prompt1_pvq21_indomain_1epoch_dalle.yaml
    │   │       ├── prompt1_pvq21_indomain_1epoch_filter.yaml
    │   │       └── prompt1_pvq21_indomain_1epoch_google.yaml
    │   ├── data
    │   │   ├── big5
    │   │   │   ├── eval_data_prompt1_big5.json
    │   │   │   ├── test_data_prompt1_big5.json
    │   │   │   └── train_data_prompt1_big5.json
    │   │   ├── mfq30
    │   │   │   ├── eval_data_prompt1_mfq30.json
    │   │   │   ├── test_data_prompt1_mfq30.json
    │   │   │   └── train_data_prompt1_mfq30.json
    │   │   ├── none
    │   │   │   ├── eval_data_prompt1_none.json
    │   │   │   ├── test_data_prompt1_none.json
    │   │   │   └── train_data_prompt1_none.json
    │   │   └── pvq21
    │   │       ├── dalle
    │   │       ├── filter
    │   │       ├── final
    │   │       │   ├── eval_data_prompt1_pvq21.json
    │   │       │   ├── test_data_prompt1_pvq21.json
    │   │       │   └── train_data_prompt1_pvq21.json
    │   │       └── google
    │   ├── metric
    │   │   ├── dataloader.py
    │   │   ├── metric_evaluator.py
    │   │   └── metric_generator.py
    │   └── model
    │       ├── dataloader.py
    │       ├── inference.py
    │       ├── run_training_pvq21.sh
    │       └── train.py
    └── generator
        ├── config
        │   ├── generator_prompt1_pvq21_indomain_over8_1epoch.yaml
        │   └── ...
        ├── data
        │   ├── generator_test.json
        │   ├── gpt
        │   │   ├── gpt_4o_mini_test.json
        │   │   └── gpt_4o_test.json
        │   ├── over7
        │   ├── over8
        │   │   ├── generator_eval_paper_over8.json
        │   │   ├── generator_test_paper_over8.json
        │   │   └── generator_train_paper_over8.json
        │   └── over9
        └── model
            ├── dataloader.py
            ├── formatter.py
            ├── generator_gpt_zero_shot.py
            ├── inference.py
            ├── run_training_over7.sh
            └── train.py

📚 Abstract

Visual persuasion, which uses visual elements to influence cognition and behaviors, is crucial in fields such as advertising and political communication. With recent advancements in artificial intelligence, there is growing potential to develop persuasive systems that automatically generate persuasive images tailored to individuals. However, a significant bottleneck in this area is the lack of comprehensive datasets that connect the persuasiveness of images with the personal information about those who evaluated the images. To address this gap and facilitate technological advancements in personalized visual persuasion, we release the Personalized Visual Persuasion (PVP) dataset, comprising 28,454 persuasive images across 596 messages and 9 persuasion strategies. Importantly, the PVP dataset provides persuasiveness scores of images evaluated by 2,521 human annotators, along with their demographic and psychological characteristics (personality traits and values). We demonstrate the utility of our dataset by developing a persuasive image generator and an automated evaluator, and establish benchmark baselines. Our experiments reveal that incorporating psychological characteristics enhances the generation and evaluation of persuasive images, providing valuable insights for personalized visual persuasion.

😀 Authors and Citation

Junseo Kim, Jongwook Han, Dongmin Choi, Jongwook Yoon, Eun-Ju Lee, Yohan Jo*

Please cite our paper if you use the code or data in this repository.

@inproceedings{kim-etal-2025-pvp,
   title = "{PVP}: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings",
   author = "Kim, Junseo and Han, Jongwook and Choi, Dongmin and Yoon, Jongwook and Lee, Eun-Ju and Jo, Yohan",
   booktitle={ACL},
   year={2025}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings

📢 Updates

🆕 2025-05-28: Official Version Released!

2025-01-24: Major Update!

Datasets

CSV Column Descriptions

Image Folder Structure

💾 Installation

Prepare the Environment File

Setup

Python Environment

📊 Data Construction

1. DALL·E Generation

2. Google Image Search

3. Validation

🚀 Evaluator & Generator

🌲 File Structure

📚 Abstract

😀 Authors and Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
assets		assets
data_analysis		data_analysis
data_contruction		data_contruction
eval_annotation		eval_annotation
model		model
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings

📢 Updates

🆕 2025-05-28: Official Version Released!

2025-01-24: Major Update!

Datasets

CSV Column Descriptions

Image Folder Structure

💾 Installation

Prepare the Environment File

Setup

Python Environment

📊 Data Construction

1. DALL·E Generation

2. Google Image Search

3. Validation

🚀 Evaluator & Generator

🌲 File Structure

📚 Abstract

😀 Authors and Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages