jeon185/LaViC

LaViC: Large Vision-Language Conversational Recommendation Framework

This repository provides the implementation of "Adapting Large Vision-Language Models to Visually-Aware Conversational Recommendation" (LaViC), published at KDD 2025. The pipeline has three steps:

  1. Image Crawling (crawl_images.py): Crawls images before training LaViC.

  2. Visual Knowledge Self-Distillation (knowledge_distillation.py): Compresses each product image’s patch embeddings into a small set of [CLS]-positioned embeddings (one per sub-image), applying LoRA to the vision module.

  3. Recommendation Prompt Tuning (prompt_tuning.py): Fine-tunes LLaVA on a recommendation task by applying LoRA to the large language model, using 5 [CLS]-positioned tokens per item in candidate-based conversational recommendation.
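
To make the motivation for steps 2 and 3 concrete, here is a rough token-budget sketch. It is not code from this repository. The figure of 576 patch positions per sub-image and the candidate count of 10 are assumptions chosen for illustration; only the "5 [CLS]-positioned tokens per item, one per sub-image" comes from the description above. The `select_cls` helper only shows which positions are kept, assuming position 0 holds [CLS]; the distillation training that makes those positions informative is not shown.

```python
# Illustrative token-budget arithmetic, not the repository's implementation.
PATCHES_PER_SUBIMAGE = 576  # ASSUMPTION: 24 x 24 patch grid per sub-image
CLS_PER_SUBIMAGE = 1        # one [CLS]-positioned embedding per sub-image
SUBIMAGES_PER_ITEM = 5      # from the README: 5 [CLS]-positioned tokens per item


def visual_tokens(num_items: int, compressed: bool) -> int:
    """Visual tokens the language model would see for num_items candidates."""
    if compressed:
        per_subimage = CLS_PER_SUBIMAGE
    else:
        per_subimage = PATCHES_PER_SUBIMAGE + CLS_PER_SUBIMAGE
    return num_items * SUBIMAGES_PER_ITEM * per_subimage


def select_cls(subimage_embeddings):
    """Keep only position 0 (ASSUMED to be [CLS]) of each sub-image sequence."""
    return [sequence[0] for sequence in subimage_embeddings]


if __name__ == "__main__":
    num_candidates = 10  # hypothetical number of candidate items
    print(visual_tokens(num_candidates, compressed=False))  # 28850
    print(visual_tokens(num_candidates, compressed=True))   # 50
```

Under these assumptions, ten candidates with full patch sequences would far exceed the `--max_length 2048` used in the prompt-tuning example, while the compressed form needs only 50 visual tokens.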


Repository Structure

LaViC/
  ├── data/
  │   ├── all_beauty/
  │   │   ├── train.jsonl
  │   │   ├── valid.jsonl
  │   │   └── test.jsonl
  │   ├── amazon_fashion/
  │   │   ├── train.jsonl
  │   │   ├── valid.jsonl
  │   │   └── test.jsonl
  │   ├── amazon_home/
  │   │   ├── train.jsonl
  │   │   ├── valid.jsonl
  │   │   └── test.jsonl
  │   ├── train_images/
  │   ├── valid_images/
  │   ├── item2meta_train.json
  │   └── item2meta_valid.jsonl
  └── src/
      ├── crawl_images.py
      ├── knowledge_distillation.py
      └── prompt_tuning.py
  • data/:

    • Subdirectories for each domain (e.g., all_beauty, amazon_fashion, amazon_home), each containing train.jsonl, valid.jsonl, test.jsonl with conversational data and associated ground-truth items.
    • train_images/ and valid_images/: directories holding the actual product images (not included by default, but can be downloaded by the provided crawl_images.py).
    • item2meta_train.json (obtained by unzipping item2meta_train.json.zip) and item2meta_valid.jsonl:
      • item2meta_train.json is a dictionary mapping item IDs (ASINs) to metadata (e.g., title, categories, features, description, images, etc.). We also provide image descriptions generated by LLaVA-v1.6.
      • item2meta_valid.jsonl is a JSON Lines file (one JSON object per line) that similarly describes items for validation, including a title, an image_name, and a pre-generated image description by LLaVA-v1.6.
  • src/:

    • crawl_images.py: Downloads product images from URLs in item2meta_train.json and item2meta_valid.jsonl into train_images/ and valid_images/, respectively.
    • knowledge_distillation.py: Distills image knowledge into [CLS] embeddings.
    • prompt_tuning.py: Fine-tunes the language model for conversation-based recommendation.
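
The two metadata files use different formats (a single JSON dictionary versus JSON Lines), so they need different loaders. The following is a minimal sketch for inspecting them; the function names are illustrative and not part of the repository, and the paths assume you run it from src/ after unzipping item2meta_train.json.zip.

```python
import json
from pathlib import Path


def load_item_meta_json(path):
    """Load a JSON dictionary keyed by item ID (ASIN)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_item_meta_jsonl(path):
    """Load a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    train_path = Path("../data/item2meta_train.json")
    valid_path = Path("../data/item2meta_valid.jsonl")
    if train_path.exists():
        train_meta = load_item_meta_json(train_path)
        print(f"{len(train_meta)} training items")
    if valid_path.exists():
        valid_meta = load_item_meta_jsonl(valid_path)
        print(f"{len(valid_meta)} validation items")
```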

Quick Start

1. Environment Setup

Install required libraries:

cd LaViC
pip install -r requirements.txt

2. Image Crawling

Populate train_images/ and valid_images/:

cd src
python crawl_images.py
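
If you want to fetch a handful of images by hand (for example, to retry failed downloads), a download loop along these lines may help. This is a sketch, not the logic of crawl_images.py: the `download_images` and `fetch_bytes` helpers and the name-to-URL mapping are hypothetical, and you would need to build that mapping yourself from the metadata files.

```python
import urllib.request
from pathlib import Path


def fetch_bytes(url, timeout=10.0):
    """Download the raw bytes at url using the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_images(url_by_name, out_dir, fetch=fetch_bytes):
    """Save each URL under out_dir/<name>; return (saved, failed) name lists.

    url_by_name is a hypothetical {file_name: url} mapping. Files that
    already exist are skipped and reported as saved.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved, failed = [], []
    for name, url in url_by_name.items():
        target = out / name
        if target.exists():
            saved.append(name)
            continue
        try:
            data = fetch(url)
        except (OSError, ValueError):
            # URLError/HTTPError are OSError subclasses; ValueError covers bad URLs.
            failed.append(name)
            continue
        target.write_bytes(data)
        saved.append(name)
    return saved, failed
```

Passing a custom `fetch` callable makes it easy to add retries or rate limiting without changing the loop.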

3. Visual Knowledge Self-Distillation

python knowledge_distillation.py \
  --model_name llava-hf/llava-v1.6-mistral-7b-hf \
  --train_data ../data/item2meta_train.json \
  --val_data ../data/item2meta_valid.jsonl \
  --train_images_dir ../data/train_images \
  --val_images_dir ../data/valid_images \
  --output_dir ./out_distilled \
  --lr 5e-5 --weight_decay 1e-5 --num_epochs 5 --batch_size 4
  • Key Arguments:
    • --model_name: The base LLaVA model to distill (e.g., llava-hf/llava-v1.6-mistral-7b-hf).
    • --train_data, --val_data: JSON or JSONL paths with product info and descriptions.
    • --train_images_dir, --val_images_dir: Image directories.
    • --output_dir: Where to save checkpoints and the final "vision_lora_adapter_best".
    • --lr, --weight_decay, --num_epochs, --batch_size: Basic training hyperparameters.
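
For scripted experiments it can be convenient to mirror these flags in your own wrapper. The parser below is a sketch that reuses the flag names listed above; the types, the defaults (copied from the example command), and which flags are required are assumptions, and the actual parser in knowledge_distillation.py may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented distillation flags."""
    parser = argparse.ArgumentParser(description="Visual knowledge self-distillation (sketch)")
    parser.add_argument("--model_name", default="llava-hf/llava-v1.6-mistral-7b-hf")
    parser.add_argument("--train_data", required=True, help="JSON/JSONL with product info")
    parser.add_argument("--val_data", required=True, help="JSON/JSONL with product info")
    parser.add_argument("--train_images_dir", required=True)
    parser.add_argument("--val_images_dir", required=True)
    parser.add_argument("--output_dir", default="./out_distilled")
    # ASSUMPTION: defaults below are taken from the example command, not the script.
    parser.add_argument("--lr", type=float, default=5e-5)
    parser.add_argument("--weight_decay", type=float, default=1e-5)
    parser.add_argument("--num_epochs", type=int, default=5)
    parser.add_argument("--batch_size", type=int, default=4)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```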

4. Recommendation Prompt Tuning

python prompt_tuning.py \
  --model_dir ./out_distilled/vision_lora_adapter_best \
  --candidate_type candidates_st \
  --finetune_output_dir ./out_finetuned \
  --max_length 2048 \
  --batch_size 1 \
  --lr 5e-5 --weight_decay 1e-5 \
  --num_epochs 1 \
  --item_meta_path ../data/item2meta_train.json \
  --image_dir ../data/train_images \
  --category all_beauty
  • Key Arguments:
    • --model_dir: Path to the distilled model from the previous distillation step.
    • --candidate_type: Which key in your conversation JSON holds the candidate items (e.g., candidates_st or candidates_gpt_large).
    • --category: The domain (subdirectory) for your data (e.g., all_beauty, amazon_fashion, or amazon_home).
    • --item_meta_path: JSON with item metadata (titles, etc.).
    • --image_dir: Directory containing product images.
    • --finetune_output_dir: Where to save the final LoRA adapter for the LM side.
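
The relationship between --category, the data layout, and --candidate_type can be sketched as follows. The helper names are hypothetical, and the record shown in the usage part is made up: apart from the candidate keys named above, the field names in the conversation files are not documented here, so check a real line of train.jsonl before relying on them.

```python
from pathlib import Path

SPLITS = ("train", "valid", "test")


def split_path(data_root, category, split):
    """Path to data/<category>/<split>.jsonl, following the layout above."""
    if split not in SPLITS:
        raise ValueError(f"split must be one of {SPLITS}, got {split!r}")
    return Path(data_root) / category / f"{split}.jsonl"


def get_candidates(record, candidate_type):
    """Return the candidate list stored under the chosen key of one record."""
    if candidate_type not in record:
        raise KeyError(
            f"{candidate_type!r} not found; available keys: {sorted(record)}"
        )
    return record[candidate_type]


if __name__ == "__main__":
    print(split_path("../data", "all_beauty", "train"))
    # HYPOTHETICAL record; real files may use different field names.
    record = {"candidates_st": ["B000AAA", "B000BBB"], "candidates_gpt_large": ["B000CCC"]}
    print(get_candidates(record, "candidates_st"))
```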

Citation

To cite LaViC in your work, please use the following BibTeX entry:

@inproceedings{jeon25adapting,
  title = "Adapting large vision-language models to visually-aware conversational recommendation",
  author = "Hyunsik Jeon and Satoshi Koide and Yu Wang and Zhankui He and Julian McAuley",
  year = "2025",
  booktitle = "KDD"
}
