Skip to content

pangaass/M-IND

Repository files navigation

MIND: Effective Incorrect Assignment Detection through a Multi-Modal Structural-Enhanced Language Model

Model Diagram

This project leverages a multi-modal structural-enhanced language model for effective incorrect assignment detection. We supply detailed instructions below for setting up the environment, downloading the necessary models and datasets, running training scripts, and evaluating the models.

📃 Paper 🤖 Model Parameters 💻 GitHub

🚀 Quick Start

Dependencies

Build a conda environment using:

conda create -n mind python==3.10
conda activate mind

# Navigate to the current directory
cd ./M-IND

# Install required packages
pip install -r requirements.txt

Datasets

mkdir data
wget --directory-prefix data https://open-data-set.oss-cn-beijing.aliyuncs.com/oag-benchmark/kddcup-2024/IND-WhoIsWho/IND-WhoIsWho.zip
wget --directory-prefix data https://open-data-set.oss-cn-beijing.aliyuncs.com/oag-benchmark/kddcup-2024/IND-WhoIsWho/IND-test-public.zip
wget --directory-prefix data https://open-data-set.oss-cn-beijing.aliyuncs.com/oag-benchmark/kddcup-2024/IND-WhoIsWho/IND-WhoIsWho-valid.zip

unzip data/IND-WhoIsWho.zip -d data
unzip data/IND-test-public.zip -d data
unzip data/IND-WhoIsWho-valid.zip -d data

Required Models and Parameters

download Meta-Llama-3-8B and roberta model from huggingface or modelscope

download and trained checkpoints to params/ from modelscope

download node embeddings of GCCAD from here

Reproduce from checkpoint

# Firstly edit model_name_or_path and ptm_model_path from configs/reproduce.json 
bash script/reproduce.sh 

python data/IND-WhoIsWho-valid/eval_valid_ind.py  -hp output/predict/predict_res.json -rf data/IND-WhoIsWho-valid/ind_valid_author_ground_truth.json -l result.log

Train & Evaluate

Config
change the path of all the config file( in ./configs/llama3/* ):

  • ptm_model_path : path to roberta model
  • model_name_or_path : path to Meta-Llama-3-8B
  • stage1.sh : add your wandb API key
bash script/stage1.sh

# get best eval epoch and edit "lora_ckpt_path" configs/llama3/stage2.json

bash script/stage2.sh

# get best eval step and edit "lora_ckpt _path" and "text_proj_ckpt_path" configs/llama3/stage3.json

bash script/stage3.sh

# get best eval step and edit "lora_ckpt_path" , "text_proj_ckpt_path" and "graph_proj_ckpt_path" configs/llama3/eval.json

#eval 
bash script/eval.sh

Citation

@artical{pang2024mind,
      title={MIND: Effective Incorrect Assignment Detection through a Multi-Modal Structural-Enhanced Language Model}, 
      author={},
      journal={arXiv preprint arXiv:},
      year={2024},
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

No packages published