# MIND: Effective Incorrect Assignment Detection through a Multi-Modal Structural-Enhanced Language Model
This project leverages a multi-modal structural-enhanced language model for effective incorrect assignment detection. We supply detailed instructions below for setting up the environment, downloading the necessary models and datasets, running training scripts, and evaluating the models.
📃 Paper 🤖 Model Parameters 💻 GitHub
## Environment

Build a conda environment:

```bash
conda create -n mind python=3.10
conda activate mind
# Navigate to the project directory
cd ./M-IND
# Install required packages
pip install -r requirements.txt
```

## Data and Models

Download and unpack the IND-WhoIsWho datasets:

```bash
mkdir data
wget --directory-prefix data https://open-data-set.oss-cn-beijing.aliyuncs.com/oag-benchmark/kddcup-2024/IND-WhoIsWho/IND-WhoIsWho.zip
wget --directory-prefix data https://open-data-set.oss-cn-beijing.aliyuncs.com/oag-benchmark/kddcup-2024/IND-WhoIsWho/IND-test-public.zip
wget --directory-prefix data https://open-data-set.oss-cn-beijing.aliyuncs.com/oag-benchmark/kddcup-2024/IND-WhoIsWho/IND-WhoIsWho-valid.zip
unzip data/IND-WhoIsWho.zip -d data
unzip data/IND-test-public.zip -d data
unzip data/IND-WhoIsWho-valid.zip -d data
```

Download the Meta-Llama-3-8B and RoBERTa models from Hugging Face or ModelScope.
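For a quick sanity check after unzipping: the training annotations map each author profile to its correctly assigned papers and its wrongly assigned ones. The sketch below uses a toy record in that shape — the `normal_data`/`outliers` field names and the `train_author.json` filename are assumptions about the release, not guaranteed:

```python
import json  # needed when loading the real file, see comment below

# Toy record in the assumed IND-WhoIsWho per-author format:
#   author id -> {"name": ..., "normal_data": [correctly assigned paper ids],
#                 "outliers": [incorrectly assigned paper ids]}
# For the real file you would do something like:
#   data = json.load(open("data/IND-WhoIsWho/train_author.json"))  # hypothetical path
sample = {
    "author_42": {
        "name": "Jane Doe",
        "normal_data": ["p1", "p2", "p3"],
        "outliers": ["p4"],
    }
}

for aid, rec in sample.items():
    n_ok, n_bad = len(rec["normal_data"]), len(rec["outliers"])
    noise = n_bad / (n_ok + n_bad)
    print(f"{aid}: {n_ok} correct, {n_bad} incorrect, noise ratio {noise:.0%}")
```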
Download the trained checkpoints into `params/` from ModelScope.
Download the GCCAD node embeddings from here.
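The graph branch consumes these precomputed GCCAD node embeddings. A minimal shape/NaN sanity check might look like the following — the `.npy` format and file path are assumptions (the actual release may ship a pickle or a torch tensor instead), so the snippet runs on a toy matrix:

```python
import numpy as np

# Toy stand-in for the downloaded embedding matrix: one row per graph node.
emb = np.random.default_rng(0).normal(size=(5, 8)).astype(np.float32)
# Real file (hypothetical path/format): emb = np.load("data/gccad_node_emb.npy")

print(emb.shape)  # (num_nodes, embedding_dim)
assert np.isfinite(emb).all(), "embeddings contain NaN/Inf"
```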
## Reproduce

First edit `model_name_or_path` and `ptm_model_path` in `configs/reproduce.json`, then run:

```bash
bash script/reproduce.sh
python data/IND-WhoIsWho-valid/eval_valid_ind.py -hp output/predict/predict_res.json -rf data/IND-WhoIsWho-valid/ind_valid_author_ground_truth.json -l result.log
```
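The evaluation script scores the predictions in `predict_res.json` against the held-out ground truth; the leaderboard metric for this task is an AUC-style ranking score. As an illustration only (not the official implementation, which may weight per-author contributions), plain ROC-AUC can be computed from its rank-statistic definition:

```python
def auc(labels, scores):
    """ROC-AUC via the rank-statistic formulation: the probability that a
    randomly chosen positive outscores a randomly chosen negative
    (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy check: label 1 = correctly assigned paper, 0 = incorrect assignment.
labels = [1, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.2]
print(auc(labels, scores))  # 1.0: every positive outscores every negative
```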
## Config

Edit the paths in every config file under `./configs/llama3/`:

- `ptm_model_path`: path to the RoBERTa model
- `model_name_or_path`: path to Meta-Llama-3-8B
- `script/stage1.sh`: add your wandb API key
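Rather than editing each JSON by hand, a small helper can rewrite the shared keys across all stage configs. The key names below come from this README; the assumption that they sit at the top level of each config file may not hold for every config:

```python
import json
import pathlib

def patch_config(path, **overrides):
    """Load a JSON config, overwrite the given top-level keys, write it back."""
    p = pathlib.Path(path)
    cfg = json.loads(p.read_text())
    cfg.update(overrides)
    p.write_text(json.dumps(cfg, indent=2))
    return cfg

# Usage sketch (model paths are placeholders):
# for f in pathlib.Path("configs/llama3").glob("*.json"):
#     patch_config(f, ptm_model_path="/models/roberta-base",
#                  model_name_or_path="/models/Meta-Llama-3-8B")
```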
## Training

```bash
bash script/stage1.sh
# Pick the best eval epoch and set "lora_ckpt_path" in configs/llama3/stage2.json
bash script/stage2.sh
# Pick the best eval step and set "lora_ckpt_path" and "text_proj_ckpt_path" in configs/llama3/stage3.json
bash script/stage3.sh
# Pick the best eval step and set "lora_ckpt_path", "text_proj_ckpt_path", and "graph_proj_ckpt_path" in configs/llama3/eval.json
# Evaluate
bash script/eval.sh
```
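Each stage asks you to copy the best checkpoint path from the previous stage's output into the next config. Assuming the training scripts write HF-Trainer-style `checkpoint-<step>` directories (an assumption about this repo's output layout), a helper like this lists the candidates in step order so you can match them against your eval logs:

```python
import pathlib
import re

def list_checkpoints(out_dir):
    """Return (step, path) pairs for checkpoint-<step> subdirectories, sorted by step."""
    pat = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for p in pathlib.Path(out_dir).iterdir():
        m = pat.match(p.name)
        if p.is_dir() and m:
            found.append((int(m.group(1)), str(p)))
    return sorted(found)

# Usage sketch (hypothetical output directory):
# for step, path in list_checkpoints("output/stage1"):
#     print(step, path)
```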
## Citation

```bibtex
@article{pang2024mind,
  title={MIND: Effective Incorrect Assignment Detection through a Multi-Modal Structural-Enhanced Language Model},
  author={},
  journal={arXiv preprint arXiv:},
  year={2024},
}
```
