Evaluation on benchmarks after training. #24

Open
Jinstorm opened this issue Feb 23, 2025 · 7 comments

@Jinstorm

Is there, or could you provide, a bash script or a Python function for evaluating a trained model on the commonsense datasets (such as ARC-Easy, openbookqa, social_i_qa, ARC-Challenge, winogrande, piqa, boolq, hellaswag) or other datasets?

@mikecovlee
Contributor

You can use the evaluator provided by MoE-PEFT.

@Jinstorm
Author

Hi, I also ran into some errors when trying to reproduce the eight-dataset multi-task results.

  1. I ran the config generation command before starting training: python ./launch.py gen --template mixlora --tasks "arc-c;arc-e;boolq;obqa;piqa;siqa;hellaswag;winogrande" --multi_task True --adapter_name mixlora --num_epochs 3 --batch_size 4 --micro_batch_size 1 --learning_rate 3e-4 --cutoff_len 512
     Training then fails with the traceback below. Is this caused by a problem in my environment?
python ./launch.py run --base_model /nfsdat/home/bzzhangslm/model/LLM-Research/Meta-Llama-3___1-8B-Instruct --config moe_peft.json
[2025-02-27 21:21:04,031] MoE-PEFT: NVIDIA CUDA initialized successfully.
[2025-02-27 21:21:04,035] MoE-PEFT: Initializing pre-trained model.
[2025-02-27 21:21:04,035] MoE-PEFT: Loading model with half precision.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:09<00:00,  2.29s/it]
[2025-02-27 21:21:13,637] MoE-PEFT: Use eager as attention implementation.
[2025-02-27 21:21:13,932] MoE-PEFT: Detecting <pad> is None, setting to <eos> by default.
[2025-02-27 21:21:14,180] MoE-PEFT: Using efficient operators.
[2025-02-27 21:21:14,182] MoE-PEFT: mixlora_0 total trainable params: 241172480
[2025-02-27 21:21:14,183] MoE-PEFT: mixlora_0 total trainable params (except gates): 240123904
[2025-02-27 21:21:15,165] MoE-PEFT: Preparing data for 7 tasks
[2025-02-27 21:21:23,069] MoE-PEFT: Preparing data for ARC-Challenge
[2025-02-27 21:21:28,187] MoE-PEFT: Preparing data for ARC-Easy
[2025-02-27 21:21:33,871] MoE-PEFT: Preparing data for BoolQ
[2025-02-27 21:21:41,593] MoE-PEFT: Preparing data for OpenBookQA
README.md: 6.81kB [00:00, 14.6MB/s]                                                                                                                                                    
Traceback (most recent call last):
  File "/nfsdat/home/bzzhangslm/llm/MoS/MoE-PEFT/moe_peft.py", line 291, in <module>
    moe_peft.train(
  File "/nfsdat/home/bzzhangslm/llm/MoS/MoE-PEFT/moe_peft/trainer.py", line 312, in train
    input_args = dispatcher.get_train_data()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nfsdat/home/bzzhangslm/llm/MoS/MoE-PEFT/moe_peft/dispatcher.py", line 290, in get_train_data
    self.__dispatch_task_in()
  File "/nfsdat/home/bzzhangslm/llm/MoS/MoE-PEFT/moe_peft/dispatcher.py", line 271, in __dispatch_task_in
    task.load_data()
  File "/nfsdat/home/bzzhangslm/llm/MoS/MoE-PEFT/moe_peft/dispatcher.py", line 85, in load_data
    self.train_token_data_ = self.dataload_function_(self.tokenizer_)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nfsdat/home/bzzhangslm/llm/MoS/MoE-PEFT/moe_peft/trainer.py", line 79, in _dataload_fn
    data = self.task_.loading_data(True, self.data_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nfsdat/home/bzzhangslm/llm/MoS/MoE-PEFT/moe_peft/tasks/common.py", line 199, in loading_data
    data.extend(task.loading_data(is_train, None if len(path) == 0 else path))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nfsdat/home/bzzhangslm/llm/MoS/MoE-PEFT/moe_peft/tasks/qa_tasks.py", line 158, in loading_data
    data = hf_datasets.load_dataset(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nfsdat/home/bzzhangslm/miniconda3/envs/moe_peft/lib/python3.12/site-packages/datasets/load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/nfsdat/home/bzzhangslm/miniconda3/envs/moe_peft/lib/python3.12/site-packages/datasets/load.py", line 1886, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
                                       ^^^^^^^^^^^^
TypeError: 'NoneType' object is not callable
  2. How can I evaluate these 8 datasets to get the eval results? Based on the evaluator provided by MoE-PEFT, maybe using:
python ./evaluator.py \
    --base_model /nfsdat/home/bzzhangslm/model/LLM-Research/Meta-Llama-3___1-8B-Instruct \
    --task_name arc-c \
    --data_path arc-c \
    --lora_weights ./casual_0 \
    --load_16bit True \
    --save_file ./saved/eval/eval.json

Are the task_name and data_path settings right?
Thanks!

@mikecovlee
Contributor

It seems like an issue with your environment. The last error was raised by the datasets package, which is developed by Hugging Face.
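
A quick way to check on your side (a sketch, assuming the failure comes from the installed datasets package rather than from MoE-PEFT itself):

import importlib.metadata

# Print the installed datasets version; a version that cannot resolve the
# requested dataset builder can surface exactly this kind of TypeError.
print(importlib.metadata.version("datasets"))
# If the version looks off, reinstalling or upgrading often helps, e.g.:
#   pip install -U datasets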

@Jinstorm
Author

Thanks! I have solved the environment problem, but I still have questions about dataset loading and evaluation after training.
(1) How do I load datasets other than the built-in ones (arc-c;arc-e;boolq;obqa;piqa;siqa;hellaswag), such as some GLUE or math datasets? Can I load them by pointing to the path of a train.json file, or do I need to modify some modules in MoE-PEFT/moe_peft/tasks?

(2) I still cannot figure out how to evaluate (both the existing 8 datasets and possibly other GLUE/math data). Could you please provide some example commands?
Thanks a lot!!

@mikecovlee
Contributor

  1. Currently, you can load data in two ways: a) add a new data loader by modifying the modules in MoE-PEFT/moe_peft/tasks; b) load the data as a casual task (refer to dummy_data.json; see the sketch after this list). With the second approach, automatic evaluation is not supported.
  2. If you are using built-in tasks, evaluation is performed automatically. See launch.py.
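
For reference, here is a minimal sketch of preparing such a custom data file for the casual task. The field names (instruction/input/output) and the file name are assumptions based on common instruction-tuning formats, not taken from the repository; check dummy_data.json in MoE-PEFT for the exact schema it expects.

import json

# Sketch only: build a small casual-task data file.
# ASSUMPTION: an alpaca-style schema (instruction/input/output);
# verify the exact field names against dummy_data.json in the MoE-PEFT repo.
samples = [
    {
        "instruction": "Answer the question.",
        "input": "What is the capital of France?",
        "output": "Paris",
    },
]

with open("my_train_data.json", "w") as f:  # hypothetical file name
    json.dump(samples, f, indent=2)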

@Jinstorm
Author

Jinstorm commented Mar 1, 2025

Thanks for your reply! I started trying to train on GLUE (also built into MoE-PEFT/moe_peft/tasks), but ran into errors when I tried multi-task training, like:

python ./launch.py gen --template mixlora --tasks "glue:cola;glue:mrpc;glue:rte" --multi_task True --adapter_name mixlora --num_epochs 3 --batch_size 4 --micro_batch_size 1 --learning_rate 3e-4 --cutoff_len 512
python ./launch.py run --base_model model/LLM-Research/Meta-Llama-3___1-8B-Instruct
[2025-03-01 21:11:30,312] MoE-PEFT: Encode text data: 0/14709
[2025-03-01 21:11:30,789] MoE-PEFT: Encode text data: 10000/14709
[2025-03-01 21:11:31,820] MoE-PEFT: Max train tokens length: 290/512
[2025-03-01 21:11:31,833] MoE-PEFT: Loading training task mixlora_0
[2025-03-01 21:11:31,834] MoE-PEFT: mixlora_0 train data:
[2025-03-01 21:11:31,834] MoE-PEFT:     epoch: 1/3             step in epoch: 0/14709
Traceback (most recent call last):
  File "/nfsdat/home/bzzhangslm/llm/MoS/MoE-PEFT/moe_peft.py", line 293, in <module>
    moe_peft.train(
  File "/nfsdat/home/bzzhangslm/llm/MoS/MoE-PEFT/moe_peft/trainer.py", line 314, in train
    outputs = model.forward(input_args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nfsdat/home/bzzhangslm/llm/MoS/MoE-PEFT/moe_peft/model.py", line 497, in forward
    output_data.loss = output_data.loss_fn_(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/nfsdat/home/bzzhangslm/llm/MoS/MoE-PEFT/moe_peft/model.py", line 67, in loss
    return loss_fn(
           ^^^^^^^^
  File "/nfsdat/home/bzzhangslm/miniconda3/envs/moe_peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nfsdat/home/bzzhangslm/miniconda3/envs/moe_peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nfsdat/home/bzzhangslm/miniconda3/envs/moe_peft/lib/python3.12/site-packages/torch/nn/modules/loss.py", line 1293, in forward
    return F.cross_entropy(
           ^^^^^^^^^^^^^^^^
  File "/nfsdat/home/bzzhangslm/miniconda3/envs/moe_peft/lib/python3.12/site-packages/torch/nn/functional.py", line 3479, in cross_entropy
    return torch._C._nn.cross_entropy_loss(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Expected input batch_size (7) to match target batch_size (0).

Did I get anything wrong?

@mikecovlee
Contributor

GLUE tasks currently can't run in multi-task mode :)
You can check our code: GLUE tasks are implemented with a separate classification head instead of the original lm_head.
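
To illustrate the difference, here is a simplified standalone sketch (not the MoE-PEFT implementation): an lm_head produces per-token logits over the vocabulary, while a GLUE-style classification head produces one logit vector per sequence over a handful of labels, so the two heads compute losses over targets of different shapes and cannot simply share one multi-task batch.

import torch
import torch.nn as nn

hidden_size, vocab_size, num_labels = 16, 100, 2
batch, seq_len = 4, 8
hidden_states = torch.randn(batch, seq_len, hidden_size)

# Causal LM head: per-token logits, loss against token ids of shape (batch, seq_len).
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
lm_logits = lm_head(hidden_states)                    # (batch, seq_len, vocab_size)
token_targets = torch.randint(vocab_size, (batch, seq_len))
lm_loss = nn.functional.cross_entropy(
    lm_logits.view(-1, vocab_size), token_targets.view(-1)
)

# GLUE-style classification head: one logit vector per sequence,
# loss against a single label id per example, shape (batch,).
cls_head = nn.Linear(hidden_size, num_labels)
cls_logits = cls_head(hidden_states[:, -1, :])        # (batch, num_labels)
labels = torch.randint(num_labels, (batch,))
cls_loss = nn.functional.cross_entropy(cls_logits, labels)

print(lm_loss.item(), cls_loss.item())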
