Problems on generating with llama model #921
Comments
The error mentioned here also occurred if I set the |
hi, I set the default value of load_optimizer_states as False in the load_checkpoint function of engine.py. Then, I get the error IndexError: list index out of range in current_rank_sd = state_dict_list[dp_rank]. Have you met this problem? How do you fix the problem you mentioned in 1? Can you fine-tune it now? |
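(Side note: rather than changing the default inside DeepSpeed's engine.py, the same effect can likely be achieved by passing the flag at the call site. A minimal sketch, assuming engine is an already-initialized DeepSpeed engine; the helper name load_weights_only is made up for illustration:)

    # Sketch only: load module weights but skip the optimizer/scheduler shards
    # that the converted llama checkpoint does not contain.
    # "engine" is assumed to be a deepspeed.DeepSpeedEngine returned by
    # deepspeed.initialize(); load_weights_only is a made-up helper name.
    def load_weights_only(engine, load_dir, tag=None):
        load_path, client_state = engine.load_checkpoint(
            load_dir,
            tag=tag,
            load_optimizer_states=False,
            load_lr_scheduler_states=False,
        )
        return load_path, client_state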
This is the same error I encountered in 1. I will paste my solution here tomorrow :)
Currently, I have not tried to finetune the model. I find that the loaded model fails to generate a reasonable sentence. I'm afraid the model is not correctly loaded...
|
Thx. I'm trying to continue training the model to see if the loss is correct. I will update my results here if it runs successfully. |
Hi @DaoD, specifically, I made the following change for problem 1:

def _get_all_zero_checkpoint_names(self, load_dir, tag, bf16_mode):
    mp_rank = 0 if self.mpu is None else self.mpu.get_model_parallel_rank()
    zero_ckpt_names = self._get_mp_rank_zero_checkpoint_names(
        load_dir=load_dir,
        tag=tag,
        mp_rank=mp_rank,
        dp_world_size=self.loaded_checkpoint_dp_world_size,
        bf16_mode=bf16_mode)
    for i, ckpt_name in enumerate(zero_ckpt_names):
        if not os.path.exists(ckpt_name):
            # transparently handle the old file pattern for optim_states
            if "optim_states.pt" in ckpt_name:
                ckpt_name_try = ckpt_name.replace("_optim_states.pt",
                                                  "optim_states.pt")
                if os.path.exists(ckpt_name_try):
                    zero_ckpt_names[i] = ckpt_name_try
                    continue
    # added check: if any optimizer-state file is missing (e.g. a checkpoint
    # converted from raw llama weights, which has no optimizer states),
    # return None so that loading the optimizer states is skipped
    for ckpt_name in zero_ckpt_names:
        if not os.path.exists(ckpt_name):
            return None
    return zero_ckpt_names

The function is defined here. |
Thx! I have continued training the model. I find the loss can be reduced successfully, but I'm not sure if it can really work. I will test it soon. |
Apologies for the issues y'all're having. We tested this at pretraining before merging but not for finetuning or inference which in retrospect is somewhat silly. Thank you for your help in debugging this. |
Have you tried to generate with the loaded checkpoint? I found out that the generated text with greedy decoding tends to repeat itself.
This is the output of sampling with temperature=1:
@DaoD Could you kindly provide your config file and the running command for fine-tuning? Thx! |
For the config, you can just copy configs/llama/7B.yml into the model-settings part of configs/6-7B.yml. The running command is the same as for training other models. |
Hi @wiio12 I don't see a LlamaTokenizer in the code. How do you perform inference and verify the results? |
I replaced the SPMTokenizer with the LlamaTokenizer myself. |
That's great. Have you now aligned the results of the HF version and GPT-NeoX inference so that they are consistent? PS: LLaMA itself doesn't have the ability to engage in conversations, so it's better to verify the results with a continuation task, such as providing a prefix and letting the model generate the rest. |
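For instance, a continuation check against the HF-converted checkpoint could look like the sketch below (the checkpoint path and prompt are placeholders, not taken from this thread):

    # Hypothetical continuation check; the path and prompt are placeholders.
    from transformers import LlamaForCausalLM, LlamaTokenizer

    ckpt_path = "path/to/converted-llama-7b-hf"  # output of the NeoX -> HF conversion
    tokenizer = LlamaTokenizer.from_pretrained(ckpt_path)
    model = LlamaForCausalLM.from_pretrained(ckpt_path)

    prompt = "The Eiffel Tower is located in"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Greedy decoding: a correctly loaded model should continue the prefix
    # sensibly instead of repeating itself.
    output = model.generate(**inputs, max_new_tokens=30, do_sample=False)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

    # Sampling with temperature=1 for comparison.
    sampled = model.generate(**inputs, max_new_tokens=30, do_sample=True, temperature=1.0)
    print(tokenizer.decode(sampled[0], skip_special_tokens=True))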
I do not use the inference code of GPT-NeoX, and it seems that there are some minor problems. I have just tested the HF version. It works well. |
Thanks DaoD. This project has already converted the ckpt into Megatron/GPT-NeoX format. I'm curious about how you used HF for validation.
|
I just use the neox format for training/fine-tuning. After training, I convert it into HF version for inference/testing. |
So it sounds like this issue is a combination of two other issues:
If that’s the case, I think it probably makes sense to close this issue as both of those are known problems we are currently working on. |
Keep this issue open until these issues are resolved. We'll add a "Fixes xxx" clause that auto-closes this issue to whatever PR fixes things. |
Hi @StellaAthena, has there been any progress recently? I have updated to the latest commit and found no relevant code changes yet. I am really looking forward to a usable version. |
Hello, can you share your script for converting a neox ckpt to an hf-llama model? |
Hello, there is a new conversion script; you can go to /tools/ckpts to check it out. |
Hi, I tried loading the llama model for inference and encountered some problems. I use 4 V100 GPUs with a model parallel size of 4 to load the llama 7B checkpoints.

1. Error in loading the llama checkpoint. The converted checkpoint generated by the script tools/convert_raw_llama_weights_to_neox.py provides no optimizer states, but deepspeed keeps trying to load the optimizer states even if I set the finetune flag to true. The cause seems to be here: regardless of whether the files exist, deepspeed will return the file list of optimizer states. I fixed this by adding an additional check that the files exist, returning None if they do not.

2. Tensor shape mismatch during inference. This is fixed by changing the line here, where

attention_mask = attention_mask[..., : attention_scores.size(3), : attention_scores.size(3)]

is changed to

attention_mask = attention_mask[..., : attention_scores.size(2), : attention_scores.size(3)]

I wonder if my fixes are correct, or if there are better ways to fix them. I think I am just tackling the symptoms of the problem, not its causes.
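As a toy illustration of why the second fix changes the query-side index (my own example with made-up shapes, not code from the repo): for example, when generating with a key/value cache the number of query positions (attention_scores.size(2)) can be 1 while the number of key positions (attention_scores.size(3)) keeps growing, so slicing both mask dimensions with size(3) produces a mask whose shape no longer matches the scores.

    import torch

    # Made-up shapes: one new query token attending over 5 cached key positions.
    b, heads, sq, sk = 1, 8, 1, 5
    attention_scores = torch.randn(b, heads, sq, sk)

    # A full causal mask built for the maximum sequence length.
    full_mask = torch.tril(torch.ones(1, 1, 16, 16)).bool()

    # Original slicing: both dims use size(3) == 5, giving a [1, 1, 5, 5] mask
    # that no longer matches the [1, 8, 1, 5] scores.
    bad = full_mask[..., : attention_scores.size(3), : attention_scores.size(3)]
    print(bad.shape)   # torch.Size([1, 1, 5, 5])

    # Fixed slicing: the query dim uses size(2), the key dim uses size(3).
    good = full_mask[..., : attention_scores.size(2), : attention_scores.size(3)]
    print(good.shape)  # torch.Size([1, 1, 1, 5])
    assert good.shape[-2:] == attention_scores.shape[-2:]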