Problems on generating with llama model #921
Comments
The error mentioned here also occurred if I set the |
hi, I set the default value of load_optimizer_states as False in the load_checkpoint function of engine.py. Then, I get the error IndexError: list index out of range in current_rank_sd = state_dict_list[dp_rank]. Have you met this problem? How do you fix the problem you mentioned in 1? Can you fine-tune it now? |
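(Side note: rather than changing the default inside DeepSpeed's engine.py, the same effect can likely be achieved by passing the flag at the call site. A minimal sketch, assuming engine is an already-initialized DeepSpeed engine; the helper name load_weights_only is made up for illustration:)

    # Sketch only: load module weights but skip the optimizer/scheduler shards
    # that the converted llama checkpoint does not contain.
    # "engine" is assumed to be a deepspeed.DeepSpeedEngine returned by
    # deepspeed.initialize(); load_weights_only is a made-up helper name.
    def load_weights_only(engine, load_dir, tag=None):
        load_path, client_state = engine.load_checkpoint(
            load_dir,
            tag=tag,
            load_optimizer_states=False,
            load_lr_scheduler_states=False,
        )
        return load_path, client_state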
This is the same error I encountered in 1. I will paste my solution here tomorrow :)
Currently, I have not tried to finetune the model. I find that the loaded model fails to generate a reasonable sentence. I'm afraid the model is not correctly loaded...
|
Thx. I'm trying to continue training the model to see if the loss is correct. I will update my results here if it runs successfully. |
Hi @DaoD, specifically, I made the following change for problem 1:

def _get_all_zero_checkpoint_names(self, load_dir, tag, bf16_mode):
    mp_rank = 0 if self.mpu is None else self.mpu.get_model_parallel_rank()
    zero_ckpt_names = self._get_mp_rank_zero_checkpoint_names(
        load_dir=load_dir,
        tag=tag,
        mp_rank=mp_rank,
        dp_world_size=self.loaded_checkpoint_dp_world_size,
        bf16_mode=bf16_mode)
    for i, ckpt_name in enumerate(zero_ckpt_names):
        if not os.path.exists(ckpt_name):
            # transparently handle the old file pattern for optim_states
            if "optim_states.pt" in ckpt_name:
                ckpt_name_try = ckpt_name.replace("_optim_states.pt",
                                                  "optim_states.pt")
                if os.path.exists(ckpt_name_try):
                    zero_ckpt_names[i] = ckpt_name_try
                    continue
    # added check: if any optimizer-state file is missing (e.g. a checkpoint
    # converted from raw llama weights, which has no optimizer states),
    # return None so that loading the optimizer states is skipped
    for ckpt_name in zero_ckpt_names:
        if not os.path.exists(ckpt_name):
            return None
    return zero_ckpt_names

The function is defined here. |
Thx! I have continued training the model. I find the loss can be reduced successfully, but I'm not sure if it can really work. I will test it soon. |
Apologies for the issues y'all're having. We tested this at pretraining before merging but not for finetuning or inference which in retrospect is somewhat silly. Thank you for your help in debugging this. |
Have you tried to generate with the loaded checkpoint? I found out that the generated text with greedy decoding tends to repeat itself.
This is the output of sampling with temperature=1:
@DaoD Could you kindly provide your config file and the running command for fine-tuning? Thx! |
For the config, you can just copy configs/llama/7B.yml into the model-settings part of configs/6-7B.yml. The running command is the same as for training other models. |
Hi @wiio12 I don't see a LlamaTokenizer in the code. How do you perform inference and verify the results? |
I replaced the SPMTokenizer with the LlamaTokenizer myself. |
That's great. Have you now aligned the results of the HF version and GPT-NeoX inference so that they are consistent? PS: LLaMA itself doesn't have the ability to engage in conversations, so it's better to verify the results with a continuation task, such as providing a prefix and letting the model generate the rest. |
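For instance, a continuation check against the HF-converted checkpoint could look like the sketch below (the checkpoint path and prompt are placeholders, not taken from this thread):

    # Hypothetical continuation check; the path and prompt are placeholders.
    from transformers import LlamaForCausalLM, LlamaTokenizer

    ckpt_path = "path/to/converted-llama-7b-hf"  # output of the NeoX -> HF conversion
    tokenizer = LlamaTokenizer.from_pretrained(ckpt_path)
    model = LlamaForCausalLM.from_pretrained(ckpt_path)

    prompt = "The Eiffel Tower is located in"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Greedy decoding: a correctly loaded model should continue the prefix
    # sensibly instead of repeating itself.
    output = model.generate(**inputs, max_new_tokens=30, do_sample=False)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

    # Sampling with temperature=1 for comparison.
    sampled = model.generate(**inputs, max_new_tokens=30, do_sample=True, temperature=1.0)
    print(tokenizer.decode(sampled[0], skip_special_tokens=True))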
I do not use the inference code of GPT-NeoX, and it seems that there are some minor problems. I have just tested the HF version. It works well. |
Thanks DaoD. This project has already converted the ckpt into Megatron/GPT-NeoX format. I'm curious about how you used HF for validation.
|
I just use the neox format for training/fine-tuning. After training, I convert it into HF version for inference/testing. |
So it sounds like this issue is a combination of two other issues:
If that’s the case, I think it probably makes sense to close this issue as both of those are known problems we are currently working on. |
Keep this issue open until these issues are resolved. We'll add a "Fixes xxx" clause that auto-closes this issue to whatever PR fixes things. |
Hi @StellaAthena, has there been any progress recently? I have updated to the latest commit and found no relevant code changes yet. I am really looking forward to a usable version. |
Hello, can you share your script for converting a neox ckpt to an hf-llama model? |
Hello, there is a new conversion script; you can go to /tools/ckpts to check it out. |
Hi, I tried loading the llama model for inference and encountered some problems. I use 4 V100 GPUs with a model parallel size of 4 to load the llama 7B checkpoints.

1. Error in loading the llama checkpoint. The converted checkpoint generated by the script tools/convert_raw_llama_weights_to_neox.py provides no optimizer states, but deepspeed keeps trying to load the optimizer states even if I set the finetune flag to true. The cause seems to be here: regardless of whether the files exist, deepspeed will return the file list of optimizer states. I fixed this by adding an additional check that the files exist, returning None if they do not.

2. Tensor shape mismatch during inference. This is fixed by changing the line here, where

attention_mask = attention_mask[..., : attention_scores.size(3), : attention_scores.size(3)]

is changed to

attention_mask = attention_mask[..., : attention_scores.size(2), : attention_scores.size(3)]

I wonder if my fixes are correct, or if there are better ways to fix them. I think I am just tackling the symptoms of the problem, not its causes.
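As a toy illustration of why the second fix changes the query-side index (my own example with made-up shapes, not code from the repo): for example, when generating with a key/value cache the number of query positions (attention_scores.size(2)) can be 1 while the number of key positions (attention_scores.size(3)) keeps growing, so slicing both mask dimensions with size(3) produces a mask whose shape no longer matches the scores.

    import torch

    # Made-up shapes: one new query token attending over 5 cached key positions.
    b, heads, sq, sk = 1, 8, 1, 5
    attention_scores = torch.randn(b, heads, sq, sk)

    # A full causal mask built for the maximum sequence length.
    full_mask = torch.tril(torch.ones(1, 1, 16, 16)).bool()

    # Original slicing: both dims use size(3) == 5, giving a [1, 1, 5, 5] mask
    # that no longer matches the [1, 8, 1, 5] scores.
    bad = full_mask[..., : attention_scores.size(3), : attention_scores.size(3)]
    print(bad.shape)   # torch.Size([1, 1, 5, 5])

    # Fixed slicing: the query dim uses size(2), the key dim uses size(3).
    good = full_mask[..., : attention_scores.size(2), : attention_scores.size(3)]
    print(good.shape)  # torch.Size([1, 1, 1, 5])
    assert good.shape[-2:] == attention_scores.shape[-2:]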