Throws unsupported-model message in text-generation task with DeepSeek R1 and PeftModel #36783

Open
2 of 4 tasks
falconlee236 opened this issue Mar 18, 2025 · 9 comments · May be fixed by #36887
@falconlee236

System Info


  • transformers version: 4.49.0
  • Platform: Linux-5.15.0-134-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.29.3
  • Safetensors version: 0.5.3
  • Accelerate version: 1.3.0
  • Accelerate config: - compute_environment: LOCAL_MACHINE
    - distributed_type: DEEPSPEED
    - use_cpu: False
    - debug: False
    - num_processes: 1
    - machine_rank: 0
    - num_machines: 0
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - enable_cpu_affinity: False
    - deepspeed_config: {'deepspeed_config_file': '/opt/config/train_config.json', 'zero3_init_flag': True}
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
  • DeepSpeed version: 0.16.4
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: No
  • GPU type: NVIDIA H100 80GB HBM3

Who can help?

@ArthurZucker @Rocketknight1 @muellerzr

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch

from transformers import pipeline, AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer
from peft import PeftModel

ADAPTER_PATH = "./output/adapter/mnc_adapter"
BASE_PATH = "./output/model"
BNB_CONFIG = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)


# input
text = "Who is a Elon Musk?"

# load the 4-bit quantized base model
model = AutoModelForCausalLM.from_pretrained(
    BASE_PATH,
    quantization_config=BNB_CONFIG,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(BASE_PATH)
# attach the LoRA adapter to the quantized base model
lora_model = PeftModel.from_pretrained(
    model,
    ADAPTER_PATH,
    quantization_config=BNB_CONFIG,
    torch_dtype=torch.float16,
    device_map="auto",
)

default_generator = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
    torch_dtype=torch.float16
)
print(f"this is base model result: {default_generator(text)}")

lora_generator = pipeline(
    task="text-generation",
    model=lora_model,
    tokenizer=tokenizer,
    device_map="auto",
    torch_dtype=torch.float16
)
print(f"this is lora model result: {lora_generator(text)}")
  1. Execute lora_generator(text).
  2. The warning messages shown below are printed.
  3. After some debugging, I traced the problem to this section of transformers/pipelines/base.py:
    def check_model_type(self, supported_models: Union[List[str], dict]):
        """
        Check if the model class is in supported by the pipeline.

        Args:
            supported_models (`List[str]` or `dict`):
                The list of models supported by the pipeline, or a dictionary with model class values.
        """
        if not isinstance(supported_models, list):  # Create from a model mapping
            supported_models_names = []
            for _, model_name in supported_models.items():
                # Mapping can now contain tuples of models for the same configuration.
                if isinstance(model_name, tuple):
                    supported_models_names.extend(list(model_name))
                else:
                    supported_models_names.append(model_name)
            if hasattr(supported_models, "_model_mapping"):
                for _, model in supported_models._model_mapping._extra_content.items():
                    if isinstance(model_name, tuple):
                        supported_models_names.extend([m.__name__ for m in model])
                    else:
                        supported_models_names.append(model.__name__)
            supported_models = supported_models_names
        if self.model.__class__.__name__ not in supported_models:
            logger.error(
                f"The model '{self.model.__class__.__name__}' is not supported for {self.task}. Supported models are"
                f" {supported_models}."
            )
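
For illustration only, here is a minimal sketch (assuming peft is installed) of why this name-based check misses PEFT-wrapped models: the wrapper reports its own class name, while get_base_model() exposes the underlying transformers class that the check expects.

from peft import PeftModel

def resolve_model_class_name(model) -> str:
    # Illustrative helper, not part of transformers: a PEFT-wrapped model reports
    # "PeftModel" (or a subclass name), so the membership test above fails even
    # though the wrapped model, e.g. a LlamaForCausalLM, is fully supported.
    if isinstance(model, PeftModel):
        return model.get_base_model().__class__.__name__
    return model.__class__.__name__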

Expected behavior

The pipeline should run without the unsupported-model message.

The message is most likely emitted because the PeftModel wrapper around the DeepSeek model is not in the supported_models list.

  • The pipeline works correctly, but I want to get rid of this noisy message:
python hug_inference.py 
/root/workspace/lora_test/.venv/lib/python3.10/site-packages/transformers/quantizers/auto.py:206: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.
  warnings.warn(warning_msg)
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:07<00:00,  1.12it/s]
Device set to use cuda:0
/root/workspace/lora_test/.venv/lib/python3.10/site-packages/bitsandbytes/nn/modules.py:451: UserWarning: Input type into Linear4bit is torch.float16, but bnb_4bit_compute_dtype=torch.float32 (default). This will lead to slow inference or training speed.
  warnings.warn(
this is base model result: [{'generated_text': "Who is a Elon Musk? Well, he's a business magnate, investor, and entrepreneur. He's known for his ambitious"}]
Device set to use cuda:0
The model 'PeftModel' is not supported for text-generation. Supported models are ['AriaTextForCausalLM', 'BambaForCausalLM', 'BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'Cohere2ForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'DiffLlamaForCausalLM', 'ElectraForCausalLM', 'Emu3ForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FalconMambaForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GitForCausalLM', 'GlmForCausalLM', 'GotOcr2ForConditionalGeneration', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'GraniteMoeSharedForCausalLM', 'HeliumForCausalLM', 'JambaForCausalLM', 'JetMoeForCausalLM', 'LlamaForCausalLM', 'MambaForCausalLM', 'Mamba2ForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'MllamaForCausalLM', 'MoshiForCausalLM', 'MptForCausalLM', 'MusicgenForCausalLM', 'MusicgenMelodyForCausalLM', 'MvpForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'Olmo2ForCausalLM', 'OlmoeForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'PhimoeForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RecurrentGemmaForCausalLM', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'WhisperForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM', 'ZambaForCausalLM', 'Zamba2ForCausalLM'].
this is lora model result: [{'generated_text': "Who is a Elon Musk? I mean, I know he's a business magnate or something, but what has he actually done"}]
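
As a stop-gap, the message can be filtered out on the Python logging side. This is only a hedged workaround sketch, not an official transformers API; it assumes the message is emitted by the logger named after the module shown above (transformers.pipelines.base).

import logging

# Workaround sketch (assumption: the logger name follows the module path
# transformers.pipelines.base, as created via logging.get_logger(__name__)).
class SuppressUnsupportedModelMessage(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Drop only the "... is not supported for ..." pipeline message.
        return "is not supported for" not in record.getMessage()

logging.getLogger("transformers.pipelines.base").addFilter(SuppressUnsupportedModelMessage())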
@Rocketknight1
Member

cc @sayakpaul @BenjaminBossan for PEFT - if you think this is an issue in pipelines instead, let me know and I'll try to update our class matching logic!

@BenjaminBossan
Member

I'm not very familiar with pipelines, but this is what I gather: I think we should check whether peft is installed and, if it is, add the PEFT model class names to the supported_models list. However, I'm not sure why self.model.__class__.__name__ not in supported_models is checked instead of isinstance, maybe to avoid imports? The issue with that is that we have many PeftModel subclasses, so the list would need to be extended with:

"PeftModel", "PeftModelForSequenceClassification", "PeftModelForCausalLM", "PeftModelForSeq2SeqLM", "PeftModelForTokenClassification", "PeftModelForQuestionAnswering", "PeftModelForFeatureExtraction"

@Rocketknight1
Member

@BenjaminBossan that makes sense! @falconlee236 would you be willing to attempt a PR for that?

@sambhavnoobcoder
Contributor

Hi @Rocketknight1, I found this to be an interesting issue and raised a PR fixing it in #36868. Please have a look at it; I'll make any required changes as soon as possible. Thank you @falconlee236 for raising this issue.

@falconlee236
Author

falconlee236 commented Mar 20, 2025

Hi @Rocketknight1, I found this to be an interesting issue and raised a PR fixing it in #36868. Please have a look at it; I'll make any required changes as soon as possible. Thank you @falconlee236 for raising this issue.

I was trying to resolve the issue first, but @sambhavnoobcoder resolved it before me, so I don't feel great about it. At the very least, I wish you had waited for my answer before submitting the PR.

I still want to attempt a PR, @Rocketknight1.

@sambhavnoobcoder
Contributor

So sorry @falconlee236, that was not my intention in any way. Please submit your PR; my curiosity just got the best of me. Please ignore my attempt and go ahead with your implementation. Apologies again for any inconvenience.

@Rocketknight1
Member

I'm happy for anyone to make the PR as long as it gets fixed! We generally don't "assign" issues to specific people - there's more than enough work to be done in the library

@falconlee236
Author

I'm happy for anyone to make the PR as long as it gets fixed! We generally don't "assign" issues to specific people - there's more than enough work to be done in the library

I think I said that because I also want to contribute to Transformers. I'm sorry if it made you feel bad. @sambhavnoobcoder

@sambhavnoobcoder
Contributor

Cool. In that case, I have reopened the PR and would appreciate your review on it, @Rocketknight1. Also, no worries @falconlee236, I understand you also want to contribute to Transformers, and it would be my pleasure to contribute alongside you. I would also appreciate learning from your PR as well.
