Error when loading the LLM checkpoint shards #11

@ShengenWu

When loading the pretrained model with the eval scripts, checkpoint loading fails. The process always stops at 67% (2 of 3 shards), and the traceback points to a problem loading the last shard file. We re-downloaded the checkpoint files on another server, but the error persists.

Error message

Loading checkpoint shards:  67%|████████████████████████████████████████████████████████████████████████████████ | 2/3 [00:10<00:05,  5.13s/it]
Traceback (most recent call last):
  File "xxxxx/GeoX/eval/inference.py", line 101, in <module>
    main(args)
  File "xxxxx/GeoX/eval/inference.py", line 26, in main
    tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, device_map="cuda")
  File "xxxxx/GeoX/utils/developer.py", line 32, in load_pretrained_model
    model = GeoXLlamaForCausalLM.from_pretrained(
  File "xxxxx/miniconda3/envs/geox/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3706, in from_pretrained
    ) = cls._load_pretrained_model(
  File "xxxxx/miniconda3/envs/geox/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4091, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "xxxxx/miniconda3/envs/geox/lib/python3.10/site-packages/transformers/modeling_utils.py", line 505, in load_state_dict
    if metadata.get("format") not in ["pt", "tf", "flax"]:
AttributeError: 'NoneType' object has no attribute 'get'

Steps to reproduce the behavior:

  1. Run bash scripts/eval_geoqa_top1.sh (we have tried the geoqa, geometry3k, and pgps9k eval scripts).
  2. Watch the loading progress reach ~67% for each checkpoint.
  3. The program crashes with the traceback above.

Environment:
As described in readme.md.

Additional context
The traceback shows that metadata is None when load_state_dict reads the last shard, so that shard's header appears to carry no metadata. Could the file be corrupted or incomplete?
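To tell a missing-metadata shard apart from a truncated one, each shard's header can be inspected directly. Below is a minimal sketch, assuming the shards are .safetensors files (the failing line in load_state_dict reads safetensors metadata). It relies only on the published safetensors layout: an 8-byte little-endian header length, then a JSON header with an optional `__metadata__` key. The function name `inspect_safetensors` is hypothetical and is not part of GeoX or transformers.

```python
import json
import os
import struct
import sys


def inspect_safetensors(path):
    """Report whether a .safetensors file has header metadata and looks complete.

    Hypothetical helper for diagnosis only; not part of GeoX or transformers.
    """
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError(f"{path}: too short to contain a header length")
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", prefix)
        raw = f.read(header_len)
        if len(raw) != header_len:
            raise ValueError(f"{path}: JSON header is truncated")
    header = json.loads(raw.decode("utf-8"))

    # "__metadata__" is optional; every other key describes one tensor.
    metadata = header.get("__metadata__")
    tensors = [v for k, v in header.items() if k != "__metadata__"]

    # The largest end offset gives the size of the tensor data that follows the header.
    payload_len = max((t["data_offsets"][1] for t in tensors), default=0)
    expected_size = 8 + header_len + payload_len
    actual_size = os.path.getsize(path)

    return {
        "has_metadata": metadata is not None,
        "format": metadata.get("format") if metadata else None,
        "num_tensors": len(tensors),
        "truncated": actual_size < expected_size,
    }


if __name__ == "__main__":
    # Usage: python check_shards.py <shard-1> <shard-2> ...
    for shard in sys.argv[1:]:
        print(shard, inspect_safetensors(shard))
```

If the last shard reports `has_metadata: False` but `truncated: False`, the file is probably intact and was simply saved without a metadata block, which would explain why re-downloading does not help. In that case, re-saving the shard with `safetensors.torch.save_file(tensors, path, metadata={"format": "pt"})` may be worth trying. That workaround is an assumption on our side and has not been confirmed for this checkpoint.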
