Open
Description
When we load the pretrained model with the eval scripts, checkpoint loading fails. The process stops at 67% every time, and the traceback points to the last of the shard files. We re-downloaded the checkpoint files on another server, but the error persists.
Error message
Loading checkpoint shards: 67%|████████████████████████████████████████████████████████████████████████████████ | 2/3 [00:10<00:05, 5.13s/it]
Traceback (most recent call last):
  File "xxxxx/GeoX/eval/inference.py", line 101, in <module>
    main(args)
  File "xxxxx/GeoX/eval/inference.py", line 26, in main
    tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, device_map="cuda")
  File "xxxxx/GeoX/utils/developer.py", line 32, in load_pretrained_model
    model = GeoXLlamaForCausalLM.from_pretrained(
  File "xxxxx/miniconda3/envs/geox/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3706, in from_pretrained
    ) = cls._load_pretrained_model(
  File "xxxxx/miniconda3/envs/geox/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4091, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "xxxxx/miniconda3/envs/geox/lib/python3.10/site-packages/transformers/modeling_utils.py", line 505, in load_state_dict
    if metadata.get("format") not in ["pt", "tf", "flax"]:
AttributeError: 'NoneType' object has no attribute 'get'
Steps to reproduce the behavior:
- Run the command
  bash scripts/eval_geoqa_top1.sh
  (we have tried the geoqa, geometry3k, and pgps9k eval scripts).
- Watch the loading progress, which stops at ~67% for every checkpoint.
- The program crashes with the above traceback.
Environment:
As described in readme.md.
Additional context
It seems the metadata for one of the checkpoint shards might be None. Possibly a corrupted or incomplete file?
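One way to check this guess without loading the model is to read each shard's header directly. The sketch below is only an illustration and rests on two assumptions: that the shards are .safetensors files (the metadata.get("format") check in the traceback appears to sit on that code path), and that they follow the safetensors layout of an 8-byte little-endian header length followed by a JSON header with an optional "__metadata__" entry. The function names read_safetensors_metadata and check_shards are hypothetical helpers, not part of GeoX or transformers.

```python
import json
import os
import struct
from pathlib import Path


def read_safetensors_metadata(path):
    """Return the "__metadata__" dict from a .safetensors header, or None if absent.

    Assumes the safetensors layout: 8-byte little-endian header length,
    then that many bytes of JSON.
    """
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        raw_len = f.read(8)
        if len(raw_len) != 8:
            raise ValueError(f"{path}: too short to hold a header length")
        (header_len,) = struct.unpack("<Q", raw_len)
        # A header longer than the file itself points to truncation or corruption.
        if header_len > file_size - 8:
            raise ValueError(f"{path}: header length exceeds file size")
        header = json.loads(f.read(header_len))
    return header.get("__metadata__")


def check_shards(checkpoint_dir):
    """Map each *.safetensors file name in checkpoint_dir to its metadata (or None)."""
    return {
        p.name: read_safetensors_metadata(p)
        for p in sorted(Path(checkpoint_dir).glob("*.safetensors"))
    }

# Hypothetical usage, with the real checkpoint directory substituted:
#   for name, meta in check_shards("path/to/checkpoint").items():
#       print(name, "->", meta)
```

If a shard reports None here while the others report something like {"format": "pt"}, that would be consistent with the AttributeError above and would suggest the file was saved without metadata rather than damaged in transit. A ValueError or a JSON decoding error would instead point toward a truncated or corrupted file.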