[QUESTION] How to convert a checkpoint to virtual pipeline format checkpoint #1249

ZacWang · 2024-10-11T10:33:02Z

Your question
I'm using tools/checkpoint/convert.py to convert a Llama model to the mcore model format for training. tools/checkpoint/loader_mcore.py supports loading a virtual pipeline model, but tools/checkpoint/saver_mcore.py doesn't support saving one.

Is there another way to do this conversion, or do I need to modify saver_mcore.py to support it, perhaps by adding arguments such as target_num_layers_per_virtual_pipeline_stage and target_virtual_pipeline_model_parallel_size?
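If the saver is extended, the core change is deciding which global transformer layers belong to each (pipeline rank, virtual chunk) pair. The sketch below is a minimal illustration, assuming the interleaved schedule places layers the way the layer-offset computation in Megatron-Core's transformer block does (offset = vp_rank * (num_layers // vp_size) + pp_rank * layers_per_chunk), and that the virtual pipeline size is derived from layers-per-virtual-stage as in Megatron's argument parsing. The helper names `virtual_pipeline_size` and `interleaved_layer_map` are hypothetical, and the `num_layers_per_virtual_stage` / `vp_size` parameters simply stand in for the proposed `target_*` arguments, which do not exist in the converter today.

```python
def virtual_pipeline_size(num_layers, pp_size, num_layers_per_virtual_stage):
    """Derive the virtual pipeline size from layers per virtual stage.

    Hypothetical helper: mirrors the assumed rule
    vp_size = (num_layers // pp_size) // num_layers_per_virtual_stage.
    """
    if num_layers % pp_size:
        raise ValueError("num_layers must be divisible by pp_size")
    layers_per_pp_rank = num_layers // pp_size
    if layers_per_pp_rank % num_layers_per_virtual_stage:
        raise ValueError("layers per pipeline rank must be divisible by layers per virtual stage")
    return layers_per_pp_rank // num_layers_per_virtual_stage


def interleaved_layer_map(num_layers, pp_size, vp_size):
    """Map (pp_rank, vp_rank) -> list of global layer indices (interleaved schedule).

    Hypothetical helper: each pipeline rank holds vp_size non-contiguous chunks,
    and chunk vp_rank on every rank comes from the vp_rank-th slice of the model.
    """
    if num_layers % (pp_size * vp_size):
        raise ValueError("num_layers must be divisible by pp_size * vp_size")
    layers_per_chunk = num_layers // (pp_size * vp_size)
    layer_map = {}
    for vp_rank in range(vp_size):
        for pp_rank in range(pp_size):
            # Assumed offset rule: skip the earlier virtual slices, then earlier ranks.
            offset = vp_rank * (num_layers // vp_size) + pp_rank * layers_per_chunk
            layer_map[(pp_rank, vp_rank)] = list(range(offset, offset + layers_per_chunk))
    return layer_map


if __name__ == "__main__":
    # Example: 32 layers, 4 pipeline ranks, 2 layers per virtual stage -> 4 chunks per rank.
    vp = virtual_pipeline_size(num_layers=32, pp_size=4, num_layers_per_virtual_stage=2)
    for (pp_rank, vp_rank), layers in sorted(interleaved_layer_map(32, 4, vp).items()):
        print(f"pp_rank={pp_rank} vp_rank={vp_rank}: layers {layers}")
```

With 8 layers, 2 pipeline ranks, and 2 virtual chunks, pipeline rank 0 would hold layers [0, 1] and [4, 5], and rank 1 would hold [2, 3] and [6, 7]. A saver would then write one model chunk per vp_rank into each pipeline rank's checkpoint; in the legacy (non-distributed) checkpoint format these chunks appear to be stored under separate keys (`model0`, `model1`, ...), but that layout should be checked against how loader_mcore.py reads them before relying on it.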

Replies: 0 comments