[QUESTION] LLaVA model_type, pipeline parallel training #1078
Unanswered
KookHoiKim asked this question in Q&A
I'm trying to train LLaVA with pipeline parallelism (TP=1, PP=2). Although I followed the instructions, the run fails (TP=2, PP=1 works), and I found some points in the code that look odd to me.
In my understanding, the vision encoder / vision projector is an additional embedding component that is only used in the `pre_process` part. However, the LLaVA model is initialized with the `encoder_and_decoder` model_type. Why not `encoder_or_decoder`?

Furthermore, during pipeline-parallel communication, the send/recv tensor shape is set to `(num_image_token, B, hidden_size)`. It looks as if the shards exchange vision embeddings, not intermediate hidden states from the middle of the language model.
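To make the shapes concrete, here is a small sketch of what I observe versus what I expected; all sizes here are made up for illustration and are not taken from the actual config:

```python
# Hypothetical sizes, purely for illustration.
B, H = 2, 4096
num_image_tokens = 576   # e.g. a 24x24 patch grid
num_text_tokens = 128

# What I observe being communicated between pipeline stages,
# in Megatron's usual [sequence, batch, hidden] layout:
observed_shape = (num_image_tokens, B, H)

# What I expected: hidden states for the full combined sequence,
# i.e. text tokens plus the spliced-in image embeddings.
expected_shape = (num_text_tokens + num_image_tokens, B, H)

print(observed_shape)   # (576, 2, 4096)
print(expected_shape)   # (704, 2, 4096)
```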
P.S. Currently I do not use `encoder_pipeline_model_parallel_size` / a tensor-parallel size, because it triggers an error while initializing Megatron: the world size is not divisible, i.e. `world_size % total_model_size != 0`. So I forced `vision_config.pipeline_model_parallel_size` to 1.

I am not familiar with the Megatron code and would really appreciate some help with LLaVA training.
Thank you.