Description
Hi. I would like to ask whether support for packing combined with CP (context parallelism) is already planned, as this combination is not possible yet.
Is your feature request related to a problem? Please describe.
Packing is a way to test or fill data for long-context training. It seems that long-context training with a limited number of nodes is not possible yet.
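To make clear what I mean by packing, here is a minimal sketch in plain Python. It is not the Automodel implementation: the function name `pack_sequences`, the greedy strategy, and the truncation of oversized samples are all assumptions made for illustration. It concatenates tokenized samples into packs of at most `max_len` tokens and records cumulative sample boundaries for each pack.

```python
from typing import List, Tuple


def pack_sequences(
    seqs: List[List[int]], max_len: int
) -> List[Tuple[List[int], List[int]]]:
    """Greedily concatenate token sequences into packs of at most max_len tokens.

    Hypothetical helper for illustration only. Returns one
    (tokens, boundaries) pair per pack, where boundaries holds the
    cumulative offsets at which each sample starts and ends.
    """
    packs: List[Tuple[List[int], List[int]]] = []
    tokens: List[int] = []
    boundaries: List[int] = [0]
    for seq in seqs:
        seq = seq[:max_len]  # assumption: truncate samples longer than max_len
        if not seq:
            continue  # skip empty samples
        # Start a new pack when the next sample would overflow the current one.
        if tokens and len(tokens) + len(seq) > max_len:
            packs.append((tokens, boundaries))
            tokens, boundaries = [], [0]
        tokens = tokens + seq
        boundaries = boundaries + [len(tokens)]
    if tokens:
        packs.append((tokens, boundaries))
    return packs


packed = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]], max_len=5)
for pack_tokens, pack_boundaries in packed:
    print(pack_tokens, pack_boundaries)
```

With CP, each packed sequence would additionally have to be split across ranks along the sequence dimension while attention still respects the sample boundaries, which is the part that seems to be missing.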
Describe the solution you'd like
It would be amazing to combine the packing process defined here with CP to enable long-context training.
Describe alternatives you've considered
I tried combinations without CP but was not successful. The MegatronPretraining dataset can create long-context samples, but it does not seem to be supported with CP>1.
Additional context
Thank you very much! I ran a lot of experiments with NeMo2, and the separation into Automodel for pre-training and SFT is a great idea.
Ondřej