Hello,
I'm trying to reproduce the SFT training described in the paper using the nano3/stage1_sft recipe. During data preparation, I need to mix multiple datasets and assign a different weight to each dataset in the data blend, so that I can control their proportions in the final training data.
Could you please clarify:
1. Does the current data preprocessing pipeline (data_prep.py) actually use the weight value to perform weighted data mixing, or is weight only passed as metadata to the downstream training framework?
2. If weighted mixing is supported, what is the correct configuration? Is there an example available?
3. If it is not supported in the current version, what is the recommended way to mix multiple datasets (e.g., should it be handled dynamically by Megatron-Bridge during training)?
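To make the question concrete, here is a minimal sketch of the behavior I mean by "weighted mixing". It is not taken from data_prep.py or Megatron-Bridge; the function name mix_by_weight, the dataset names, and the weights are all hypothetical, and it only illustrates drawing each example from a dataset with probability proportional to its normalized weight.

```python
import random


def mix_by_weight(datasets, weights, num_samples, seed=0):
    """Draw num_samples examples, choosing the source dataset in proportion to its weight.

    datasets: dict mapping a (hypothetical) dataset name to a list of examples.
    weights:  dict mapping the same names to non-negative weights.
    Returns a list of (dataset_name, example) pairs.
    """
    if set(datasets) != set(weights):
        raise ValueError("datasets and weights must have the same keys")

    names = sorted(datasets)
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Normalize so the weights can be read as sampling probabilities.
    probs = [weights[name] / total for name in names]

    rng = random.Random(seed)
    mixed = []
    for _ in range(num_samples):
        # Pick the source dataset first, then one example from it (with replacement).
        name = rng.choices(names, weights=probs, k=1)[0]
        mixed.append((name, rng.choice(datasets[name])))
    return mixed


if __name__ == "__main__":
    # Hypothetical blend: weights 3:1 should yield roughly 75% / 25%.
    blend = {"math": ["m1", "m2", "m3"], "code": ["c1", "c2"]}
    sample = mix_by_weight(blend, {"math": 3.0, "code": 1.0}, num_samples=1000)
    share = sum(1 for name, _ in sample if name == "math") / len(sample)
    print(f"math share: {share:.2f}")
```

In other words, with weights 3 and 1 I would expect about 75% of the final training samples to come from the first dataset, regardless of the datasets' original sizes.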
Thank you!