Question about how Dataset.weight controls data mixing ratio in SFT preprocessing

Hello,

I'm trying to reproduce the SFT training described in the [paper](https://arxiv.org/pdf/2512.20848) using the [nano3/stage1_sft recipe](https://github.com/NVIDIA-NeMo/Nemotron/blob/dev/src/nemotron/recipes/nano3/stage1_sft/data_prep.py). During data preparation, I need to mix multiple datasets, and I set a different `weight` value for each dataset in the data blend in order to control their proportions in the final training data.
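
For concreteness, the behavior I have in mind is roughly the following. This is only a sketch of my expectation, not code from the recipe: `mix_by_weight`, the `blend` layout, and the dataset names are all hypothetical, and I'm assuming each weight is treated as a relative sampling probability.

```python
import random
from collections import Counter


def mix_by_weight(blend, num_samples, seed=0):
    """Draw num_samples examples, choosing the source dataset for each draw
    with probability proportional to that dataset's weight."""
    rng = random.Random(seed)
    names = list(blend)
    weights = [blend[name]["weight"] for name in names]
    mixed = []
    for _ in range(num_samples):
        # random.choices normalizes the relative weights internally.
        name = rng.choices(names, weights=weights, k=1)[0]
        mixed.append(rng.choice(blend[name]["examples"]))
    return mixed


# Hypothetical blend: names, weights, and examples are made up for illustration.
blend = {
    "chat": {"weight": 3.0, "examples": [("chat", i) for i in range(100)]},
    "code": {"weight": 1.0, "examples": [("code", i) for i in range(100)]},
}

mixed = mix_by_weight(blend, num_samples=10_000)
counts = Counter(source for source, _ in mixed)
print(counts)  # expectation: roughly 3 "chat" examples for every "code" example
```

In other words, with weights 3.0 and 1.0 I would expect about 75% of the mixed data to come from the first dataset, regardless of the datasets' original sizes.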

Could you please clarify:

1. Does the current data preprocessing pipeline (`data_prep.py`) actually use `weight` to perform weighted data mixing? Or is `weight` only passed as metadata to the downstream training framework?
2. If weighted mixing is supported, what is the correct configuration? Is there any example available?
3. If it's not supported in the current version, what is the recommended way to mix multiple datasets? (e.g., should it be handled dynamically by Megatron-Bridge during training?)

Thank you!

Question about how Dataset.weight controls data mixing ratio in SFT preprocessing #78