Open
Labels: bug (Something isn't working)
Description
Describe the bug
Running NeMo-RL DPO on Nemotron-3-Nano-30B (BF16) with a custom {prompt, chosen, rejected} dataset consistently hits an out-of-memory (OOM) error during DTensorPolicyWorker initialization.
Setup: single node, 4×A100-80GB (Brev), TP=4, CPU offload enabled, activation checkpointing enabled, long context (~3.2k to 3.4k tokens).
Open questions:
1. Is 4×80GB expected to be insufficient for this recipe?
2. Is there a known working DPO config for Nemotron-3-Nano-30B?
3. Is DPO + LoRA supported?
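For context on the first question, here is a rough back-of-envelope estimate of the model-state memory for full-parameter DPO. This is only a sketch under stated assumptions, not a description of how NeMo-RL actually allocates memory: it assumes a nominal 30B parameters (taken from the model name), BF16 weights and gradients, Adam with FP32 master weights and two FP32 moment buffers (12 bytes per parameter), and a frozen BF16 reference model of the same size. It ignores activations, buffers, and fragmentation, and it does not account for CPU offload moving some of these states off the GPUs. The function name `estimate_dpo_memory_gb` is hypothetical.

```python
GB = 1e9  # decimal gigabytes, matching the "80GB" figure used for the A100


def estimate_dpo_memory_gb(
    num_params,
    weight_bytes=2,      # assumption: BF16 policy weights
    grad_bytes=2,        # assumption: BF16 gradients
    optimizer_bytes=12,  # assumption: FP32 master weights + two FP32 Adam moments
    reference_bytes=2,   # assumption: frozen BF16 reference model for DPO
):
    """Return a per-component estimate (in GB) of model-state memory."""
    per_param = {
        "policy_weights": weight_bytes,
        "gradients": grad_bytes,
        "optimizer_states": optimizer_bytes,
        "reference_model": reference_bytes,
    }
    return {name: num_params * nbytes / GB for name, nbytes in per_param.items()}


# Nominal parameter count taken from the model name (assumption).
breakdown = estimate_dpo_memory_gb(30e9)
total_gb = sum(breakdown.values())
available_gb = 4 * 80  # 4 x A100-80GB on a single node

for name, gb in breakdown.items():
    print(f"{name:>18}: {gb:7.1f} GB")
print(f"{'total':>18}: {total_gb:7.1f} GB vs {available_gb} GB of GPU memory")
```

Under these assumptions the model states alone exceed the aggregate GPU memory before any activations are counted, which is why the answer depends on how much CPU offload actually moves off-device and whether a parameter-efficient method such as LoRA can shrink the gradient and optimizer terms.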
Expected behavior
Provide a working DPO configuration for Nemotron-3-Nano-30B, and document the memory requirements and whether LoRA + DPO is supported.