
NeMo-RL DPO OOM issues with Nemotron-3-Nano-30B #1922

@slic33

Description

Describe the bug

I ran NeMo-RL DPO on Nemotron-3-Nano-30B (BF16) with a custom {prompt, chosen, rejected} dataset and consistently hit OOM during DTensorPolicyWorker initialization.

Setup: single node, 4×A100-80GB (Brev), TP=4, CPU offload, activation checkpointing, long context (~3.2-3.4k tokens).

Open questions:
- Is 4×80GB expected to be insufficient for this recipe? (See the rough memory estimate sketched below.)
- Is there a known working DPO config for Nemotron-3-Nano-30B?
- Is DPO + LoRA supported?
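
For the first question, a back-of-the-envelope estimate suggests full-parameter DPO on a ~30B model is tight on 4×80GB without aggressive offloading. The sketch below is an assumption-heavy lower bound (BF16 weights/grads, FP32 Adam moments, a frozen BF16 reference model for DPO, even sharding, activations and framework overhead ignored); it is not a statement of NeMo-RL's actual memory layout:

```python
# Rough, illustrative memory estimate for full-parameter DPO on a ~30B model.
# Assumptions (not from NeMo-RL docs): BF16 weights/grads, Adam moments in FP32,
# a frozen BF16 reference model, even sharding across GPUs. Activations and
# framework overhead are ignored, so this is a lower bound.

PARAMS = 30e9           # ~30B parameters (Nemotron-3-Nano-30B)
GPUS = 4                # single node, 4x A100-80GB
GPU_MEM_GB = 80

policy_weights = PARAMS * 2 / 1e9      # BF16 policy weights: 2 bytes/param
gradients      = PARAMS * 2 / 1e9      # BF16 gradients
adam_moments   = PARAMS * 4 * 2 / 1e9  # FP32 m and v: 8 bytes/param
reference      = PARAMS * 2 / 1e9      # frozen BF16 reference model for DPO

total = policy_weights + gradients + adam_moments + reference
per_gpu = total / GPUS

print(f"total ~{total:.0f} GB, per-GPU ~{per_gpu:.0f} GB "
      f"(available: {GPUS * GPU_MEM_GB} GB total, {GPU_MEM_GB} GB per GPU)")
# -> total ~420 GB, per-GPU ~105 GB
```

Under those assumptions the per-GPU footprint already exceeds 80 GB before any activations, which would make OOM during DTensorPolicyWorker init unsurprising unless optimizer/weight offload to CPU actually takes effect at that point.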

Expected behavior

Provide a working DPO configuration for Nemotron-3-Nano-30B, and document its memory requirements and whether DPO + LoRA is supported.
