New options for preference tuning: rpo alpha, logprobs normalization, reference-free, simpo gamma #491
