fix(sde,flowdppo): derive KL sigma_t from each SDE strategy (flow/dance/cps)#38
Open
Jayce-Ping wants to merge 2 commits into
Open
fix(sde,flowdppo): derive KL sigma_t from each SDE strategy (flow/dance/cps)#38Jayce-Ping wants to merge 2 commits into
Jayce-Ping wants to merge 2 commits into
Conversation
FlowDPPO._compute_sigma_t used clamp(s, max=0.99) for the variance denominator, which disagreed with FlowSDEStrategy.step's where(sigma==1, sigmas[1], sigma). This underestimated the first (sigma==1) step's KL by ~3.6x, so the highest-noise step was almost never masked. Use the same sigma_max=sigmas[1] denominator so the KL-normalization sigma_t equals the transition's std_var at every step; sigma<1 steps are unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>
Add SDEStrategy._std_dev_t + transition_std as the single source for the per-step transition std, and route Flow/Dance/CPS step() through them (numerically identical). FlowDPPO._compute_sigma_t now delegates to strategy.transition_std, so the KL normalizer matches each strategy: Flow/Dance use std_dev_t*sqrt(-dt); CPS uses std_dev_t (no sqrt(-dt)). Subsumes the earlier sigma==1 Flow fix (now in FlowSDEStrategy._std_dev_t). Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
FlowDPPO._compute_sigma_thardcoded the Flow-SDE std, so the KL normalizer was wrong for Dance and CPS (and, at the firstsigma == 1step, for Flow too). This adds a single source of truth on the SDE strategies --_std_dev_t+transition_std-- and routes each strategy'sstep()through them (numerically identical).FlowDPPO._compute_sigma_tnow delegates tostrategy.transition_std, so the KL(d_mean)^2 / (2 * std^2)uses each strategy's actual transition std: Flow/Dance usestd_dev_t * sqrt(-dt), CPS usesstd_dev_t(nosqrt(-dt)). This also folds in the earlier fix to Flow'ssigma == 1denominator (sigma_max = sigmas[1]instead of a0.99clamp).Related Issue
N/A
Test Plan
transition_stdequals each strategy's originalstep()std at representative sigmas includingsigma == 1for Flow; CPS omitssqrt(-dt)(e.g. 0.859 vs 0.162 at step 0);sigma < 1Flow/Dance steps are unchanged.pre-commit,pytest, Hydra config validation, and training/rollout smoke tests were not run). Thestep()refactor is numerically identical (same formula relocated into_std_dev_t); a rollout smoke on GPU is recommended.Compatibility / Risk
Low-to-moderate. Sampling/log_prob math is unchanged (each
step()'sstd_dev_tis the same formula moved into_std_dev_t). The behavior change is limited to the FlowDPPO KL normalizer, which is now correct per strategy (previously only Flow was handled, and Flow was wrong atsigma == 1). No config, checkpoint, data-format, or API changes. The new abstractSDEStrategy._std_dev_tis implemented by all SDE strategies (Flow/Dance/CPS); ODE/DPM2 is not anSDEStrategyand is unaffected.Reviewer Notes
AI-assisted. Single-source-of-truth refactor so
step()and the KL share one std definition. Scope is limited to sigma_t consistency; other FlowDPPO gaps (reference-KL, advantage clipping, EMA) remain out of scope. Supersedes the earliersigma == 1-only commit on this branch (now folded in).Checklist