fix(sde,flowdppo): derive KL sigma_t from each SDE strategy (flow/dance/cps) #38

Open
Jayce-Ping wants to merge 2 commits into Tencent-Hunyuan:main from Jayce-Ping:fix/flowdppo-sigma-t


@Jayce-Ping Jayce-Ping commented Jun 12, 2026

Collaborator

Summary

FlowDPPO._compute_sigma_t hardcoded the Flow-SDE std, so the KL normalizer was wrong for Dance and CPS (and, at the first step, where sigma == 1, for Flow too). This PR adds a single source of truth on the SDE strategies, _std_dev_t and transition_std, and routes each strategy's step() through them (numerically identical). FlowDPPO._compute_sigma_t now delegates to strategy.transition_std, so the KL term (d_mean)^2 / (2 * std^2) uses each strategy's actual transition std: Flow and Dance use std_dev_t * sqrt(-dt), while CPS uses std_dev_t alone (no sqrt(-dt) factor). It also folds in the earlier fix to Flow's denominator at sigma == 1 (sigma_max = sigmas[1] instead of a 0.99 clamp).
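To make the per-strategy rule concrete, here is a minimal pure-Python sketch. It is not the code in this PR: the function names and signatures are hypothetical, std_dev_t is taken as an input because each strategy's own formula is not reproduced here, and dt is assumed to be the (negative) sigma increment of one step.

```python
import math

# Strategies whose transition std carries the sqrt(-dt) factor, per the summary above.
SQRT_DT_STRATEGIES = {"flow", "dance"}


def transition_std(strategy: str, std_dev_t: float, dt: float) -> float:
    """Hypothetical stand-in for strategy.transition_std.

    Assumes dt < 0 (sigma decreases along the sampling trajectory).
    """
    if strategy in SQRT_DT_STRATEGIES:
        return std_dev_t * math.sqrt(-dt)
    if strategy == "cps":
        # CPS uses std_dev_t directly, with no sqrt(-dt) factor.
        return std_dev_t
    raise ValueError(f"unknown SDE strategy: {strategy!r}")


def kl_term(d_mean: float, std: float) -> float:
    """KL normalizer from the summary: (d_mean)^2 / (2 * std^2)."""
    return d_mean ** 2 / (2.0 * std ** 2)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from a real rollout.
    for name in ("flow", "dance", "cps"):
        std = transition_std(name, std_dev_t=0.5, dt=-0.04)
        print(name, std, kl_term(0.2, std))
```

With the same std_dev_t and d_mean, the CPS KL term comes out smaller than the Flow/Dance one by a factor of -dt, which is why a normalizer hardcoded to the Flow form cannot be reused for CPS.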

Related Issue

N/A

Test Plan

  • Pure-Python numeric check (no local torch env): the new transition_std equals each strategy's original step() std at representative sigmas, including sigma == 1 for Flow; CPS omits sqrt(-dt) (e.g. 0.859 vs 0.162 at step 0); Flow/Dance steps with sigma < 1 are unchanged.
  • Not run (no local Python/torch environment): pre-commit, pytest, Hydra config validation, and training/rollout smoke tests. The step() refactor is numerically identical (same formula relocated into _std_dev_t); a rollout smoke test on GPU is recommended.
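As a rough cross-check of the CPS figures quoted in the first bullet, the sketch below assumes that 0.859 and 0.162 are the same std_dev_t without and with the sqrt(-dt) factor. That reading is an interpretation, not something stated outright here, and the helper names are hypothetical.

```python
# Figures quoted in the test plan for CPS at step 0.
STD_WITHOUT_SQRT_DT = 0.859
STD_WITH_SQRT_DT = 0.162


def implied_neg_dt(std_without: float, std_with: float) -> float:
    """Solve std_with = std_without * sqrt(-dt) for -dt."""
    return (std_with / std_without) ** 2


def kl_ratio(std_used: float, std_reference: float) -> float:
    """KL(std_used) / KL(std_reference) for the same d_mean, since KL ~ 1 / std^2."""
    return (std_reference / std_used) ** 2


neg_dt = implied_neg_dt(STD_WITHOUT_SQRT_DT, STD_WITH_SQRT_DT)
inflation = kl_ratio(STD_WITH_SQRT_DT, STD_WITHOUT_SQRT_DT)

if __name__ == "__main__":
    print(f"implied -dt at step 0: {neg_dt:.4f}")
    print(f"KL ratio between the two std values: {inflation:.1f}")
```

Under that reading the implied first-step size is about 0.036, and the two std values differ by a factor of 1 / (-dt) in the KL term.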

Compatibility / Risk

Low-to-moderate. Sampling/log_prob math is unchanged (each step()'s std_dev_t is the same formula moved into _std_dev_t). The behavior change is limited to the FlowDPPO KL normalizer, which is now correct per strategy (previously only Flow was handled, and Flow was wrong at sigma == 1). No config, checkpoint, data-format, or API changes. The new abstract SDEStrategy._std_dev_t is implemented by all SDE strategies (Flow/Dance/CPS); ODE/DPM2 is not an SDEStrategy and is unaffected.

Reviewer Notes

AI-assisted. Single-source-of-truth refactor so step() and the KL share one std definition. Scope is limited to sigma_t consistency; other FlowDPPO gaps (reference-KL, advantage clipping, EMA) remain out of scope. Supersedes the earlier sigma == 1-only commit on this branch (now folded in).

Checklist

  • I reviewed the changed code and removed unrelated/generated artifacts.
  • I updated tests, docs, and configs where needed, or explained why not (no test infra exists for this path; verified numerically).

Jayce-Ping and others added 2 commits June 12, 2026 10:07
FlowDPPO._compute_sigma_t used clamp(s, max=0.99) for the variance
denominator, which disagreed with FlowSDEStrategy.step's
where(sigma==1, sigmas[1], sigma). This underestimated the first
(sigma==1) step's KL by ~3.6x, so the highest-noise step was almost
never masked. Use the same sigma_max=sigmas[1] denominator so the
KL-normalization sigma_t equals the transition's std_var at every
step; sigma<1 steps are unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
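
The ~3.6x figure in the commit message above can be reproduced with a small sketch under two labeled assumptions: that the Flow-SDE std_dev_t has the form eta * sqrt(sigma / (1 - sigma_denominator)), as in the Flow-GRPO formulation (this form is not spelled out in the PR), and that sigmas[1] is 0.964, a hypothetical value chosen only for illustration.

```python
import math


def flow_std_dev_t(sigma: float, denom_sigma: float, eta: float = 1.0) -> float:
    """ASSUMED Flow-SDE form: eta * sqrt(sigma / (1 - denom_sigma)).

    denom_sigma is the value substituted into the denominator; at sigma == 1
    it is either the old 0.99 clamp or sigmas[1].
    """
    return eta * math.sqrt(sigma / (1.0 - denom_sigma))


def kl_underestimate_factor(sigma_1: float, clamp: float = 0.99) -> float:
    """How much smaller the first-step KL is with the clamp than with sigmas[1].

    KL scales as 1 / std^2, so the factor is (std_clamped / std_actual)^2.
    """
    std_clamped = flow_std_dev_t(1.0, clamp)
    std_actual = flow_std_dev_t(1.0, sigma_1)
    return (std_clamped / std_actual) ** 2


if __name__ == "__main__":
    # sigmas[1] = 0.964 is hypothetical, not read from any config.
    print(kl_underestimate_factor(0.964))
```

Under these assumptions the factor simplifies to (1 - sigmas[1]) / 0.01, so the size of the discrepancy depends on the schedule's first step.
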
Add SDEStrategy._std_dev_t + transition_std as the single source for the
per-step transition std, and route Flow/Dance/CPS step() through them
(numerically identical). FlowDPPO._compute_sigma_t now delegates to
strategy.transition_std, so the KL normalizer matches each strategy:
Flow/Dance use std_dev_t*sqrt(-dt); CPS uses std_dev_t (no sqrt(-dt)).
Subsumes the earlier sigma==1 Flow fix (now in FlowSDEStrategy._std_dev_t).

Co-authored-by: Cursor <cursoragent@cursor.com>
@Jayce-Ping Jayce-Ping changed the title from "fix(flowdppo): align KL sigma_t with SDE transition at sigma==1" to "fix(sde,flowdppo): derive KL sigma_t from each SDE strategy (flow/dance/cps)" Jun 12, 2026
@Jayce-Ping Jayce-Ping requested a review from haonan3 June 12, 2026 03:59