[FSDP] Optimize context parallel #1383
Closed
Motivation
Try to optimize context parallel for the FSDP backend (#1062) by eliminating the all_gather for entropy and log_probs. Instead of gathering log_probs and entropy across all CP ranks and then redistributing them, each rank now computes and retains only its local portion, with the necessary handling done during unpacking and loss computation.
Changes
- get_logprob_and_entropy_with_cp: changed from all_gathering the full log_probs/entropy to returning local shards only
- unpack_sequences_with_cp: handles tensor slicing, extracting only the portion of each sequence's response that overlaps with the current rank (see the sketch after this list)
- all_reduce the final loss across CP ranks
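Conceptually, the slicing in unpack_sequences_with_cp reduces to intersecting each response's global token range with the range held by the current rank. A minimal sketch, assuming contiguous (non-zigzag) CP sharding of the packed token dimension; the function name and signature are illustrative, not the PR's actual code:

```python
import torch

def unpack_local_response_shard(
    local_log_probs: torch.Tensor,  # this rank's contiguous shard, shape [local_len]
    response_start: int,            # global start offset of the sequence's response
    response_end: int,              # global end offset (exclusive)
    cp_rank: int,
) -> torch.Tensor:
    """Return the slice of one sequence's response held by this rank."""
    local_len = local_log_probs.numel()
    shard_start = cp_rank * local_len
    shard_end = shard_start + local_len
    # Intersect [response_start, response_end) with [shard_start, shard_end).
    start = max(response_start, shard_start)
    end = min(response_end, shard_end)
    if start >= end:
        # This rank holds none of this response's tokens.
        return local_log_probs.new_empty(0)
    # Convert global offsets into indices within the local shard.
    return local_log_probs[start - shard_start : end - shard_start]
```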
Impact
During training, get_logprob_and_entropy_with_cp is called 3 times, to obtain the reference log_probs, the actor log_probs, and the current log_probs. Three all_gather operations on full-length response log_probs + entropy are replaced by one scalar all_reduce.
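To make the traffic change concrete, here is a hedged sketch of the two communication patterns, assuming cp_group is the context-parallel process group (the helper names are illustrative):

```python
import torch
import torch.distributed as dist

# Before: each of the 3 calls per step gathered the full tensor across CP ranks.
def gather_full(local: torch.Tensor, cp_group) -> torch.Tensor:
    bufs = [torch.empty_like(local) for _ in range(dist.get_world_size(cp_group))]
    dist.all_gather(bufs, local, group=cp_group)
    return torch.cat(bufs, dim=0)

# After: each rank keeps its shard; only the final scalar loss crosses ranks.
def reduce_loss(local_loss: torch.Tensor, cp_group) -> torch.Tensor:
    dist.all_reduce(local_loss, op=dist.ReduceOp.SUM, group=cp_group)
    return local_loss
```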
Concerns
sum_of_sample_mean is now computed as a sum of local contributions, so its correctness depends on each sample's partial sums being normalized consistently across CP ranks (see the sketch below).
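A minimal sketch of how the sum-of-local-contributions can remain exact: divide each local partial sum by the sample's global token count before the cross-rank reduction, so that summing the per-rank partials reconstructs the sum of per-sample means. The helper name and the global_token_counts input are assumptions, not the PR's code:

```python
import torch
import torch.distributed as dist

def sum_of_sample_mean_cp(
    local_token_losses: list[torch.Tensor],  # per sample: this rank's shard (possibly empty)
    global_token_counts: list[int],          # per sample: total response length across all ranks
    cp_group,
) -> torch.Tensor:
    # Each rank contributes sum(local shard) / global count for every sample;
    # the per-rank partials then add up to sum_i mean_i(loss).
    partial = torch.zeros(())
    for shard, count in zip(local_token_losses, global_token_counts):
        partial = partial + shard.sum() / count
    dist.all_reduce(partial, op=dist.ReduceOp.SUM, group=cp_group)
    return partial
```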