Problems after adding generate_sequences_loop to dapo_ray_trainer #69

@Aukarous

Description

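Lines commented out with # are the original code; the uncommented lines after them are the ones I added. The goal is to implement multi-turn rollouts through a loop: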

with marked_timer("step", timing_raw):
    # generate a batch
    with marked_timer("gen", timing_raw, "red"):
        # gen_batch_output = self.actor_rollout_wg.generate_sequences(gen_batch)
        # timing_raw.update(gen_batch_output.meta_info["timing"])
        # gen_batch_output.meta_info.pop("timing", None)
        if self.config.actor_rollout_ref.rollout.max_turns is None:
            gen_batch_output = self.actor_rollout_wg.generate_sequences(gen_batch)
        else:
            gen_batch_output = self.actor_rollout_wg.generate_sequences_loop(gen_batch)

Newly added length filtering and reward normalization:

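if not self.config.algorithm.filter_groups.enable:
    batch = new_batch
else:
    # Keep only samples with fewer than max_prompt_length non-pad tokens
    # (151643 is presumably the pad token id).
    filter_id = []
    for idx, (uid, input_ids) in enumerate(zip(new_batch.non_tensor_batch['uid'], new_batch.batch['input_ids'])):
        if len(input_ids[input_ids != 151643]) < self.config.data.max_prompt_length:
            filter_id.append(idx)
    # filter_id only exists in this branch, so the indexing belongs here too.
    new_batch = new_batch[filter_id]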
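
The length filter above can be sketched on plain Python lists, which makes it easy to check in isolation. This is a minimal sketch rather than the verl code path: `keep_short_rows` is a hypothetical helper, and it assumes 151643 is the padding token id, as the comparison in the snippet suggests.

```python
PAD_TOKEN_ID = 151643  # assumption: the pad id used in the snippet above


def keep_short_rows(rows, max_len, pad_id=PAD_TOKEN_ID):
    """Return indices of rows with fewer than max_len non-pad tokens."""
    kept = []
    for idx, row in enumerate(rows):
        n_real = sum(1 for tok in row if tok != pad_id)
        if n_real < max_len:
            kept.append(idx)
    return kept


rows = [
    [PAD_TOKEN_ID, PAD_TOKEN_ID, 11, 12],  # 2 real tokens
    [21, 22, 23, 24],                      # 4 real tokens
    [PAD_TOKEN_ID, 31, 32, 33],            # 3 real tokens
]
print(keep_short_rows(rows, max_len=4))
```

With `max_len=4` the rows at index 0 and 2 are kept. The number of surviving rows depends on the data, so the size of the filtered batch can differ from step to step.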

metric_name = self.config.algorithm.filter_groups.metric 
if metric_name == "seq_final_reward": 
    # Turn to numpy for easier filtering 
    new_batch.non_tensor_batch["seq_final_reward"] = new_batch.batch["token_level_rewards"].sum(dim=-1).numpy() 
elif metric_name == "seq_reward": 
    # ===============================add====================================== 
    seq_reward = new_batch.batch["token_level_scores"].sum(dim=-1).numpy() 
    seq_reward = (seq_reward - seq_reward.mean()) / (seq_reward.std() + 1e-8) 
    new_batch.non_tensor_batch["seq_reward"] = seq_reward 
    # ===============================add======================================
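
The normalization in the added block is a z-score over the per-sequence reward sums. A minimal standalone sketch of the same arithmetic, assuming NumPy's default population standard deviation (`standardize` is a hypothetical name, not a verl function):

```python
import math


def standardize(values, eps=1e-8):
    """Z-score values using the population standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    # eps keeps the division finite when all values are equal
    return [(v - mean) / (std + eps) for v in values]


seq_reward = [0.0, 1.0, 1.0, 2.0]
normalized = standardize(seq_reward)
print(normalized)
print(standardize([1.0, 1.0, 1.0]))
```

Because the mean and standard deviation are taken over every sequence in `new_batch`, the normalized value of one sequence depends on which other sequences survived the length filter.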

Where the errors occur:

1. batch = new_batch if batch is None else DataProto.concat([batch, new_batch])
2. # Align the batch
   traj_bsz = self.config.data.train_batch_size * self.config.actor_rollout_ref.rollout.n
   batch = batch[:traj_bsz]
3. RL-Factory/verl/workers/fsdp_workers.py", line 773, in generate_sequences_loop:
   work.wait() RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:81] Timed out waiting 1800000ms for recv operation to complete
4. [Image]
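
Since the first two locations concatenate and slice the batch after filtering, one thing worth checking is whether each prompt still has all of its `rollout.n` responses and whether the batch size still divides evenly across the workers. The following is only a diagnostic sketch under that assumption, not a confirmed cause of the timeout; `summarize_batch` and its parameters are hypothetical names.

```python
from collections import Counter


def summarize_batch(uids, n_per_prompt, world_size):
    """Report incomplete uid groups and whether the batch splits evenly."""
    counts = Counter(uids)
    incomplete = {uid: c for uid, c in counts.items() if c != n_per_prompt}
    return {
        "num_samples": len(uids),
        "incomplete_groups": incomplete,
        "divisible_by_world_size": len(uids) % world_size == 0,
    }


# Example: uid "b" lost one of its 4 responses to the length filter.
uids = ["a"] * 4 + ["b"] * 3
report = summarize_batch(uids, n_per_prompt=4, world_size=8)
print(report)
```

Logging such a summary right before the concat and before `generate_sequences_loop` would show whether the batch shape changes between steps.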

Environment: 8x A800, Qwen3-8B, torch==2.6.0
