
Bug when using multiple GPUs on Ray #7084

Open
1 task done
oasis-0927 opened this issue Feb 26, 2025 · 1 comment
Labels
bug (Something isn't working), pending (This problem is yet to be addressed)

Comments

@oasis-0927

Reminder

  • I have read the above rules and searched the existing issues.

System Info

I'm trying to fine-tune an LLM using Ray, and since the SFT dataset is large, I tried to allocate 4 GPUs to one worker.

```yaml
### ray
ray_run_name: ray_run_name
ray_num_workers: 1  # number of GPUs to use
resources_per_worker:
  GPU: 4
placement_strategy: PACK
```
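As a possible workaround (an assumption on my part, not something confirmed by the maintainers), Ray data-parallel training is commonly configured with one GPU per worker and the worker count matching the number of GPUs, rather than packing several GPUs into a single worker:

```yaml
### ray (hypothetical alternative: 4 workers x 1 GPU instead of 1 worker x 4 GPUs)
ray_run_name: ray_run_name
ray_num_workers: 4          # one worker per GPU
resources_per_worker:
  GPU: 1
placement_strategy: PACK
```

With this layout each Ray worker owns exactly one device, which avoids a single process having to juggle tensors across cuda:0–cuda:3.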

Reproduction

Command:

```shell
USE_RAY=1 CUDA_VISIBLE_DEVICES=4,5,6,7 PYTHONPATH=./ llamafactory-cli train conf/safety/sft.yaml
```

This command raises the following error:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
```

Others

No response

@oasis-0927 oasis-0927 added bug Something isn't working pending This problem is yet to be addressed labels Feb 26, 2025
@oasis-0927 (Author)

Everything works fine after I removed USE_RAY=1 and switched to DeepSpeed ZeRO-3.
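For reference, the working non-Ray setup can be sketched as follows (the `deepspeed` path is an assumption based on the example configs shipped with LLaMA-Factory; the actual path may differ):

```yaml
### deepspeed (run without USE_RAY=1)
deepspeed: examples/deepspeed/ds_z3_config.json  # hypothetical path to a ZeRO-3 config
```

Launched as plain `llamafactory-cli train conf/safety/sft.yaml`, DeepSpeed handles device placement per rank, which is why the device-mismatch error does not occur here.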
