Simplified installation requirements to support more accelerators #303

Open · wants to merge 1 commit into main
Conversation

@ji-huazhong commented Feb 13, 2025

We aim to explore the feasibility of reproducing DeepSeek R1 on an Ascend NPU, building on the existing implementation in this repository. The Hugging Face ecosystem, which encompasses tools such as Transformers, Accelerate, PEFT, TRL, and Safetensors, already provides robust support for a variety of accelerators beyond NVIDIA GPUs, including the Ascend NPU and Intel XPU, among others.

This PR introduces minor changes to installation prerequisites to allow open-r1 to run out of the box on third-party accelerators.
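In packaging terms, the idea is to keep the core install accelerator-agnostic and gate CUDA-only packages behind an optional extra. A hypothetical sketch, not the actual diff (the extra name and package list are illustrative):

# setup.py fragment (hypothetical): CUDA-only wheels become an opt-in extra,
# so `pip install .` works on NPU/XPU hosts and `pip install .[cuda]` on GPUs.
extras = {
    "cuda": ["vllm", "flash-attn"],  # illustrative list of CUDA-centric packages
}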

In addition, to speed up the generation of training samples, the GRPOTrainer in trl was modified to support the use of vLLM on non-CUDA devices. A companion PR (huggingface/trl#2836) has also been submitted for this modification.
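For context, the trl change amounts to not hard-coding "cuda" when placing the vLLM engine. A minimal sketch of device-agnostic selection, with a hypothetical helper name (the exact code lives in huggingface/trl#2836):

import torch

def get_vllm_device(index: int = 0) -> str:
    # Prefer CUDA when present, otherwise probe for an Ascend NPU.
    if torch.cuda.is_available():
        return f"cuda:{index}"
    try:
        import torch_npu  # noqa: F401 -- importing registers the torch.npu namespace
        if torch.npu.is_available():
            return f"npu:{index}"
    except ImportError:
        pass
    return "cpu"  # fall back to CPU if no accelerator is visible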

Closes: #44 (能否支持NPU? / "Can NPU be supported?")

cc @qgallouedec @lewtun

@ji-huazhong (Author) commented Feb 13, 2025

To use vLLM on the Ascend NPU, you need to install the vllm-ascend plugin (https://github.com/vllm-project/vllm-ascend); follow the installation instructions in its README.
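Once the plugin is installed, a quick sanity check that PyTorch can see the NPUs (assuming torch and torch_npu are installed):

import torch
import torch_npu  # importing registers the Ascend "npu" backend with PyTorch

print(torch.npu.is_available())  # True when NPU devices are visible
print(torch.npu.device_count())  # e.g. 8 on an 8-card Atlas node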

Refer to the GRPO usage example in the README and run the following command (--num_processes=7 follows the README convention of reserving one device for vLLM generation on an 8-card node):

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero2.yaml \
--num_processes=7 src/open_r1/grpo.py \
--config recipes/Qwen2.5-1.5B-Instruct/grpo/config_demo.yaml

The following is part of the printout during training:

[INFO|trainer.py:2369] 2025-02-12 22:37:20,161 >> ***** Running training *****
[INFO|trainer.py:2370] 2025-02-12 22:37:20,161 >>   Num examples = 72,441
[INFO|trainer.py:2371] 2025-02-12 22:37:20,161 >>   Num Epochs = 1
[INFO|trainer.py:2372] 2025-02-12 22:37:20,161 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:2375] 2025-02-12 22:37:20,161 >>   Total train batch size (w. parallel, distributed & accumulation) = 28
[INFO|trainer.py:2376] 2025-02-12 22:37:20,162 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:2377] 2025-02-12 22:37:20,162 >>   Total optimization steps = 18,111
[INFO|trainer.py:2378] 2025-02-12 22:37:20,163 >>   Number of trainable parameters = 1,543,714,304
  0%|          | 0/18111 [00:00<?, ?it/s][rank0]:[W212 22:37:46.619869600 compiler_depend.ts:133] Warning: Warning: Device do not support double dtype now, dtype cast repalce with float. (function operator())
{'loss': 0.0, 'grad_norm': 63.02857971191406, 'learning_rate': 1.1037527593818985e-08, 'rewards/accuracy_reward': 0.0, 'rewards/format_reward': 0.2857142984867096, 'reward': 0.2857142984867096, 'reward_std': 0.45624351501464844, 'completion_length': 204.39288330078125, 'kl': 0.0, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 36.02487564086914, 'learning_rate': 2.207505518763797e-08, 'rewards/accuracy_reward': 0.0, 'rewards/format_reward': 0.4642857611179352, 'reward': 0.4642857313156128, 'reward_std': 0.4837399125099182, 'completion_length': 217.2857208251953, 'kl': 0.0, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 2.645524501800537, 'learning_rate': 3.311258278145696e-08, 'rewards/accuracy_reward': 0.0, 'rewards/format_reward': 0.1428571492433548, 'reward': 0.1428571492433548, 'reward_std': 0.24397504329681396, 'completion_length': 459.7857360839844, 'kl': 0.0012054443359375, 'epoch': 0.0}
{'loss': 0.0002, 'grad_norm': 73.87657165527344, 'learning_rate': 4.415011037527594e-08, 'rewards/accuracy_reward': 0.1428571492433548, 'rewards/format_reward': 0.1428571492433548, 'reward': 0.2857142984867096, 'reward_std': 0.4446004033088684, 'completion_length': 367.1785888671875, 'kl': 0.004150390625, 'epoch': 0.0}
{'loss': 0.0001, 'grad_norm': 44.3963623046875, 'learning_rate': 5.518763796909493e-08, 'rewards/accuracy_reward': 0.1785714328289032, 'rewards/format_reward': 0.2857142984867096, 'reward': 0.4642857313156128, 'reward_std': 0.522879421710968, 'completion_length': 248.78573608398438, 'kl': 0.00262451171875, 'epoch': 0.0}
{'loss': 0.0001, 'grad_norm': 44.259666442871094, 'learning_rate': 6.622516556291392e-08, 'rewards/accuracy_reward': 0.0357142873108387, 'rewards/format_reward': 0.25, 'reward': 0.2857142984867096, 'reward_std': 0.4720968008041382, 'completion_length': 292.25, 'kl': 0.0029144287109375, 'epoch': 0.0}
{'loss': 0.0004, 'grad_norm': 77.98892211914062, 'learning_rate': 7.72626931567329e-08, 'rewards/accuracy_reward': 0.0357142873108387, 'rewards/format_reward': 0.25, 'reward': 0.2857142984867096, 'reward_std': 0.4248207211494446, 'completion_length': 393.14288330078125, 'kl': 0.01019287109375, 'epoch': 0.0}
{'loss': 0.0003, 'grad_norm': 35.490386962890625, 'learning_rate': 8.830022075055188e-08, 'rewards/accuracy_reward': 0.0714285746216774, 'rewards/format_reward': 0.0357142873108387, 'reward': 0.1071428656578064, 'reward_std': 0.2164786458015442, 'completion_length': 447.5, 'kl': 0.007568359375, 'epoch': 0.0}
{'loss': 0.0002, 'grad_norm': 46.677669525146484, 'learning_rate': 9.933774834437088e-08, 'rewards/accuracy_reward': 0.0, 'rewards/format_reward': 0.1428571492433548, 'reward': 0.1428571492433548, 'reward_std': 0.3109697699546814, 'completion_length': 353.89288330078125, 'kl': 0.00567626953125, 'epoch': 0.0}
{'loss': 0.0002, 'grad_norm': 34.44541931152344, 'learning_rate': 1.1037527593818986e-07, 'rewards/accuracy_reward': 0.0, 'rewards/format_reward': 0.1071428656578064, 'reward': 0.1071428656578064, 'reward_std': 0.2164786458015442, 'completion_length': 420.0, 'kl': 0.004425048828125, 'epoch': 0.0}

@baymax591 commented
This PR helps a lot. I hope it can speed up the integration.
