Simplified installation requirements to support more accelerators #303

Open · wants to merge 1 commit into main
Conversation

@ji-huazhong commented Feb 13, 2025

We aim to explore the feasibility of reproducing DeepSeek R1 on an Ascend NPU, building on the existing implementation in this repository. The Hugging Face ecosystem, which encompasses tools such as Transformers, Accelerate, PEFT, TRL, and Safetensors, already provides robust support for a variety of accelerators beyond NVIDIA GPUs, including the Ascend NPU and Intel XPU, among others.

This PR introduces minor changes to installation prerequisites to allow open-r1 to run out of the box on third-party accelerators.
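In packaging terms, the idea is to keep the core install accelerator-agnostic and gate CUDA-only packages behind an optional extra. A hypothetical sketch, not the actual diff (the extra name and package list are illustrative):

# setup.py fragment (hypothetical): CUDA-only wheels become an opt-in extra,
# so `pip install .` works on NPU/XPU hosts and `pip install .[cuda]` on GPUs.
extras = {
    "cuda": ["vllm", "flash-attn"],  # illustrative list of CUDA-centric packages
}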

In addition, to speed up the generation of training samples, the GRPOTrainer in trl was modified to support the use of vLLM on non-CUDA devices. A companion PR (huggingface/trl#2836) has also been submitted for this modification.
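For context, the trl change amounts to not hard-coding "cuda" when placing the vLLM engine. A minimal sketch of device-agnostic selection, with a hypothetical helper name (the exact code lives in huggingface/trl#2836):

import torch

def get_vllm_device(index: int = 0) -> str:
    # Prefer CUDA when present, otherwise probe for an Ascend NPU.
    if torch.cuda.is_available():
        return f"cuda:{index}"
    try:
        import torch_npu  # noqa: F401 -- importing registers the torch.npu namespace
        if torch.npu.is_available():
            return f"npu:{index}"
    except ImportError:
        pass
    return "cpu"  # fall back to CPU if no accelerator is visible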

Closes: #44 (能否支持NPU? / "Can NPU be supported?")

cc @qgallouedec @lewtun

@ji-huazhong (Author) commented Feb 13, 2025

To use vLLM on the Ascend NPU, you need to install the vllm-ascend plugin (https://github.com/vllm-project/vllm-ascend); follow the installation instructions in its README.
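Once the plugin is installed, a quick sanity check that PyTorch can see the NPUs (assuming torch and torch_npu are installed):

import torch
import torch_npu  # importing registers the Ascend "npu" backend with PyTorch

print(torch.npu.is_available())  # True when NPU devices are visible
print(torch.npu.device_count())  # e.g. 8 on an 8-card Atlas node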

Refer to the GRPO usage example in the README and run the following command (--num_processes=7 follows the README convention of reserving one device for vLLM generation on an 8-card node):

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero2.yaml \
--num_processes=7 src/open_r1/grpo.py \
--config recipes/Qwen2.5-1.5B-Instruct/grpo/config_demo.yaml

The following is part of the printout during training:

[INFO|trainer.py:2369] 2025-02-12 22:37:20,161 >> ***** Running training *****
[INFO|trainer.py:2370] 2025-02-12 22:37:20,161 >>   Num examples = 72,441
[INFO|trainer.py:2371] 2025-02-12 22:37:20,161 >>   Num Epochs = 1
[INFO|trainer.py:2372] 2025-02-12 22:37:20,161 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:2375] 2025-02-12 22:37:20,161 >>   Total train batch size (w. parallel, distributed & accumulation) = 28
[INFO|trainer.py:2376] 2025-02-12 22:37:20,162 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:2377] 2025-02-12 22:37:20,162 >>   Total optimization steps = 18,111
[INFO|trainer.py:2378] 2025-02-12 22:37:20,163 >>   Number of trainable parameters = 1,543,714,304
  0%|          | 0/18111 [00:00<?, ?it/s][rank0]:[W212 22:37:46.619869600 compiler_depend.ts:133] Warning: Warning: Device do not support double dtype now, dtype cast repalce with float. (function operator())
{'loss': 0.0, 'grad_norm': 63.02857971191406, 'learning_rate': 1.1037527593818985e-08, 'rewards/accuracy_reward': 0.0, 'rewards/format_reward': 0.2857142984867096, 'reward': 0.2857142984867096, 'reward_std': 0.45624351501464844, 'completion_length': 204.39288330078125, 'kl': 0.0, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 36.02487564086914, 'learning_rate': 2.207505518763797e-08, 'rewards/accuracy_reward': 0.0, 'rewards/format_reward': 0.4642857611179352, 'reward': 0.4642857313156128, 'reward_std': 0.4837399125099182, 'completion_length': 217.2857208251953, 'kl': 0.0, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': 2.645524501800537, 'learning_rate': 3.311258278145696e-08, 'rewards/accuracy_reward': 0.0, 'rewards/format_reward': 0.1428571492433548, 'reward': 0.1428571492433548, 'reward_std': 0.24397504329681396, 'completion_length': 459.7857360839844, 'kl': 0.0012054443359375, 'epoch': 0.0}
{'loss': 0.0002, 'grad_norm': 73.87657165527344, 'learning_rate': 4.415011037527594e-08, 'rewards/accuracy_reward': 0.1428571492433548, 'rewards/format_reward': 0.1428571492433548, 'reward': 0.2857142984867096, 'reward_std': 0.4446004033088684, 'completion_length': 367.1785888671875, 'kl': 0.004150390625, 'epoch': 0.0}
{'loss': 0.0001, 'grad_norm': 44.3963623046875, 'learning_rate': 5.518763796909493e-08, 'rewards/accuracy_reward': 0.1785714328289032, 'rewards/format_reward': 0.2857142984867096, 'reward': 0.4642857313156128, 'reward_std': 0.522879421710968, 'completion_length': 248.78573608398438, 'kl': 0.00262451171875, 'epoch': 0.0}
{'loss': 0.0001, 'grad_norm': 44.259666442871094, 'learning_rate': 6.622516556291392e-08, 'rewards/accuracy_reward': 0.0357142873108387, 'rewards/format_reward': 0.25, 'reward': 0.2857142984867096, 'reward_std': 0.4720968008041382, 'completion_length': 292.25, 'kl': 0.0029144287109375, 'epoch': 0.0}
{'loss': 0.0004, 'grad_norm': 77.98892211914062, 'learning_rate': 7.72626931567329e-08, 'rewards/accuracy_reward': 0.0357142873108387, 'rewards/format_reward': 0.25, 'reward': 0.2857142984867096, 'reward_std': 0.4248207211494446, 'completion_length': 393.14288330078125, 'kl': 0.01019287109375, 'epoch': 0.0}
{'loss': 0.0003, 'grad_norm': 35.490386962890625, 'learning_rate': 8.830022075055188e-08, 'rewards/accuracy_reward': 0.0714285746216774, 'rewards/format_reward': 0.0357142873108387, 'reward': 0.1071428656578064, 'reward_std': 0.2164786458015442, 'completion_length': 447.5, 'kl': 0.007568359375, 'epoch': 0.0}
{'loss': 0.0002, 'grad_norm': 46.677669525146484, 'learning_rate': 9.933774834437088e-08, 'rewards/accuracy_reward': 0.0, 'rewards/format_reward': 0.1428571492433548, 'reward': 0.1428571492433548, 'reward_std': 0.3109697699546814, 'completion_length': 353.89288330078125, 'kl': 0.00567626953125, 'epoch': 0.0}
{'loss': 0.0002, 'grad_norm': 34.44541931152344, 'learning_rate': 1.1037527593818986e-07, 'rewards/accuracy_reward': 0.0, 'rewards/format_reward': 0.1071428656578064, 'reward': 0.1071428656578064, 'reward_std': 0.2164786458015442, 'completion_length': 420.0, 'kl': 0.004425048828125, 'epoch': 0.0}

@baymax591 commented
This PR helps a lot. I hope it can speed up the integration.
