Conversation

@zheliuyu (Contributor) commented Nov 19, 2025

What does this PR do?

Still going strong and still at it. 😆

This is a proof-of-concept experiment for #39105 (comment)

Prepare the environment

cann = 8.3.RC1
torch = 2.7.1
torch_npu = 2.7.1
device = Atlas 900 A2 * 8

pip install -e kernels

git clone https://github.com/zheliuyu/transformers-dev
pip install -e transformers-dev

pip install llamafactory
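
As a quick sanity check after installation (a minimal sketch; assumes the packages above installed cleanly and uses only standard torch_npu calls):

# Sanity check for the NPU setup (sketch; package names as installed above).
import importlib.metadata

import torch
import torch_npu  # noqa: F401  # importing torch_npu registers the NPU backend on torch

print("torch:", torch.__version__)
print("torch_npu:", importlib.metadata.version("torch_npu"))
print("kernels:", importlib.metadata.version("kernels"))
print("NPU available:", torch.npu.is_available(), "| devices:", torch.npu.device_count())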

Using LLaMA-Factory, we fine-tuned Qwen3-8B.

llamafactory-cli train custom.yaml

custom.yaml

### model
model_name_or_path: Qwen/Qwen3-8B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/Qwen/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

result

No kernels

Set use_kernels=False

***** train metrics *****
"epoch": 3.0,
"total_flos": 2.8124,
"train_loss": 1.159449,
"train_runtime": 303.7698,
"train_samples_per_second": 10.7650,
"train_steps_per_second": 0.1780

With this PR

Set use_kernels=True

***** train metrics *****
"epoch": 3.0,
"total_flos": 2.8124,
"train_loss": 1.159411,
"train_runtime": 272.5237,
"train_samples_per_second": 11.9990,
"train_steps_per_second": 0.1980

(303.7698 - 272.5237) / 303.7698 ≈ 10.3%

The results show an approximately 10% speedup compared to the run without this PR.
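
For reference, the same switch can also be exercised directly in transformers when loading the model, without going through LLaMA-Factory. A minimal sketch (the device string is an assumption for the Ascend setup above):

# Sketch: toggling the kernels integration via from_pretrained;
# use_kernels is the only difference between the two runs compared above.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    use_kernels=True,   # False reproduces the baseline ("No kernels") run
    device_map="npu",   # assumption: NPU placement; adjust for your setup
)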

@zheliuyu marked this pull request as ready for review November 19, 2025 09:22
@SunMarc (Member) left a comment

Thanks, just a comment

Comment on lines -88 to +90

- Mode.INFERENCE: LayerRepository(
-     repo_id="kernels-community/liger_kernels",
-     layer_name="LigerRMSNorm",
+ Mode.TRAINING: LayerRepository(
+     repo_id="kernels-ext-npu/rmsnorm",
+     layer_name="rmsnorm",
Member:

For inference, should we still keep liger_kernels?

Contributor:

Also @zheliuyu, I have a few concerns about including kernels from other communities that may not yet be fully mature in the default mapping of Transformers, since this is code being run on users' devices, and we need to keep control of what's being executed. I would kindly suggest using KernelConfig directly and specifying the desired mapping there instead of using the default one for now. For example:

from transformers import AutoModelForCausalLM, KernelConfig

kernel_config = KernelConfig(kernel_mapping={"RMSNorm": "kernels-ext-npu/rmsnorm:rmsnorm"})
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-1B-Instruct", use_kernels=True, device_map=torch_device, kernel_config=kernel_config
)

Once the npu community is mature enough we can consider adding kernels to the default mapping directly.
