[training] Ascend Skill Contest submission: training task 2 #8

Open
Michael-Salon wants to merge 1 commit into Eco-Sphere:main from Michael-Salon:main

Michael-Salon commented Mar 17, 2026

Task 2: Training framework profiling collection

Skill name

verl-npu-profiler

Training framework

VeRL

Prompt

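I need to profile training performance with the VeRL framework. Please help me configure NPU profiling collection: I need CPU and memory data, at level1, covering steps 10 through 20.

Interaction

Agent response

Based on your requirements, here is an NPU profiling collection setup for the VeRL framework:

  1. Configure the profiling parameters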
config = {
    "enable": True,
    "tool": "npu",
    "ranks": [0],
    "save_path": "./profiler_data",
    "tool_config": {
        "npu": {
            "level": "level1",
            "contents": ["npu", "cpu", "memory"],
            "start_step": 10,
            "end_step": 20,
        }
    }
}
  2. Launch the training script
python train.py \
    --profiler.enable True \
    --profiler.tool npu \
    --profiler.tool_config.npu.level level1 \
    --profiler.tool_config.npu.contents "[npu, cpu, memory]" \
    --profiler.tool_config.npu.start_step 10 \
    --profiler.tool_config.npu.end_step 20 \
    --profiler.save_path ./profiler_data
  3. View the results
# View with Ascend Insight
ascend-insight

# Or view with TensorBoard
tensorboard --logdir=./profiler_data
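
The config dict in step 1 and the flags in step 2 carry the same settings, so one can be derived from the other. Below is a minimal sketch of that derivation. The helper `to_cli_flags` is hypothetical (it is not part of VeRL or of this skill), and the dotted `--profiler.*` flag style is copied from the agent response above, not checked against what `train.py` actually accepts.

```python
import shlex


def to_cli_flags(config, prefix="profiler"):
    """Flatten a nested dict into ['--profiler.a.b', 'value', ...] pairs.

    Hypothetical helper: the flag naming mirrors the command in step 2.
    """
    flags = []
    for key, value in config.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Recurse so nested keys become dotted flag names.
            flags.extend(to_cli_flags(value, name))
        elif isinstance(value, (list, tuple)):
            # Render lists in the bracketed form used in step 2.
            rendered = "[" + ", ".join(str(v) for v in value) + "]"
            flags.extend([f"--{name}", rendered])
        else:
            flags.extend([f"--{name}", str(value)])
    return flags


config = {
    "enable": True,
    "tool": "npu",
    "ranks": [0],
    "save_path": "./profiler_data",
    "tool_config": {
        "npu": {
            "level": "level1",
            "contents": ["npu", "cpu", "memory"],
            "start_step": 10,
            "end_step": 20,
        }
    },
}

flags = to_cli_flags(config)
# shlex.quote keeps the bracketed list as a single shell argument.
print("python train.py " + " ".join(shlex.quote(f) for f in flags))
```

Unlike the command in step 2, this sketch also emits `--profiler.ranks`, because `ranks` is present in the dict.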

Test results (screenshot)

Feature coverage

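| Feature | Support |
| --- | --- |
| CPU collection | contents: ["npu", "cpu"] |
| Memory collection | contents: ["npu", "memory"] |
| Different levels | ✅ level0/level1/level2 |
| Step-range collection | start_step / end_step |
| Multi-card collection | ranks: [0, 1, ...] |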
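
The options listed above can be checked before a run is launched. The following is a rough sketch of such a check. The names `validate_npu_options` and `profiled_steps` are hypothetical, and the allowed values are taken only from this table (not from VeRL's own schema). It also assumes that `end_step` is inclusive, which this PR does not state.

```python
# Allowed values as listed in the feature table; an assumption, not VeRL's schema.
ALLOWED_LEVELS = {"level0", "level1", "level2"}
ALLOWED_CONTENTS = {"npu", "cpu", "memory"}


def validate_npu_options(npu):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    level = npu.get("level")
    if level not in ALLOWED_LEVELS:
        errors.append(f"unknown level: {level!r}")
    unknown = sorted(set(npu.get("contents", [])) - ALLOWED_CONTENTS)
    if unknown:
        errors.append(f"unknown contents: {unknown}")
    start, end = npu.get("start_step"), npu.get("end_step")
    if not (isinstance(start, int) and isinstance(end, int) and 0 <= start <= end):
        errors.append(f"invalid step range: {start!r}..{end!r}")
    return errors


def profiled_steps(start_step, end_step):
    """Steps covered by the range, ASSUMING end_step is inclusive."""
    return list(range(start_step, end_step + 1))


npu_options = {
    "level": "level1",
    "contents": ["npu", "cpu", "memory"],
    "start_step": 10,
    "end_step": 20,
}
problems = validate_npu_options(npu_options)
print(problems or "config looks consistent")
print(profiled_steps(npu_options["start_step"], npu_options["end_step"]))
```

If the real profiler treats `end_step` as exclusive, the range would cover one step fewer, so this is worth confirming against the skill's `SKILL.md` before relying on it.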

Skill structure
skills/verl-npu-profiler/
├── SKILL.md (442 lines)
├── reference/
│   ├── framework_adaptation_guide.md (320 lines)
│   └── npu_profiler_config.py (133 lines)
└── scripts/
    ├── framework_adaptation_examples.py (409 lines)
    └── universal_npu_profiler.py (306 lines)

- Support CPU, memory, NPU, HCCL data collection
- Support level0/level1/level2 profiling levels
- Support step range collection (start_step, end_step)
- Include VeRL profiler configuration guide
- Include common issues troubleshooting

Michael-Salon changed the title from "Ascend Skill Contest submission" to "Ascend Skill Contest submission: training task 2" on Mar 19, 2026
realliujiaxu changed the title from "Ascend Skill Contest submission: training task 2" to "[training] Ascend Skill Contest submission: training task 2" on Apr 1, 2026