[training] Ascend Skill Contest submission: training task 2 #8
Open
Michael-Salon wants to merge 1 commit into
- Support CPU, memory, NPU, HCCL data collection
- Support level0/level1/level2 profiling levels
- Support step range collection (start_step, end_step)
- Include VeRL profiler configuration guide
- Include common issues troubleshooting
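Task 2: Training framework profiling collection
Skill name
verl-npu-profiler
Training framework
VeRL
Prompt
I need to analyze training performance with the VeRL framework. Please help me configure NPU profiling collection: collect CPU and memory data at level1, covering steps 10 through 20.
Interaction
Agent response:
Based on your requirements, here is an NPU profiling collection setup for the VeRL framework: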
python train.py \
  --profiler.enable True \
  --profiler.tool npu \
  --profiler.tool_config.npu.level level1 \
  --profiler.tool_config.npu.contents "[npu, cpu, memory]" \
  --profiler.tool_config.npu.start_step 10 \
  --profiler.tool_config.npu.end_step 20 \
  --profiler.save_path ./profiler_data

Test results (screenshot)
Feature coverage:
- contents: ["npu", "cpu"]
- contents: ["npu", "memory"]
- start_step/end_step
- ranks: [0, 1, ...]

Skill structure:
skills/verl-npu-profiler/
├── SKILL.md (442 lines)
├── reference/
│   ├── framework_adaptation_guide.md (320 lines)
│   └── npu_profiler_config.py (133 lines)
└── scripts/
    ├── framework_adaptation_examples.py (409 lines)
    └── universal_npu_profiler.py (306 lines)
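
The command in the agent response could also be assembled and sanity-checked in Python before launching. The sketch below is only an illustration. `NpuProfilerRequest`, `validate`, and `to_cli_args` are hypothetical names and are not taken from the skill's scripts. The flag names are copied from the agent response and have not been verified against VeRL itself. The `hccl` contents token is an assumption based on the commit message.

```python
from dataclasses import dataclass, field
from typing import List

# Levels and contents named in the PR description.
# Assumption: "hccl" is the token for HCCL data; the PR text does not show it.
ALLOWED_LEVELS = ("level0", "level1", "level2")
ALLOWED_CONTENTS = ("npu", "cpu", "memory", "hccl")


@dataclass
class NpuProfilerRequest:
    """Hypothetical container for the options requested in the prompt."""

    level: str = "level1"
    contents: List[str] = field(default_factory=lambda: ["npu", "cpu", "memory"])
    start_step: int = 10
    end_step: int = 20
    save_path: str = "./profiler_data"

    def validate(self) -> None:
        """Reject values outside the sets listed in the PR description."""
        if self.level not in ALLOWED_LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")
        unknown = [c for c in self.contents if c not in ALLOWED_CONTENTS]
        if unknown:
            raise ValueError(f"unknown contents: {unknown}")
        if self.start_step < 0 or self.end_step < self.start_step:
            raise ValueError("expected 0 <= start_step <= end_step")

    def to_cli_args(self) -> List[str]:
        """Return the flags in the same order as the agent's command."""
        self.validate()
        prefix = "--profiler.tool_config.npu"
        return [
            "--profiler.enable", "True",
            "--profiler.tool", "npu",
            f"{prefix}.level", self.level,
            f"{prefix}.contents", "[" + ", ".join(self.contents) + "]",
            f"{prefix}.start_step", str(self.start_step),
            f"{prefix}.end_step", str(self.end_step),
            "--profiler.save_path", self.save_path,
        ]


if __name__ == "__main__":
    import shlex

    args = NpuProfilerRequest().to_cli_args()
    # shlex.quote keeps the bracketed contents list as a single shell word.
    print("python train.py " + " ".join(shlex.quote(a) for a in args))
```

Validating before building the argument list means a mistyped level or an inverted step range fails immediately, before a training run starts. Whether `end_step` is inclusive is not stated in the PR, so the sketch only checks ordering.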