layer_norm_grad for npu #10560

Draft · wants to merge 2 commits into master
Conversation

@fpzh2011 (Contributor) commented Nov 7, 2024

In the NPU GPT2 test scenario, layer_norm is always affine, so layer_norm_grad is refactored to reduce the number of CANN calls: the gradients for gamma, beta, and dx are all computed inside a single kernel.
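For reference, the following is a minimal single-threaded C++ sketch of what such a fused backward computes: one pass over each row produces dx, gamma_diff, and beta_diff together. It is not the CANN/NPU kernel from this PR; the function name `layer_norm_grad_fused` and the choice to recompute mean and inv_std per row are assumptions made to keep the example self-contained (a real kernel would read the saved forward statistics instead).

```cpp
// Hypothetical fused layer_norm_grad reference: one pass over the input
// produces dx, gamma_diff, and beta_diff together. The affine case
// (gamma/beta present) is assumed, matching the GPT2 scenario above.
#include <cmath>
#include <cstdio>
#include <vector>

// rows = batch dimensions flattened, cols = normalized dimension.
void layer_norm_grad_fused(const std::vector<float>& x,
                           const std::vector<float>& dy,
                           const std::vector<float>& gamma,
                           int rows, int cols, float eps,
                           std::vector<float>& dx,
                           std::vector<float>& gamma_diff,
                           std::vector<float>& beta_diff) {
  dx.assign(rows * cols, 0.f);
  gamma_diff.assign(cols, 0.f);
  beta_diff.assign(cols, 0.f);
  for (int r = 0; r < rows; ++r) {
    const float* xr = &x[r * cols];
    const float* dyr = &dy[r * cols];
    float* dxr = &dx[r * cols];
    // Recompute mean / inv_std for this row (a real kernel would reuse the
    // mean and inv_variance saved by the forward pass).
    float mean = 0.f, var = 0.f;
    for (int c = 0; c < cols; ++c) mean += xr[c];
    mean /= cols;
    for (int c = 0; c < cols; ++c) var += (xr[c] - mean) * (xr[c] - mean);
    var /= cols;
    const float inv_std = 1.f / std::sqrt(var + eps);
    // Row-level reductions for dx, plus gamma/beta accumulation, in one loop.
    float sum1 = 0.f, sum2 = 0.f;  // mean(dy*gamma), mean(dy*gamma*x_hat)
    for (int c = 0; c < cols; ++c) {
      const float x_hat = (xr[c] - mean) * inv_std;
      gamma_diff[c] += dyr[c] * x_hat;  // dL/dgamma, summed over rows
      beta_diff[c] += dyr[c];           // dL/dbeta, summed over rows
      sum1 += dyr[c] * gamma[c];
      sum2 += dyr[c] * gamma[c] * x_hat;
    }
    sum1 /= cols;
    sum2 /= cols;
    for (int c = 0; c < cols; ++c) {
      const float x_hat = (xr[c] - mean) * inv_std;
      dxr[c] = (dyr[c] * gamma[c] - sum1 - x_hat * sum2) * inv_std;
    }
  }
}

int main() {
  const int rows = 2, cols = 4;
  std::vector<float> x = {0.1f, -0.2f, 0.3f, 0.4f, 1.0f, 2.0f, -1.0f, 0.5f};
  std::vector<float> dy(rows * cols, 1.0f);
  std::vector<float> gamma(cols, 1.0f);
  std::vector<float> dx, gamma_diff, beta_diff;
  layer_norm_grad_fused(x, dy, gamma, rows, cols, 1e-5f, dx, gamma_diff, beta_diff);
  for (int c = 0; c < cols; ++c) std::printf("gamma_diff[%d]=%f\n", c, gamma_diff[c]);
  return 0;
}
```

Producing all three outputs from one loop is what allows the NPU path to issue a single CANN call instead of separate ones for dx and for the affine parameters, which is the stated goal of this PR.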

fpzh2011 and others added 2 commits November 7, 2024 08:56
Fix SBP settings for LayerNormGradOp to ensure correct gradient
aggregation for gamma_diff and beta_diff

Changes

- Updated the SBP strategy in LayerNormGradOp: gamma_diff and beta_diff now use PartialSum instead of Split, so per-device partial gradients are aggregated by summation and dimension mismatches during distributed training are avoided (see the sketch after this list).
- Added a consistency check that begin_norm_axis equals begin_params_axis, ensuring the normalization and parameter dimensions stay aligned.
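As a rough illustration of why PartialSum is the right SBP here (this is not OneFlow code, and all names are illustrative): when dy and x are split along the batch axis, each device can only reduce over its local rows, so its gamma_diff/beta_diff is a partial sum; the global gradient is the elementwise sum of those partials, which is exactly the PartialSum semantics. The toy check below uses beta_diff, which is a plain row sum, to make that concrete.

```cpp
// Toy check of the PartialSum reasoning: with dy split by rows (axis 0)
// across two devices, each device sees only part of the row reduction, so
// its beta_diff is a partial result; adding the two partials elementwise
// reproduces the full-batch gradient. Names here are illustrative only.
#include <cassert>
#include <vector>

// beta_diff[c] = sum over local rows of dy[r][c]. gamma_diff is analogous,
// with an extra x_hat factor, and reduces over the same row axis.
std::vector<float> local_beta_diff(const std::vector<std::vector<float>>& dy) {
  std::vector<float> out(dy.empty() ? 0 : dy[0].size(), 0.f);
  for (const auto& row : dy)
    for (size_t c = 0; c < row.size(); ++c) out[c] += row[c];
  return out;
}

int main() {
  std::vector<std::vector<float>> dy = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {1, 1, 1}};
  // Split(axis=0): device 0 gets rows 0-1, device 1 gets rows 2-3.
  std::vector<std::vector<float>> shard0(dy.begin(), dy.begin() + 2);
  std::vector<std::vector<float>> shard1(dy.begin() + 2, dy.end());
  auto full = local_beta_diff(dy);
  auto p0 = local_beta_diff(shard0);
  auto p1 = local_beta_diff(shard1);
  for (size_t c = 0; c < full.size(); ++c) assert(p0[c] + p1[c] == full[c]);
  return 0;  // partial results sum to the global gradient -> PartialSum SBP
}
```

dx, by contrast, stays row-aligned with the input, so it can remain Split along the same batch axis.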