Commit 2200540

NeosZhang and lljbash authored
Zq/add specified autocompare2 (#794)
* craft
* draft
* fix
* fix
* update readme
* fix md lint
* fix cpp lint
* fix readme
* fix lint
* fix
* rm autcompare CI
* add copyright
* fix clang-format
* fix clang-tidy
* fix lint
* Update dipu/QuickStart.md
  Co-authored-by: Lingjie <[email protected]>
* remove ENV USE_GLOBAL_AUTOCOMPARE
* fix
* fix
* fix readme
* fix
* add directMemCopyH2H
* ceclear
* reformat func register
* fix clang-format
* fix
* fix
* fix
* fix
* test
* fix
* fix test_fallback
* fix py-lint
* fix
* fix const_var name
* fix register macro name, use CUSTOMFALLBACK, instead of FALLBACK"
* fix comment
* fix
* fix autocompare for _amp_foreach_non_finite_check_and_unscale_
* fix const var name with Google style

---------

Co-authored-by: Lingjie <[email protected]>
1 parent 5e7a35b commit 2200540

14 files changed (+376 -200 lines changed)

.github/workflows/main.yml

Lines changed: 0 additions & 15 deletions
@@ -365,21 +365,6 @@ jobs:
           source scripts/ci/ascend/ci_ascend_env.sh
           bash scripts/ci/ascend/ci_ascend_script.sh build_dipu \
           || ( cd ${DEEPLINK_PATH}/${GITHUB_RUN_NUMBER}/ && rm -rf ${GITHUB_JOB} && exit 1 )
-
-  Build-Ascend-910b-with-autocompare:
-    name: Build-dipu-ascend-910b-with-autocompare
-    needs: [Build-PyTorch-For-Ascend-910b]
-    runs-on: tps-ascend-ci-910b
-    steps:
-      - name: Build dipu
-        run: |
-          set -ex
-          export USE_COVERAGE=ON
-          export USE_AUTOCOMPARE=ON
-          cd ${DEEPLINK_PATH}/${GITHUB_RUN_NUMBER}/ && rm -rf ${GITHUB_JOB} && cp -R source ${GITHUB_JOB} && cd ${GITHUB_JOB}/dipu
-          source scripts/ci/ascend/ci_ascend_env.sh
-          bash scripts/ci/ascend/ci_ascend_script.sh build_dipu \
-          || ( cd ${DEEPLINK_PATH}/${GITHUB_RUN_NUMBER}/ && rm -rf ${GITHUB_JOB} && exit 1 )

   Test-Ascend-910b:
     name: Test-dipu-ascend-910b

dipu/QuickStart.md

Lines changed: 33 additions & 17 deletions
@@ -158,9 +158,10 @@ sh ./tests/python/run_tests.sh

 ### Operator Library Extensions

-#### Operator Fallback
+#### Operator Fallback Feature

-Fall back the given operators:
+Fallback means using an operator's CPU implementation instead of its device implementation.
+Fall back the given operators:

 ```bash
 export DIPU_FORCE_FALLBACK_OPS_LIST=add.out,conv2d
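@@ -181,20 +182,13 @@ export DIPU_FORCE_FALLBACK_OPS_LIST='.*'
 python -c "import torch_dipu"
 ```

-#### Introduction to Automatic Operator Precision Comparison
+#### Automatic Operator Precision Comparison

-This feature is disabled by default; to use it, enable it and rebuild DIPU.
-
-Enable the feature by setting the environment variable USE_AUTOCOMPARE=ON, then rebuild DIPU.
-
-```shell
-export USE_AUTOCOMPARE=ON
-```
-
-The method above enables automatic precision comparison for all operators. To compare only specific operators, add `autocompare: True` to those operators in the relevant configuration file (for example `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml`).
+The automatic operator precision comparison feature (autocompare) checks that operator results are correct: it copies the device arguments to the CPU and compares the CPU and device results to decide whether the precision is acceptable. The following example shows how to use it:

 ```shell
-$ unset DIPU_FORCE_FALLBACK_OPS_LIST # optional; mainly ensures the operators being compared are not forced to fall back to the cpu
+$ unset DIPU_FORCE_FALLBACK_OPS_LIST # optional; mainly ensures the operators being compared are not forced to fall back to the CPU
+$ export DIPU_AUTOCOMPARE_OPS_LIST=add.out # enable autocompare for the add.out operator
 $ python
 >>> import torch
 >>> import torch_dipu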
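@@ -220,11 +214,33 @@ autocompare: add.out other: allclose
 >>>
 ```

-As shown, the CPU result and the device result are `allclose`, and the `shape`, `dtype`, and other information of both results is printed. Note the following in particular:
+As shown, the output includes the `shape`, `stride`, `dtype`, and other information of the CPU and device results; in the end, both self and out are allclose between the CPU and the device.
+
+##### Configuring automatic operator precision comparison
+
+Automatic operator precision comparison is disabled by default and is controlled by the environment variable `DIPU_AUTOCOMPARE_OPS_LIST`. Before enabling it, you must unset `DIPU_FORCE_FALLBACK_OPS_LIST`.
+
+- Set the environment variable `DIPU_AUTOCOMPARE_OPS_LIST='.*'` to enable global precision comparison; every operator that is called is then compared.
+
+```shell
+# Enable automatic operator precision comparison globally
+export DIPU_AUTOCOMPARE_OPS_LIST='.*'
+```
+
+- Set `DIPU_AUTOCOMPARE_OPS_LIST` to enable automatic precision comparison for specific operators. Regular-expression matching is supported, and multiple operators can be listed. For operator names, see [diopi_functions.yaml](https://github.com/DeepLink-org/deeplink.framework/blob/main/dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml).
+
+```shell
+# Compare operators that match add.*?
+export DIPU_AUTOCOMPARE_OPS_LIST=add.*?
+# Compare the add.out and sub.out operators
+export DIPU_AUTOCOMPARE_OPS_LIST="add.out, sub.out"
+```
+
+NOTE:

-1. Operators configured with `autograd:True` in `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml` (`cross_entropy_loss`, `conv2d`, `dropout`, `dropout_`, `linear`) do not yet support automatic precision comparison for *backward*. If model precision does not match, fall back these operators to the CPU as needed to locate the problem.
-2. Operators related to random number generation (configured with `autocompare:False` in `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml`) are not run through `autocompare`, because the result is always `not_allclose`.
-3. The inputs are checked to ensure that an operator's inputs are not modified unexpectedly.
+1. Some operators do not support automatic precision comparison. See [diopi_functions.yaml](https://github.com/DeepLink-org/deeplink.framework/blob/main/dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml): an operator whose `autocompare` option is `disable` does not support it. You can also edit `diopi_functions.yaml` and set an operator's `autocompare` option to `disable` to turn the comparison off for that operator.
+2. Operators configured with `autograd:True` in `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml` (`cross_entropy_loss`, `conv2d`, `dropout`, `dropout_`, `linear`) do not yet support automatic precision comparison for *backward*. If model precision does not match, fall back these operators to the CPU as needed to locate the problem.
+3. The input argument (self) is checked to ensure that an operator's input is not modified unexpectedly.

 #### Capturing Operator Arguments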
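
The QuickStart text above says `DIPU_AUTOCOMPARE_OPS_LIST` accepts regular expressions and several comma-separated operator names. The sketch below shows one way such a list could be interpreted. It is an assumption-laden illustration in Python, not the matcher from `OpRegexMatch.hpp`: the helper names `parse_ops_list` and `is_op_selected` are hypothetical, and whole-name matching via `re.fullmatch` is assumed.

```python
import re


def parse_ops_list(raw: str) -> list[str]:
    """Split a comma-separated ops list and strip whitespace around each entry."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def is_op_selected(op_name: str, raw_ops_list: str) -> bool:
    """Return True if op_name matches any pattern in the list.

    Assumption: a pattern must match the whole operator name (re.fullmatch).
    The real DIPU matcher may use different semantics.
    """
    return any(re.fullmatch(p, op_name) for p in parse_ops_list(raw_ops_list))


if __name__ == "__main__":
    ops = ("add.out", "sub.out", "conv2d")
    for ops_list in (".*", "add.*?", "add.out, sub.out"):
        selected = [op for op in ops if is_op_selected(op, ops_list)]
        print(f"{ops_list!r} selects {selected}")
```

Under these assumptions `'.*'` selects every operator, `add.*?` selects only names starting with `add`, and the quoted two-entry list selects exactly `add.out` and `sub.out`.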

dipu/scripts/autogen_diopi_wrapper/autogen_diopi_wrapper.py

Lines changed: 98 additions & 44 deletions
@@ -8,10 +8,12 @@
 from diopi_wrapper_template import (
     diopi_wrapper_file_template_content,
     diopi_wrapper_function_template_content,
-    op_register_template_content,
+    op_no_customfallback_with_autocompare_register_template_content,
+    op_no_customfallback_no_autocompare_register_template_content,
     custom_autograd_template_content,
     autocompare_template_content,
-    op_with_custom_fallback_register_template_content,
+    op_with_customfallback_with_autocompare_register_template_content,
+    op_with_customfallback_no_autocompare_register_template_content,
 )


@@ -671,10 +673,20 @@ def create_optional_generator_process_code(arg_name):

 fun_template = CodeTemplate(diopi_wrapper_function_template_content)

-op_register_template = CodeTemplate(op_register_template_content)
+op_no_customfallback_with_autocompare_register_template = CodeTemplate(
+    op_no_customfallback_with_autocompare_register_template_content
+)
+
+op_no_customfallback_no_autocompare_register_template = CodeTemplate(
+    op_no_customfallback_no_autocompare_register_template_content
+)
+
+op_with_customfallback_with_autocompare_register_template = CodeTemplate(
+    op_with_customfallback_with_autocompare_register_template_content
+)

-op_with_custom_fallback_register_template = CodeTemplate(
-    op_with_custom_fallback_register_template_content
+op_with_customfallback_no_autocompare_register_template = CodeTemplate(
+    op_with_customfallback_no_autocompare_register_template_content
 )

 custom_autograd_template = CodeTemplate(custom_autograd_template_content)
@@ -906,7 +918,7 @@ def functions_code_gen(fun_config):
         fbody += custom_autograd_function_code
         fun_name = wrapper_fun_name

-    if fun_config.get("autocompare", False) in [True, "True"] and fun_config.get(
+    if fun_config.get("autocompare") not in ["disable"] and fun_config.get(
         "register_op", True
     ) in [True, "True"]:
         auto_compare_fun_name = fun_name + "_autocompare"
@@ -940,40 +952,88 @@ def functions_code_gen(fun_config):
             ],
         )
         fbody += autocompare_code
-        fun_name = auto_compare_fun_name
-
-    if fun_config.get("custom_fallback", False) in ["False", False]:
-        register_body = op_register_template.substitute(
-            register_name=[get_op_name_from_schema(fun_config["schema"])],
-            aten_fun_name=["dipu::native::" + fun_name],
-            diopi_fun_name=[
-                get_fun_name_from_cppsignature(diopi_interface).replace(
-                    "diopi", "::diopi"
-                )
-            ],
+
+    # generate the OP_register code
+    # case 1: custom_fallback=False and autocompare not disabled
+    register_body = ""
+    if fun_config.get("custom_fallback", False) in ["False", False] and fun_config.get(
+        "autocompare", True
+    ) in ["True", True]:
+        register_body = (
+            op_no_customfallback_with_autocompare_register_template.substitute(
+                register_name=[get_op_name_from_schema(fun_config["schema"])],
+                aten_fun_name=["dipu::native::" + fun_name],
+                diopi_fun_name=[
+                    get_fun_name_from_cppsignature(diopi_interface).replace(
+                        "diopi", "::diopi"
+                    )
+                ],
+            )
         )
-    else:
-        register_body = op_with_custom_fallback_register_template.substitute(
-            register_name=[get_op_name_from_schema(fun_config["schema"])],
-            aten_fun_name=["dipu::native::" + fun_name],
-            diopi_fun_name=[
-                get_fun_name_from_cppsignature(diopi_interface).replace(
-                    "diopi", "::diopi"
-                )
-            ],
-            force_fallback=[
-                (
-                    "false"
-                    if fun_config.get("force_fallback", False) in [False, "False"]
-                    else "true"
-                )
-            ],
-            fallbackFunc=[
-                "dipu::native::"
-                + "custom_fallback_"
-                + fun_name.replace("_autocompare", "")
-            ],
+
+    # case2: custom_fallback=False and autocompare=disabled
+    elif fun_config.get("custom_fallback", False) in [
+        "False",
+        False,
+    ] and fun_config.get("autocompare") in ["disable"]:
+        register_body = (
+            op_no_customfallback_no_autocompare_register_template.substitute(
+                register_name=[get_op_name_from_schema(fun_config["schema"])],
+                aten_fun_name=["dipu::native::" + fun_name],
+                diopi_fun_name=[
+                    get_fun_name_from_cppsignature(diopi_interface).replace(
+                        "diopi", "::diopi"
+                    )
+                ],
+            )
         )
+    # case3: custom_fallback=True and autocompare not disabled
+    elif fun_config.get("custom_fallback", False) in ["True", True] and fun_config.get(
+        "autocompare", True
+    ) in ["True", True]:
+        register_body = (
+            op_with_customfallback_with_autocompare_register_template.substitute(
+                register_name=[get_op_name_from_schema(fun_config["schema"])],
+                aten_fun_name=["dipu::native::" + fun_name],
+                diopi_fun_name=[
+                    get_fun_name_from_cppsignature(diopi_interface).replace(
+                        "diopi", "::diopi"
+                    )
+                ],
+                force_fallback=[
+                    (
+                        "false"
+                        if fun_config.get("force_fallback", False) in [False, "False"]
+                        else "true"
+                    )
+                ],
+                fallbackFunc=["dipu::native::" + "custom_fallback_" + fun_name],
+            )
+        )
+    # case4: custom_fallback=True and autocompare disabled
+    elif fun_config.get("custom_fallback", False) in ["True", True] and fun_config.get(
+        "autocompare", True
+    ) in ["disable"]:
+        register_body = (
+            op_with_customfallback_no_autocompare_register_template.substitute(
+                register_name=[get_op_name_from_schema(fun_config["schema"])],
+                aten_fun_name=["dipu::native::" + fun_name],
+                diopi_fun_name=[
+                    get_fun_name_from_cppsignature(diopi_interface).replace(
+                        "diopi", "::diopi"
+                    )
+                ],
+                force_fallback=[
+                    (
+                        "false"
+                        if fun_config.get("force_fallback", False) in [False, "False"]
+                        else "true"
+                    )
+                ],
+                fallbackFunc=["dipu::native::" + "custom_fallback_" + fun_name],
+            )
+        )
+
     return fbody, register_body


@@ -1039,12 +1099,6 @@ def parse_args():
         type=boolean_string,
         help="whether generate code that prints op args",
     )
-    parser.add_argument(
-        "--autocompare",
-        default=False,
-        type=boolean_string,
-        help="whether generate code that compare device calculation results with cpu calculation results",
-    )
     parser.add_argument(
         "--fun_config_dict",
         type=json.loads,
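
The four register cases in `functions_code_gen` are driven by two settings: whether `custom_fallback` is on and whether `autocompare` is `disable`. As a reading aid, the selection could be expressed as a lookup table. This is only a sketch under stated assumptions: `pick_register_template` is a hypothetical helper that returns template variable names as strings, and it treats every `autocompare` value other than `"disable"` as enabled. In the committed code, cases 1 and 3 instead require the value to be `True` or `"True"`, so an entry with some other value (for example `False`) would appear to match no branch and keep `register_body` empty.

```python
def pick_register_template(fun_config: dict) -> str:
    """Return the name of the register template for one YAML entry.

    Hypothetical helper for illustration; the real script substitutes
    into CodeTemplate objects rather than returning names.
    """
    custom_fallback = fun_config.get("custom_fallback", False) in (True, "True")
    autocompare_disabled = fun_config.get("autocompare") == "disable"

    # Keys are (custom_fallback, autocompare_disabled).
    table = {
        (False, False): "op_no_customfallback_with_autocompare_register_template",
        (False, True): "op_no_customfallback_no_autocompare_register_template",
        (True, False): "op_with_customfallback_with_autocompare_register_template",
        (True, True): "op_with_customfallback_no_autocompare_register_template",
    }
    return table[(custom_fallback, autocompare_disabled)]


if __name__ == "__main__":
    print(pick_register_template({"schema": "add.out(...)"}))
    print(pick_register_template({"custom_fallback": True, "autocompare": "disable"}))
```

A table keeps the four combinations visible in one place and makes an unmatched combination fail with a `KeyError` instead of silently producing no registration.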

dipu/scripts/autogen_diopi_wrapper/autogen_wrapped_code.sh

Lines changed: 6 additions & 7 deletions
@@ -5,17 +5,16 @@
 DIPU_DIR=$(readlink -f $(dirname $(readlink -f "$0"))/../..)
 AUTOGEN_DIOPI_WRAPPER=$DIPU_DIR/scripts/autogen_diopi_wrapper

-USE_AUTOCOMPARE=${1:-OFF}
-UsedVendor=${2:-cuda}
-Torch_VERSION=${3:-2.1.0}
-GENERATED_KERNELS_SCRIPT=${4:-$AUTOGEN_DIOPI_WRAPPER/autogen_diopi_wrapper.py}
-GENERATED_KERNELS_CONFIG=${5:-$AUTOGEN_DIOPI_WRAPPER/diopi_functions.yaml}
-GENERATED_KERNELS=${6:-$DIPU_DIR/torch_dipu/csrc_dipu/aten/ops/AutoGenedKernels.cpp}
+UsedVendor=${1:-cuda}
+Torch_VERSION=${2:-2.1.0}
+GENERATED_KERNELS_SCRIPT=${3:-$AUTOGEN_DIOPI_WRAPPER/autogen_diopi_wrapper.py}
+GENERATED_KERNELS_CONFIG=${4:-$AUTOGEN_DIOPI_WRAPPER/diopi_functions.yaml}
+GENERATED_KERNELS=${5:-$DIPU_DIR/torch_dipu/csrc_dipu/aten/ops/AutoGenedKernels.cpp}

 GENERATED_KERNELS_VENDOR=${DIPU_DIR}/third_party/DIOPI/impl/${UsedVendor}/convert_config.yaml

 PYTHON_CMD="python3 ${GENERATED_KERNELS_SCRIPT} --out=${GENERATED_KERNELS} --config=${GENERATED_KERNELS_CONFIG} \
-    --autocompare=${USE_AUTOCOMPARE} --print_op_arg=True --use_diopi_adapter=False --print_func_call_info=True \
+    --print_op_arg=True --use_diopi_adapter=False --print_func_call_info=True \
     --fun_config_dict='{\"current_device\":\"${UsedVendor}\",\"current_torch_ver\":\"${Torch_VERSION}\"}'"

 if [ -f "$GENERATED_KERNELS_VENDOR" ]; then

dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml

Lines changed: 1 addition & 6 deletions
@@ -2748,7 +2748,6 @@

 # this copy_ aten op may use both diopiCastDtype and diopiCopyInp. it's a proxy/composite op
 - schema: copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)
-  autocompare: disable
   dummy_call_diopi: True
   custom_fallback: True
   device: [cuda, camb, ascend, droplet, supa, kunlunxin]
@@ -2760,16 +2759,14 @@

 # vendor who has no fully implemented diopi and proper fallback DIPUCopy sub-class
 - schema: copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)
-  autocompare: disable
   custom_fallback: True
   dummy_call_diopi: True
   custom_code_at_the_beginning: |
     return custom_fallback_dipu_copy_(self, src, non_blocking);
   device: [topsrider]
   interface: diopiCopyInp(ctx, src, self)

-- schema: _amp_foreach_non_finite_check_and_unscale_(at::TensorList self, Tensor(b!) found_inf, Tensor inv_scale) -> void
-  autocompare: disable
+- schema: _amp_foreach_non_finite_check_and_unscale_(at::TensorList self, Tensor(b!) found_inf, Tensor inv_scale) -> ()
   custom_fallback: True
   custom_code_at_the_beginning: |
     std::vector<diopiTensorHandle_t> diopiTensorHandles(self.size(), nullptr);
@@ -2780,8 +2777,6 @@
     });
     // NOLINTEND(cppcoreguidelines-pro-type-const-cast)
   interface: diopiAmpForeachNonFiniteCheckAndUnscaleInp(ctx, diopiTensorHandles.data(), static_cast<int64_t>(self.size()), found_inf, inv_scale)
-  # TODO(someone): fix this issue when `autocompare` is on
-  autocompare: disable

 - schema: _amp_update_scale_(Tensor(a!) self, Tensor(b!) growth_tracker, Tensor found_inf, float scale_growth_factor, float scale_backoff_factor, int growth_interval) -> Tensor(a!)
   custom_fallback: True
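
These entries no longer carry `autocompare: disable`, so in-place operators such as `copy_` can be selected for comparison at runtime. The QuickStart describes the comparison as copying the device arguments to the CPU, running both implementations, and checking both the output and the input `self`. A minimal pure-Python sketch of that idea follows. It uses lists instead of tensors, and every name in it (`allclose`, `autocompare`, `add_inplace`, `buggy_add_inplace`) is hypothetical; the generated C++ wrapper is not shown in this commit view and will differ.

```python
import copy
import math


def allclose(a, b, rel_tol=1e-5, abs_tol=1e-8):
    """Element-wise closeness check for two flat lists of floats."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


def autocompare(op_name, device_fn, cpu_fn, self_arg, other):
    """Run device_fn and cpu_fn on separate copies and compare the results."""
    # Stand-in for copying device arguments to the CPU before the op runs.
    cpu_self = copy.deepcopy(self_arg)
    cpu_other = copy.deepcopy(other)

    device_out = device_fn(self_arg, other)
    cpu_out = cpu_fn(cpu_self, cpu_other)

    # Compare the output and the (possibly mutated) input self.
    report = {
        "out": allclose(device_out, cpu_out),
        "self": allclose(self_arg, cpu_self),
    }
    for name, ok in report.items():
        print(f"autocompare: {op_name} {name}: {'allclose' if ok else 'not_allclose'}")
    return report


def add_inplace(self_arg, other):
    """Reference in-place add: self += other."""
    for i, value in enumerate(other):
        self_arg[i] += value
    return self_arg


def buggy_add_inplace(self_arg, other):
    """Deliberately wrong 'device' kernel used to show a mismatch."""
    for i, value in enumerate(other):
        self_arg[i] += value + 1.0
    return self_arg


if __name__ == "__main__":
    autocompare("add_", add_inplace, add_inplace, [1.0, 2.0], [0.5, 0.5])
    autocompare("add_", buggy_add_inplace, add_inplace, [1.0, 2.0], [0.5, 0.5])
```

Checking `self` as well as the output is what catches an in-place kernel that writes the wrong values into its input.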

dipu/scripts/autogen_diopi_wrapper/diopi_wrapper_template.py

Lines changed: 13 additions & 4 deletions
@@ -50,6 +50,7 @@
 #include "csrc_dipu/aten/ops/DIPUCopy.hpp"
 #include "csrc_dipu/aten/ops/NodispatchUtils.hpp"
 #include "csrc_dipu/aten/ops/OpUtils.hpp"
+#include "csrc_dipu/aten/ops/OpRegexMatch.hpp"
 #include "csrc_dipu/base/basedef.h"
 #include "csrc_dipu/diopirt/diopirt_impl.h"
 #include "csrc_dipu/profiler/profiler.h"
@@ -127,12 +128,20 @@
 }
 """

-op_register_template_content = """
-DIOPI_ATEN_FUNC("$register_name", $diopi_fun_name, $aten_fun_name);
+op_no_customfallback_with_autocompare_register_template_content = """
+NO_CUSTOMFALLBACK_WITH_AUTOCOMPARE_REGISTER("$register_name", $diopi_fun_name, $aten_fun_name);
 """

-op_with_custom_fallback_register_template_content = """
-DIOPI_ATEN_FUNC_CUSTOM_FALLBACK("$register_name", $diopi_fun_name, $force_fallback /*whether force fallback*/, $aten_fun_name, $fallbackFunc);
+op_no_customfallback_no_autocompare_register_template_content = """
+NO_CUSTOMFALLBACK_NO_AUTOCOMPARE_REGISTER("$register_name", $diopi_fun_name, $aten_fun_name);
+"""
+
+op_with_customfallback_with_autocompare_register_template_content = """
+WITH_CUSTOMFALLBACK_WITH_AUTOCOMPARE_REGISTER("$register_name", $diopi_fun_name, $force_fallback /*whether force fallback*/, $aten_fun_name, $fallbackFunc);
+"""
+
+op_with_customfallback_no_autocompare_register_template_content = """
+WITH_CUSTOMFALLBACK_NO_AUTOCOMPARE_REGISTER("$register_name", $diopi_fun_name, $force_fallback /*whether force fallback*/, $aten_fun_name, $fallbackFunc);
 """

 custom_autograd_template_content = """
custom_autograd_template_content = """

dipu/scripts/ci/ascend/ci_ascend_script.sh

Lines changed: 0 additions & 6 deletions
@@ -12,19 +12,13 @@ function build_diopi_lib() {
 function config_dipu_ascend_cmake() {
     mkdir -p build && cd ./build
     cmake_args="-DCMAKE_BUILD_TYPE=Release -DDEVICE=ascend -DWITH_DIOPI_LIBRARY=DISABLE"
-    if [ -n "$USE_AUTOCOMPARE" ]; then
-        cmake_args+=" -DUSE_AUTOCOMPARE=${USE_AUTOCOMPARE}"
-    fi
     cmake ../ $cmake_args
     cd ../
 }

 function config_all_ascend_cmake() {
     mkdir -p build && cd ./build
     cmake_args="-DCMAKE_BUILD_TYPE=Release -DDEVICE=ascend -DENABLE_COVERAGE=${USE_COVERAGE} -DWITH_DIOPI=INTERNAL"
-    if [ -n "$USE_AUTOCOMPARE" ]; then
-        cmake_args+=" -DUSE_AUTOCOMPARE=${USE_AUTOCOMPARE}"
-    fi
     cmake ../ $cmake_args
     cd ../
 }
