Error threshold is too low? #12

Open
psyhtest opened this issue Jul 14, 2018 · 17 comments

@psyhtest
Contributor

While running directconv-armcl-opencl and conv-armcl-opencl experiments with ArmCL v18.05, I noticed some validation failures:

anton@diviniti:~/CK_REPOS/local/experiment$ grep -c 4836425781 nntest-conv-armcl-opencl-arm-compute-library-opencl-18.05-b3a371b-debug-mate10pro-767mhz-debug-conv-0001/*.0001.json | grep -v :0
nntest-conv-armcl-opencl-arm-compute-library-opencl-18.05-b3a371b-debug-mate10pro-767mhz-debug-conv-0001/ckp-a0b5c6f13d81b8db.0001.json:2
anton@diviniti:~/CK_REPOS/local/experiment$ grep -c 4836425781 nntest-directconv-armcl-opencl-arm-compute-library-opencl-18.05-b3a371b-debug-mate10pro-767mhz-debug-conv-0001/*.0001.json | grep -v :0
nntest-directconv-armcl-opencl-arm-compute-library-opencl-18.05-b3a371b-debug-mate10pro-767mhz-debug-conv-0001/ckp-6a52a60e334d3b88.0001.json:2

on the same tensor shape:

anton@diviniti:~/CK_REPOS/local/experiment$ grep dataset_file nntest-conv-armcl-opencl-arm-compute-library-opencl-18.05-b3a371b-debug-mate10pro-767mhz-debug-conv-0001/ckp-a0b5c6f13d81b8db.0001.json
    "dataset_file": "shape-256-13-13-3-384-1-1", 
anton@diviniti:~/CK_REPOS/local/experiment$ grep dataset_file nntest-directconv-armcl-opencl-arm-compute-library-opencl-18.05-b3a371b-debug-mate10pro-767mhz-debug-conv-0001/ckp-6a52a60e334d3b88.0001.json
    "dataset_file": "shape-256-13-13-3-384-1-1", 
@psyhtest
Contributor Author

psyhtest commented Jul 14, 2018

The cause of failure is the same (repeated many times over):

"fail_reason": "Numerical outputs differ:\n46) 24.4836425781 vs 24.4825687408\n215) ...

As 0.0010 < | 24.4836425781 - 24.4825687408 | < 0.0011, this seems to be simply a case of the threshold being too low for this shape.
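
To make the arithmetic above concrete, here is a minimal sketch of an absolute-difference check. The function name `exceeds_threshold` and the strict `>` comparison are assumptions for illustration; the actual comparison is done in the program's postprocessing and may differ in detail.

```python
def exceeds_threshold(a, b, threshold):
    """Return True if |a - b| is strictly above the threshold (assumed semantics)."""
    return abs(a - b) > threshold

# The two values reported in the failure message.
a, b = 24.4836425781, 24.4825687408

print(abs(a - b))                       # roughly 0.00107
print(exceeds_threshold(a, b, 0.0010))  # True: the check would fail
print(exceeds_threshold(a, b, 0.0011))  # False: the check would pass
```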

@psyhtest
Contributor Author

psyhtest commented Jul 14, 2018

Indeed, this fails:

$ ck benchmark program:conv-armcl-opencl --cmd_key=default \
--target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO \
--dataset_uoa=tensor-conv-0001 --dataset_file=shape-256-13-13-3-384-1-1 \
--env.CK_ABS_DIFF_THRESHOLD=0.0010

while this doesn't:

$ ck benchmark program:conv-armcl-opencl --cmd_key=default \
--target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO \
--dataset_uoa=tensor-conv-0001 --dataset_file=shape-256-13-13-3-384-1-1 \
--env.CK_ABS_DIFF_THRESHOLD=0.0011

@psyhtest
Contributor Author

I smell something fishy, however:

       - check failed on "tmp-ck-output.json" (Numerical outputs differ:
46) 24.4836425781 vs 24.4825687408
215) 24.4836425781 vs 24.4825687408
384) 24.4836425781 vs 24.4825687408
553) 24.4836425781 vs 24.4825687408
722) 24.4836425781 vs 24.4825687408
891) 24.4836425781 vs 24.4825687408
1060) 24.4836425781 vs 24.4825687408
1229) 24.4836425781 vs 24.4825687408
1398) 24.4836425781 vs 24.4825687408
1567) 24.4836425781 vs 24.4825687408
1736) 24.4836425781 vs 24.4825687408
1905) 24.4836425781 vs 24.4825687408
2074) 24.4836425781 vs 24.4825687408
2243) 24.4836425781 vs 24.4825687408
2412) 24.4836425781 vs 24.4825687408
2581) 24.4836425781 vs 24.4825687408
2750) 24.4836425781 vs 24.4825687408
2919) 24.4836425781 vs 24.4825687408
3088) 24.4836425781 vs 24.4825687408
3257) 24.4836425781 vs 24.4825687408
3426) 24.4836425781 vs 24.4825687408
3595) 24.4836425781 vs 24.4825687408
3764) 24.4836425781 vs 24.4825687408
3933) 24.4836425781 vs 24.4825687408
4102) 24.4836425781 vs 24.4825687408
4271) 24.4836425781 vs 24.4825687408
4440) 24.4836425781 vs 24.4825687408
4609) 24.4836425781 vs 24.4825687408
4778) 24.4836425781 vs 24.4825687408
4947) 24.4836425781 vs 24.4825687408
5116) 24.4836425781 vs 24.4825687408
5285) 24.4836425781 vs 24.4825687408
5454) 24.4836425781 vs 24.4825687408

Do you see any pattern?

@psyhtest
Contributor Author

The values mismatch at indices 46 + 169n. Suspiciously, the tensor is shape-256-13-13-3-384-1-1, and 169 = 13*13 = H*W...
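
A small sketch of what that stride implies, assuming the output is stored channel-major (C, H, W) in row-major order with H = W = 13. The layout and the helper `unravel_chw` are assumptions, not something confirmed from the program source. Under that assumption, every failing index lands on the same spatial position, one per output channel.

```python
H, W = 13, 13  # taken from shape-256-13-13-3-384-1-1 (assumed to be the output H and W)

def unravel_chw(flat_index, height, width):
    """Map a flat index to (channel, row, col), assuming row-major CHW layout."""
    channel, offset = divmod(flat_index, height * width)
    row, col = divmod(offset, width)
    return channel, row, col

# Failing indices as reported: 46, 215, ..., 5454.
failing = [46 + 169 * n for n in range(33)]

# Collect the distinct (row, col) positions across all failing indices.
positions = {unravel_chw(i, H, W)[1:] for i in failing}
print(positions)  # → {(3, 7)}
```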

@psyhtest
Contributor Author

On HiKey960, the default threshold is good enough with ArmCL v18.05:

$ ck benchmark program:conv-armcl-opencl --cmd_key=default \
--dataset_uoa=tensor-conv-0001 --dataset_file=shape-256-13-13-3-384-1-1
...
Some statistics:

* Failed: no
...

@psyhtest
Contributor Author

psyhtest commented Jul 15, 2018

On Mediatek X20, I could not get any results until I rebuilt the library as follows:

$ ck install package:lib-armcl-opencl-18.05 --target_os=android24-arm64 \
--env.USE_GRAPH=ON --env.USE_NEON=ON --env.USE_EMBEDDED_KERNELS=ON \
--env.DEBUG=ON --extra_version=-debug

Here, the default threshold also didn't cause any problems:

$ ck benchmark program:conv-armcl-opencl --target_os=android24-arm64  --cmd_key=default \
--dataset_uoa=tensor-conv-0001 --dataset_file=shape-256-13-13-3-384-1-1
...
Some statistics:

* Failed: no
...

I'm beginning to think that we should not change the threshold....

@psyhtest
Contributor Author

A similar problem on another dataset file?

anton@diviniti:~/CK_REPOS/local/experiment$ grep -c 740882873 nntest-winogradconv-armcl-opencl-arm-compute-library-opencl-18.05-b3a371b-mate10pro-767mhz-conv3x3-inception-v3/*.0001.json | grep -v :0
nntest-winogradconv-armcl-opencl-arm-compute-library-opencl-18.05-b3a371b-mate10pro-767mhz-conv3x3-inception-v3/ckp-f0d94e1992c17fc9.0001.json:2
anton@diviniti:~/CK_REPOS/local/experiment$ grep dataset_file nntest-winogradconv-armcl-opencl-arm-compute-library-opencl-18.05-b3a371b-mate10pro-767mhz-conv3x3-inception-v3/ckp-f0d94e1992c17fc9.0001.json
    "dataset_file": "shape-448-8-8-3-384-1-1",

@psyhtest
Contributor Author

The values mismatch at indices 59 + 64n. Again, 64=8*8=H*W.

       - check failed on "tmp-ck-output.json" (Numerical outputs differ:
59) -65.7422866821 vs -65.7408828735
123) -65.7422866821 vs -65.7408828735
187) -65.7422866821 vs -65.7408828735
251) -65.7422866821 vs -65.7408828735
315) -65.7422866821 vs -65.7408828735
379) -65.7422866821 vs -65.7408828735
443) -65.7422866821 vs -65.7408828735
507) -65.7422866821 vs -65.7408828735
571) -65.7422866821 vs -65.7408828735
635) -65.7422866821 vs -65.7408828735
699) -65.7422866821 vs -65.7408828735
763) -65.7422866821 vs -65.7408828735
827) -65.7422866821 vs -65.7408828735
891) -65.7422866821 vs -65.7408828735
955) -65.7422866821 vs -65.7408828735
1019) -65.7422866821 vs -65.7408828735
1083) -65.7422866821 vs -65.7408828735
1147) -65.7422866821 vs -65.7408828735
1211) -65.7422866821 vs -65.7408828735
...
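
Rather than eyeballing the log, the stride and the largest difference can be extracted mechanically. The following is a rough sketch that parses lines of the form `index) value vs value`; the regular expression and the helper `parse_mismatches` are illustrative assumptions, and only the first three lines of the output above are used as sample input.

```python
import re

# Matches lines such as "59) -65.7422866821 vs -65.7408828735".
LINE = re.compile(r"^(\d+)\)\s+(-?\d+\.\d+)\s+vs\s+(-?\d+\.\d+)$")

def parse_mismatches(text):
    """Return a list of (index, first_value, second_value) tuples."""
    result = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            result.append((int(match.group(1)),
                           float(match.group(2)),
                           float(match.group(3))))
    return result

log = """59) -65.7422866821 vs -65.7408828735
123) -65.7422866821 vs -65.7408828735
187) -65.7422866821 vs -65.7408828735"""

mismatches = parse_mismatches(log)
indices = [index for index, _, _ in mismatches]

# Distinct gaps between consecutive failing indices.
strides = {later - earlier for earlier, later in zip(indices, indices[1:])}
# Largest absolute difference among the reported pairs.
max_diff = max(abs(x - y) for _, x, y in mismatches)

print(strides)   # → {64}
print(max_diff)  # roughly 0.0014
```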

@psyhtest
Contributor Author

psyhtest commented Jul 17, 2018

As 0.0010 < | -65.7422866821 + 65.7408828735 | < 0.0015, updating the threshold to 0.0015 does stop the failure:

$ ck benchmark program:winogradconv-armcl-opencl --cmd_key=default \
--target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO \
--dataset_uoa=tensor-conv3x3-inception-v3 --dataset_file=shape-448-8-8-3-384-1-1 \
--env.CK_ABS_DIFF_THRESHOLD=0.0015 --repetitions=1
...
Some statistics:

* Failed: no
...

However, conv and directconv do not fail even with the default threshold:

$ ck benchmark program:conv-armcl-opencl --cmd_key=default \
--target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO \
--dataset_uoa=tensor-conv3x3-inception-v3 --dataset_file=shape-448-8-8-3-384-1-1 \
--repetitions=1
...
Some statistics:

* Failed: no
...
$ ck benchmark program:directconv-armcl-opencl --cmd_key=default \
--target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO \
--dataset_uoa=tensor-conv3x3-inception-v3 --dataset_file=shape-448-8-8-3-384-1-1 \
--repetitions=1
...
Some statistics:

* Failed: no
...

@psyhtest
Contributor Author

psyhtest commented Jul 17, 2018

Changing the seed stops all the above failures:

$ ck benchmark program:winogradconv-armcl-opencl \
--cmd_key=default --target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO \
--dataset_uoa=tensor-conv3x3-inception-v3 --dataset_file=shape-448-8-8-3-384-1-1 \
--repetitions=1 --env.CK_SEED=1
...
Some statistics:

* Failed: no
...
$ ck benchmark program:conv-armcl-opencl --cmd_key=default \
--target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO \
--dataset_uoa=tensor-conv-0001 --dataset_file=shape-256-13-13-3-384-1-1 \
--repetitions=1 --env.CK_SEED=1
...
Some statistics:

* Failed: no
...
$ ck benchmark program:directconv-armcl-opencl --cmd_key=default \
--target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO \
--dataset_uoa=tensor-conv-0001 --dataset_file=shape-256-13-13-3-384-1-1 \
--repetitions=1 --env.CK_SEED=1
...
Some statistics:

* Failed: no
...

So perhaps the numerical error accumulates for certain tensor coordinates depending on the seed.

Maybe we should just increase the threshold to the smallest value for which no failures happen with the default seed (42). Based on the failures observed so far, that is 0.0015.

@psyhtest
Contributor Author

psyhtest commented Jul 20, 2018

Mismatches ("Numerical outputs differ") on tensors in tensor-conv-0001:

  • conv:
    • shape-256-63-63-3-512-1-1: -34.0275344849 vs -34.026222229 (abs diff 0.0013122559000038336 < 0.0014)
    • shape-256-13-13-3-384-1-1: 24.4836425781 vs 24.4825687408 (abs diff 0.0010738372999981038 < 0.0011)
    • shape-384-13-13-3-256-1-1: 32.0275421143 vs 32.0293502808 (abs diff 0.0018081665000053704 < 0.0019)
    • shape-384-13-13-3-384-1-1: 32.0275421143 vs 32.0293502808 (abs diff 0.0018081665000053704 < 0.0019)
  • directconv:
    • shape-256-13-13-3-384-1-1: 24.4836425781 vs 24.4825687408 (abs diff 0.0010738372999981038 < 0.0011)
    • shape-384-13-13-3-256-1-1: 32.0275421143 vs 32.0293502808 (abs diff 0.0018081665000053704 < 0.0019)
    • shape-384-13-13-3-384-1-1: 32.0275421143 vs 32.0293502808 (abs diff 0.0018081665000053704 < 0.0019)

Based on these failures, the threshold should be raised to 0.0019. But as it depends on the tensor, why don't we include it with the tensor metadata?
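
For reference, the per-shape bounds quoted above can be reproduced with a few lines. This is only a sketch: the dictionary holds the value pairs listed in this comment, and `round_up` (rounding the absolute difference up to four decimal places) is a hypothetical helper, not part of CK.

```python
import math

# Value pairs as listed above for tensor-conv-0001.
observed = {
    "shape-256-63-63-3-512-1-1": (-34.0275344849, -34.026222229),
    "shape-256-13-13-3-384-1-1": (24.4836425781, 24.4825687408),
    "shape-384-13-13-3-256-1-1": (32.0275421143, 32.0293502808),
    "shape-384-13-13-3-384-1-1": (32.0275421143, 32.0293502808),
}

def round_up(value, decimals=4):
    """Round a positive value up to the given number of decimal places."""
    scale = 10 ** decimals
    return math.ceil(value * scale) / scale

# Smallest 4-decimal threshold that would accept each shape.
per_shape = {shape: round_up(abs(a - b)) for shape, (a, b) in observed.items()}

for shape, threshold in per_shape.items():
    print(shape, threshold)

# A single global threshold would have to cover the worst case.
print(max(per_shape.values()))  # → 0.0019
```

A per-shape table like `per_shape` is essentially what storing the threshold in the tensor metadata would amount to.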

Note that while it also seems to depend on the operator implementation (conv fails on shape-256-63-63-3-512-1-1, while directconv doesn't), the directconv data is currently incomplete (with 17 tensors out of 24).

@psyhtest
Contributor Author

Yes, directconv also fails on shape-256-63-63-3-512-1-1:

$ ck benchmark program:directconv-armcl-opencl --cmd_key=default \
--target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO \
--dataset_uoa=tensor-conv-0001 --dataset_file=shape-256-63-63-3-512-1-1 \
--repetitions=1 --deps.compiler=f4947b23287580ee
...
       - check failed on "tmp-ck-output.json" (Numerical outputs differ:
720) -34.0275344849 vs -34.026222229
...

with the same acceptance threshold of 0.0014:

$ ck benchmark program:directconv-armcl-opencl --cmd_key=default \
--target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO \
--dataset_uoa=tensor-conv-0001 --dataset_file=shape-256-63-63-3-512-1-1 \
--repetitions=1 --deps.compiler=f4947b23287580ee \
--env.CK_ABS_DIFF_THRESHOLD=0.0014
...
Some statistics:

* Failed: no
...

@psyhtest
Contributor Author

psyhtest commented Jul 20, 2018

Raising the threshold for all tensor shapes in the same way (e.g. from 0.001 to 0.002) does not sound like a good idea, if only a handful of shapes require this and only for certain device / driver / library combinations.

I tried to make the following change locally for shape-256-63-63-3-512-1-1:

diff --git a/dataset/tensor-conv-0001/shape-256-63-63-3-512-1-1.json b/dataset/tensor-conv-0001/shape-256-63-63-3-512-1-1.json
index 38fb5a6..b5e888a 100644
--- a/dataset/tensor-conv-0001/shape-256-63-63-3-512-1-1.json
+++ b/dataset/tensor-conv-0001/shape-256-63-63-3-512-1-1.json
@@ -5,5 +5,6 @@
   "CK_OUT_SHAPE_C": 512, 
   "CK_CONV_KERNEL": 3, 
   "CK_CONV_STRIDE": 1, 
-  "CK_CONV_PAD": 1
-}
\ No newline at end of file
+  "CK_CONV_PAD": 1, 
+  "CK_ABS_DIFF_THRESHOLD": 0.0015 
+}

This actually solved the issue so that:

$ ck benchmark program:directconv-armcl-opencl --cmd_key=default \
--target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO \
--dataset_uoa=tensor-conv-0001 --dataset_file=shape-256-63-63-3-512-1-1 \
--repetitions=1 --deps.compiler=f4947b23287580ee

would not fail!

However, I noticed that CK still showed the CK_ABS_DIFF_THRESHOLD variable set to 0.001, as per the metadata of e.g. program:conv-armcl-opencl:

  "run_vars": {
    "CK_ABS_DIFF_THRESHOLD": 0.001,
    "CK_IN_SHAPE_N": 1,
    "CK_OUT_RAW_DATA": "tmp-ck-output.bin",
    "CK_SEED": 42
  },

Moreover, when I tried to set --env.CK_ABS_DIFF_THRESHOLD=0.0001 (i.e. a value even smaller than the default, which should have caused an error), the test still passed.

I will try to reproduce this on another shape shortly, but for now I think the behaviour is:

  1. CK_ABS_DIFF_THRESHOLD set in a shape file overrides everything (the default environment of the operator and the environment set via the command line).
  2. CK misleadingly still prints the default value of CK_ABS_DIFF_THRESHOLD.

I would prefer the following behavior:

  1. CK_ABS_DIFF_THRESHOLD set in a shape file overrides the default environment of the operator.
  2. CK_ABS_DIFF_THRESHOLD set via the command line overrides everything (the default environment of the operator and the default environment of the shape).
  3. Print the actual value of CK_ABS_DIFF_THRESHOLD.
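
The difference between the two orderings can be sketched as a layered dictionary merge. This is not CK's actual code: `resolve_env` and its `cli_wins` flag are hypothetical names used only to illustrate the observed behaviour (shape file wins) against the preferred one (command line wins).

```python
def resolve_env(program_defaults, shape_meta, command_line, cli_wins=True):
    """Merge three environment layers; later layers override earlier ones."""
    if cli_wins:
        # Preferred: program defaults < shape file < command line.
        layers = [program_defaults, shape_meta, command_line]
    else:
        # Observed: program defaults < command line < shape file.
        layers = [program_defaults, command_line, shape_meta]
    env = {}
    for layer in layers:
        env.update(layer)
    return env

program_defaults = {"CK_ABS_DIFF_THRESHOLD": 0.001, "CK_SEED": 42}
shape_meta = {"CK_ABS_DIFF_THRESHOLD": 0.0015}
command_line = {"CK_ABS_DIFF_THRESHOLD": 0.0001}

observed = resolve_env(program_defaults, shape_meta, command_line, cli_wins=False)
preferred = resolve_env(program_defaults, shape_meta, command_line, cli_wins=True)

print(observed["CK_ABS_DIFF_THRESHOLD"])   # → 0.0015
print(preferred["CK_ABS_DIFF_THRESHOLD"])  # → 0.0001
```

Printing the merged dictionary, rather than an intermediate layer, would also cover the request to show the value actually in effect.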

What do you think?

@Chunosov
Contributor

A program reads the env var CK_ABS_DIFF_THRESHOLD only once, in postprocessing. We should ask @gfursin how ck initializes it. But it seems that it reads the program meta, then overrides the values with those passed via the command line and prints them, and then overrides them with the values from the dataset.

@psyhtest
Contributor Author

An interesting case of non-uniform periodic output difference requiring different thresholds:

$ ck benchmark program:winogradconv-armcl-opencl --cmd_key=default --repetitions=1 \
--dataset_uoa=tensor-conv-0001 --dataset_file=shape-96-27-27-5-256-1-2 \
--target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO
...
       - check failed on "tmp-ck-output.json" (Numerical outputs differ:
219) 22.9101810455 vs 22.9088726044
354) -27.9759273529 vs -27.9771194458
746) 29.9037017822 vs 29.9026870728
1000) 22.9101810455 vs 22.9088726044
1135) -27.9759273529 vs -27.9771194458
1137) 29.9037017822 vs 29.9026870728
  • 22.9101810455 vs 22.9088726044 (abs diff 0.0013084411000008345 < 0.0014)
  • -27.9759273529 vs -27.9771194458 (abs diff 0.0011920928999984426 < 0.0012)
  • 29.9037017822 vs 29.9026870728 (abs diff 0.0010147093999997026 < 0.0011)
$ ck benchmark program:winogradconv-armcl-opencl --cmd_key=default --repetitions=1 \
--dataset_uoa=tensor-conv-0001 --dataset_file=shape-96-27-27-5-256-1-2 \
--target_os=android24-arm64 --env.CK_PUSH_LIBS_TO_REMOTE=NO \
--env.CK_ABS_DIFF_THRESHOLD=0.0014
...
Some statistics:

* Failed: no
...

@psyhtest
Contributor Author

It seems that we need a higher threshold for high values of $W$ and $H$ (8, 13, 27, 63).

@Chunosov
Contributor

> higher threshold for high values of $W$ and $H$ (8, 13, 27, 63).

maybe some normalization is needed?
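
One possible reading of "normalization" is to compare relative rather than absolute differences. The sketch below is an assumption about what that could look like, not an agreed fix; `relative_diff` is a hypothetical helper, and the pairs are the mismatching values reported earlier in this thread.

```python
def relative_diff(a, b):
    """Absolute difference scaled by the larger magnitude of the two values."""
    scale = max(abs(a), abs(b))
    return abs(a - b) / scale if scale else 0.0

# Mismatching pairs reported in this thread.
pairs = [
    (24.4836425781, 24.4825687408),
    (-65.7422866821, -65.7408828735),
    (32.0275421143, 32.0293502808),
    (-34.0275344849, -34.026222229),
]

for a, b in pairs:
    print(f"{a} vs {b}: abs={abs(a - b):.6f} rel={relative_diff(a, b):.2e}")
```

For these pairs every absolute difference is above 0.001, while every relative difference is below 1e-4, so a relative tolerance would treat them more uniformly. Python's standard `math.isclose(a, b, rel_tol=...)` performs a comparable relative check.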
