Tests related to ClassificationNeuralNetwork very slow on Debian x86-64 CI runners #168
The classification classdefs each contain several tests, and some of them take a considerable amount of time because, apart from the usual input validation and error checking BISTs, they also contain tests that apply training.
Testing the entire statistics package on my system takes some time:
Thanks for your feedback. I am told that the Debian machine running these tests is very powerful, with 256 GB RAM and 64 cores. So I don’t see any option other than disabling all the problematic tests, since I don’t understand the issue at hand. Is there an easy way to disable all tests related to neural networks (since only those create a problem)?
I don't know any other way than removing the tests, but I'd rather not do that. Furthermore, is there a chance that some sort of regression due to a newer library is causing this? My system, although Debian based, is not the latest there is. Is there any way you could increase the waiting time in the CI before it times out?
Unfortunately I cannot control the timeout in the CI, this is decided by another team in Debian. On my local machine, which is a fairly recent x86-64 desktop, the whole testsuite takes about 90s using an up-to-date Debian unstable (the same as in the CI). So it does not seem that third party libraries are the source of the problem. Also note that the CI runners use Netlib BLAS/LAPACK. But locally I get mostly the same duration with both Netlib BLAS/LAPACK and OpenBLAS, so I doubt that forcing OpenBLAS in the CI would solve the issue (though I may try).
I ended up disabling the problematic tests because I still don’t understand the underlying issue. Here is the patch that disables the tests, in case it helps:
I also see a long run time (comparable to what @pr0m1th3as has) on Ryzen, while it is quite fast on Apple:
Perhaps we should investigate it a little more.
A quick profile shows that …
On CentOS Stream 9 / Ryzen 9 3950X:
The only thing I can assume is that compiling on Mac makes much better use of the …
That was a good lead. I think the problem is that …
That may explain @svillemot's extra-long time with 64 cores (128 threads?).
This makes sense, but we do need the … Is there any efficient way to switch parallel processing inside the C++ code depending on the amount of data? Would this improve the overall performance? Perhaps @svillemot can somehow use …
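One mechanism that could address size-dependent switching is OpenMP's `if` clause, which keeps a loop serial below a chosen size. A minimal sketch, assuming an element-wise layer computation and an arbitrary 1000-element threshold (neither taken from the actual package sources):

```cpp
#include <omp.h>
#include <vector>

// Hypothetical element-wise layer pass (here a ReLU) with size-dependent
// parallelization.  The 1000-element threshold is an assumed value, not a
// figure taken from the statistics package sources.
void relu_layer (const std::vector<double>& in, std::vector<double>& out)
{
  const long n = static_cast<long> (in.size ());
  out.resize (in.size ());

  // The if() clause makes OpenMP run the loop serially when n is small,
  // so tiny layers avoid thread creation and synchronization overhead.
  #pragma omp parallel for if (n >= 1000)
  for (long i = 0; i < n; i++)
    out[i] = in[i] > 0.0 ? in[i] : 0.0;
}
```

With the `if` clause the decision is made per call, so no global thread setting has to be touched.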
I am not an expert, but see … I do not know if it is possible to change the maximum number of OMP threads from within Octave. One needs to set … Somewhat of a side note: OMP by default sets OMP_NUM_THREADS to the number of …
From what I've found online, I could use … to limit the number of threads from inside the program at run time. However, I am not sure how to determine the most appropriate number of threads to use based on the layers' size of the fcnn. I can tell from the code that the amount of data does not matter; it is only the complexity of each layer of the fcnn where the parallelization is performed. Is there a rule of thumb for this? Or do I have to test by trial and error until I find the point at which efficiency is maximized?
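For reference, the standard OpenMP calls for limiting threads at run time are `omp_set_num_threads()` (global) and the `num_threads` clause (per parallel region). A hedged sketch of a layer-size heuristic, with the threshold and scaling purely illustrative assumptions:

```cpp
#include <algorithm>
#include <omp.h>

// Hypothetical heuristic: stay serial below ~1000 neurons, otherwise grow
// the team with the layer size, but never beyond OpenMP's default maximum.
// Both the threshold and the scaling factor are assumptions for illustration.
static int threads_for_layer (long n_neurons)
{
  if (n_neurons < 1000)
    return 1;
  const int max_threads = omp_get_max_threads ();
  const int wanted = static_cast<int> (n_neurons / 1000);
  return std::max (1, std::min (wanted, max_threads));
}

// Usage: either set the team size globally for subsequent parallel regions,
//   omp_set_num_threads (threads_for_layer (n_neurons));
// or per region with the num_threads clause,
//   #pragma omp parallel for num_threads (threads_for_layer (n_neurons))
```

Whether scaling the team with layer size actually pays off would still have to be measured, as discussed above.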
I seem to remember (but do not quote me on that) that OpenBLAS (or maybe it was MKL) sets OMP threads to 1 for matrices with fewer than 1000 elements. I do not think there is a universal rule of thumb.
Maybe you just make …
I think I will go for a combination of both.
I think you should also experiment with …
…umber of threads for omp as well as 'alpha' parameter for ReLU and ELU activation layers, see issue #168
I made some changes to the compiled functions to accept the number of threads as an input parameter, and also to default to 1 thread when computing layers of fewer than 1000 neurons. The profiling results below on my machine (Intel® Core™ i7-10710U CPU @ 1.10GHz × 12 with Ubuntu 20.04 LTS) show similar percentages to those on Mac.
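A rough sketch of what such a change could look like inside a compiled function; the function name, signature, and per-neuron work are assumptions based on the description above, not the package's actual compiled sources:

```cpp
#include <omp.h>

// Hypothetical compiled-layer kernel: the caller passes the requested thread
// count, and layers with fewer than 1000 neurons fall back to a single
// thread, mirroring the behaviour described above.
void compute_layer (const double* weights, const double* input, double* output,
                    long n_neurons, int requested_threads)
{
  const int nthreads = (n_neurons < 1000) ? 1 : requested_threads;

  #pragma omp parallel for num_threads (nthreads)
  for (long i = 0; i < n_neurons; i++)
    {
      // Placeholder for the real per-neuron work (weighted sum + activation).
      output[i] = weights[i] * input[i];
    }
}
```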
Relevant tests seem to run faster with the latest changes.
@svillemot Can you test the latest sources to see whether they still produce a timeout issue on the CI? @dasergatskov I haven't changed the functionality in the classdef to accept an additional optional argument for …
I confirm that the CI timeout issue is now gone with your latest fixes. Thanks.
The tests related to ClassificationNeuralNetwork are extremely slow compared to the others on Debian x86-64 CI runners. See for example these logs, which have timings prepended to each log line:
https://ci.debian.net/packages/o/octave-statistics/unstable/amd64/56397827/
https://ci.debian.net/packages/o/octave-statistics/unstable/amd64/55938923/
This leads to timeouts on the CI runners.
Curiously, the problem does not manifest on other processor architectures (not even on x86-32).
Do you have any idea of what might be causing the problem?