Detailed description:
I am using the C++ version of OpenCV to run inference on a simple convolutional network on the GPU. In release mode, a forward pass takes 40 ms with a batch size of 1, but roughly 160 ms with a batch size of 4, i.e. the time scales linearly with the batch. My expectation was about 40 ms in both cases, with the GPU processing the whole batch in parallel. Why is there no parallel inference? In debug mode, the following messages are logged (a minimal sketch of my setup follows the log):
[ INFO:…] global registry_parallel.impl.hpp:96 cv::parallel::ParallelBackendRegistry::ParallelBackendRegistry core(parallel): Enabled backends(3, sorted by priority): ONETBB(1000); TBB(990); OPENMP(980)
[ INFO:…] global plugin_loader.impl.hpp:67 cv::plugin::impl::DynamicLib::libraryLoad load D:\code\ISImgDetect\demo\opencv_core_parallel_onetbb490_64d.dll => FAILED
[ INFO:…] global plugin_loader.impl.hpp:67 cv::plugin::impl::DynamicLib::libraryLoad load opencv_core_parallel_onetbb490_64d.dll => FAILED
[ INFO:…] global plugin_loader.impl.hpp:67 cv::plugin::impl::DynamicLib::libraryLoad load D:\code\ISImgDetect\demo\opencv_core_parallel_tbb490_64d.dll => FAILED
[ INFO:…] global plugin_loader.impl.hpp:67 cv::plugin::impl::DynamicLib::libraryLoad load opencv_core_parallel_tbb490_64d.dll => FAILED
[ INFO:…] global plugin_loader.impl.hpp:67 cv::plugin::impl::DynamicLib::libraryLoad load D:\code\ISImgDetect\demo\opencv_core_parallel_openmp490_64d.dll => FAILED
[ INFO:…] global plugin_loader.impl.hpp:67 cv::plugin::impl::DynamicLib::libraryLoad load opencv_core_parallel_openmp490_64d.dll => FAILED
[ INFO:…] global op_cuda.cpp:80 cv::dnn::dnn4_v20231225::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "_input" of type NetInputLayer
the layer "_input" of type NetInputLayer be accelerated with GPU, instead using CPU.
Why can't the model run inference on the batch in parallel? How can I solve this problem?
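For what it's worth, the timings above were measured along these lines (again a sketch, reusing `net` and `blob` from the setup above; the warm-up pass is there because the first CUDA forward includes backend initialization):

```cpp
#include <opencv2/core/utility.hpp>
#include <opencv2/dnn.hpp>

// Time one batched forward pass. Assumes `net` already has the CUDA
// backend/target set and `blob` is the NCHW batch built above.
static double forwardTimeMs(cv::dnn::Net& net, const cv::Mat& blob)
{
    net.setInput(blob);
    net.forward();  // warm-up: the first CUDA pass includes initialization

    cv::TickMeter tm;
    tm.start();
    net.setInput(blob);
    net.forward();
    tm.stop();
    return tm.getTimeMilli();
}
```

Measured this way, the batch-1 forward is about 40 ms and the batch-4 forward about 160 ms, as described above.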