task_group: seems like wait() may return prematurely #1638
Comments
Oh, another interesting observation that may help: it turned out I forgot to copy tbb12.dll to the engine's folder after rebuilding oneTBB; I only copied the engine itself after rebuilding it with the headers from 2021.13.0. Headers from 2021.13.0 + the dll from 2022.1.0 also work. So it looks like the issue is somewhere in the headers...
Hi @solbjorn, thanks for reporting the issue.

```cpp
#include <oneapi/tbb/task_group.h>

#include <chrono>
#include <iostream>
#include <thread>

int main() {
    int* render_data = nullptr;
    int* service_data = nullptr;

    auto render_task = [&] {
        std::this_thread::sleep_for(std::chrono::seconds(20));
        render_data = new int(100);
    };
    auto service_task = [&] {
        std::this_thread::sleep_for(std::chrono::seconds(20));
        service_data = new int(200);
    };

    tbb::task_group tg;
    tg.run(render_task);
    tg.run(service_task);
    tg.wait();

    // If wait() returned prematurely, these dereferences would crash
    // or read unset data.
    std::cout << "Render data = " << *render_data << std::endl;
    std::cout << "Service data = " << *service_data << std::endl;
    delete render_data;
    delete service_data;
}
```

I have used this reproducer but could not reproduce the issue so far.
@solbjorn what combination of headers + dll shows the failure? I assume that a 2022.x dll is used at runtime. Did you do a partial recompile of your application/engine, so that some components could be compiled against 2022.x TBB headers but others against TBB 2021.13.0 or earlier headers?
@vossmjp, the issue happens with the headers from any oneTBB version newer than 2021.13; the dll doesn't matter. @kboyarinov, the CMake flags are default, just without building tests. I manually add -march=alderlake, but the issue is still present without it. Since you currently can't reproduce it, let me collect more details within a couple of hours.
Ok, I modified the example above and compiled it separately to make sure the issue can be reproduced outside the game engine. Output when everything works:

"Rendering" is a lambda which sleeps for 2 seconds and then fills in its data. After I revisited the compilation flags, it turned out that the failure only appears when an app using oneTBB 2022+ is compiled with a particular flag.

If the app uses oneTBB 2021.13 or an older version, it's safe to compile it even with that flag. I realize it's the developer's responsibility to pick optimal and safe compilation flags for their program, so closing the issue would be valid. OTOH, the engine uses around 15 external libs and has about 1.5 million lines of its own code, and amongst all this only tg.wait() from 2022+ fails with this flag; everything else (including the rest of oneTBB) works just as expected. So maybe something in the task_group code is wrong there somewhere...
I was finally able to reproduce the issue while setting that flag.
Yes, I confirm this fix works, thanks a lot! Interesting, though, why such annotations are sometimes needed for things to work properly.
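[Editor's note: the exact annotations are not preserved on this page, but the general pattern behind them is well known: under aggressive optimization a compiler may hoist or cache a plain load it believes cannot change, so a variable used for cross-thread signaling needs an atomic (or equivalent) annotation to force a fresh load every iteration. A generic illustration, not oneTBB's actual code:]

```cpp
#include <atomic>

bool plain_flag = false;               // plain load: the optimizer may hoist it
                                       // out of the loop and never re-read it
std::atomic<bool> atomic_flag{false};  // atomic load: must be re-read each time

// May be folded into an endless loop under optimization.
void spin_plain()  { while (!plain_flag) { } }

// Guaranteed to observe a store made by another thread.
void spin_atomic() { while (!atomic_flag.load(std::memory_order_acquire)) { } }
```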
Summary
task_group::wait() may return before the tasks queued via ::run() have finished.
Version
2022.0.0 and up to the latest master - affected
2021.13.0 - works
Might be related to commit 1f52f50?
Environment
Intel Alder Lake CPU (16 performance threads, 8 efficiency threads)
Windows 11 (24H2 currently, although it happens on older builds as well)
ClangCL 19.1.1 from Visual Studio 2022 17.14 (earlier versions are affected, too)
C++20 application (game engine); all tasking and threading is done only via oneTBB
Observed Behavior
Consider the following simple example:
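[Editor's note: the snippet itself did not survive the page extraction. A minimal sketch of what it plausibly looked like, modeled on the full task_group reproducer earlier in the thread, with the work submitted through parallel_invoke() instead; the sleeps stand in for real rendering/service work:]

```cpp
#include <oneapi/tbb/parallel_invoke.h>

#include <chrono>
#include <iostream>
#include <thread>

int main() {
    int* render_data = nullptr;
    int* service_data = nullptr;

    auto render_task = [&] {
        std::this_thread::sleep_for(std::chrono::seconds(2));
        render_data = new int(100);
    };
    auto service_task = [&] {
        std::this_thread::sleep_for(std::chrono::seconds(2));
        service_data = new int(200);
    };

    // parallel_invoke() returns only after both functors have completed.
    tbb::parallel_invoke(render_task, service_task);

    std::cout << "Render data = " << *render_data << std::endl;
    std::cout << "Service data = " << *service_data << std::endl;
    delete render_data;
    delete service_data;
}
```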
This works as expected on any version. When parallel_invoke() returns, all the required data is processed and available.
Now we change this to:
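[Editor's note: this snippet is also missing; presumably it is the same body with the parallel_invoke() call replaced by a task_group, matching the full reproducer earlier in the thread. A sketch:]

```cpp
#include <oneapi/tbb/task_group.h>
// render_task / service_task and the data pointers as in the previous sketch

tbb::task_group tg;
tg.run(render_task);
tg.run(service_task);
tg.wait();  // expected to block until both tasks have completed

// On affected versions this can run before the tasks finish,
// crashing on the dereference of render_data / service_data.
std::cout << "Render data = " << *render_data << std::endl;
std::cout << "Service data = " << *service_data << std::endl;
```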
According to the documentation, the two should behave roughly the same way.
But from time to time, on versions newer than 2021.13.0, I get crashes in the code after tg.wait(), with the dumps showing that the data render_task() and service_task() must have produced is not ready yet.
However, a W/A like the sketch below works just fine. This made me think tg.wait() exits prematurely.
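[Editor's note: the original workaround snippet did not survive on this page. One plausible shape for such a W/A, assuming it spins on explicit completion flags set by the tasks themselves (the flag names are hypothetical, not from the original report):]

```cpp
#include <atomic>   // plus the includes from the reproducer above

std::atomic<bool> render_done{false};   // hypothetical completion flags
std::atomic<bool> service_done{false};

auto render_task = [&] {
    render_data = new int(100);
    render_done.store(true, std::memory_order_release);
};
auto service_task = [&] {
    service_data = new int(200);
    service_done.store(true, std::memory_order_release);
};

tbb::task_group tg;
tg.run(render_task);
tg.run(service_task);
tg.wait();

// Explicitly wait until both tasks have really published their results,
// instead of trusting wait() alone.
while (!render_done.load(std::memory_order_acquire) ||
       !service_done.load(std::memory_order_acquire))
    std::this_thread::yield();
```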
Where possible, I use parallel_invoke(), parallel_for() etc., but sometimes I just need to fire an async task and then wait for the results in a different part of the code, so I have to use task_groups anyway (at least until task_arena implements per-task waiters).
Expected Behavior
The code after tg.wait() must run only after the task group has finished processing all of its tasks.
Steps To Reproduce
The examples above should be enough, given that the tasks run via tg.run() take some time to complete.