Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

task_group: seems like wait() may return prematurely #1638

Open
solbjorn opened this issue Feb 17, 2025 · 7 comments
Open

task_group: seems like wait() may return prematurely #1638

solbjorn opened this issue Feb 17, 2025 · 7 comments
Assignees
Labels

Comments

@solbjorn
Copy link

Summary

task_group::wait() may return earlier than the tasks queued with ::run() are finished.

Version

2022.0.0 and up to the latest master - affected
2021.13.0 - works

Might be related to commit 1f52f50?

Environment

Intel Alderlake CPU (16 perf threads, 8 energy eff threads)
Windows 11 (24H2 currently, although happens on olders as well)
ClangCL 19.1.1 from Visual Studio 2022 17.14 (earlier versions suffer, too)
C++20 application (game engine), all tasking and threading is done only via oneTBB

Observed Behavior

Consider the following simple example:

// serial code

parallel_invoke(
    []{ render_task(); },
    []{ service_task(); },
);

// serial code that depends on what render_task() and service_task() do

This works as expected on any version. When parallel_invoke() returns, all the required data is processed and available.

Now we change this to:

// serial code

task_group tg;
tg.run([]{ render_task(); });
tg.run([]{ service_task(); });
tg.wait();

// serial code that depends on what render_task() and service_task() do

According to the documentation, they must behave roughly the same way.
But time to time, on versions newer than 2021.13.0 I get crashes in the code after tg.wait(), with the dumps showing me that the data render_task() and service_task() must've processed is not ready yet.

However, W/A like this:

// serial code

std::barrier wa(3);
task_group tg;

tg.run([&wa]{ render_task(); wa.arrive(); });
tg.run([&wa]{ service_task(); wa.arrive(); });
tg.wait();
wa.arrive_and_wait();

// serial code that depends on what render_task() and service_task() do

works just fine. This made me think tg.wait() exits prematurely.

Where possible, I use parallel_invoke(), parallel_for() etc., but sometimes I just need to fire an async task and then wait for the results in a different part of the code, so I anyway have to use task_groups (at least until task_arena implements per-task waiters)

Expected Behavior

The code below tg.wait() must be run only after the task group finished processing all of the tasks.

Steps To Reproduce

The examples above should be enough, considering that the tasks that are run via tg.run() will take some time.

@solbjorn solbjorn added the bug label Feb 17, 2025
@kboyarinov kboyarinov self-assigned this Feb 18, 2025
@solbjorn
Copy link
Author

solbjorn commented Feb 18, 2025

Oh, another interesting observation that may help:

Turned out I forgot to copy tbb12.dll to engine's folder after rebuilding oneTBB. I only copied the engine after I rebuilt it with the headers from 2021.13.0. Headers from 2021.13.0 + dll from 2022.1.0 also work. So looks like the issue is somewhere in the headers...

@kboyarinov
Copy link
Contributor

Hi @solbjorn, thanks for reporting the issue.
I have tried to reproduce it, but unfortunately, I was not able to do that. Here is the code I used based on the scenarios you provided:

#include <oneapi/tbb/task_group.h>
#include <chrono>
#include <thread>
#include <iostream>

int main() {
    int* render_data = nullptr;
    int* service_data = nullptr;

    auto render_task = [&] {
        std::this_thread::sleep_for(std::chrono::seconds(20));
        render_data = new int(100);
    };
    auto service_task = [&] {
        std::this_thread::sleep_for(std::chrono::seconds(20));
        service_data = new int(200);
    };

    tbb::task_group tg;

    tg.run(render_task);
    tg.run(service_task);
    tg.wait();

    std::cout << "Render data = " << *render_data << std::endl;
    std::cout << "Service data = " << *service_data << std::endl;

    delete render_data;
    delete service_data;
}

I have used clang-cl 18.1.8 from Visual Studio 2022 17.12.0. It does not perfectly match the compiler you have reported, but you mentioned that earlier versions are affected as well.
Could you please share with us more details about your setup to help us reproduce this on our side?
As far as I understand, you build TBB from source. Can you please provide the exact CMake settings you use.
Thank you for your help.

@vossmjp
Copy link
Contributor

vossmjp commented Feb 19, 2025

@solbjorn what combination of headers + dll shows the failure? I assume that 2022.x dll is used at runtime. Did you do a partial recompile of your application/engine and so could some components be compiled against against 2022.x TBB headers but others compiled against TBB 2021.13.0 or earlier headers?

@solbjorn
Copy link
Author

@vossmjp, the issue happens with the headers from any oneTBB version newer than 2021.13, dll doesn't matter.

@kboyarinov, CMake flags are default, just without building tests. I manually add -march=alderlake, but without it the issue is still present.

Since you currently can't reproduce it, let me collect more details within a couple hours.

@solbjorn
Copy link
Author

Ok, I modified the example above and compiled it separately to make sure the issue can be reproduced outside the game engine.

Output when everything works:

Rendering took 2.008901 seconds.
Service took 5.009844 seconds.
tg.wait() took 5.010622 seconds.
Render data = 100
Service data = 200

"Rendering" is a lambda which sleeps for 2 seconds and then does new int(100), "service" sleeps for 5 and does the same for 200.

After I revisited the compilation flags, it turned out that if the app which uses oneTBB 2022+ is compiled with -fwhole-program-vtables (requires IPO/LTO), then tg.wait() may fail:

tg.wait() took 0.000208 seconds.
Rendering took 2.946946 seconds.
<end of output>

If the app uses oneTBB 2021.13 or older version, then it's safe to compile it even with -fwhole-program-vtables.
Flags used during the oneTBB compilation don't matter here, only the target program.

I realize that it's developer's responsibility to pick optimal and safe compilation flags for his program, so closing the issue would be valid. OTOH, the engine uses around 15 external libs and has about 1.5 mln lines of own code and amongst all this only tg.wait() from 2022+ fails with this flag, the whole rest (including oneTBB) works just as expected. So maybe something in the task_group code is wrong there somewhere...

@kboyarinov
Copy link
Contributor

I was finally able to reproduce the issue while setting /O2 -fwhole-program-vtables -flto -fuse-ld=lld.
It seems like the issue is caused by applying a devirtualization optimization to the class wait_context_vertex that is used for reference counting in the task_group
As it is stated here, I have tried to explicitly mark this class with ``[[clang::lto_visibility_public]]` and it fixes my local test.
I have created a draft PR #1644 with these changes.
Could you please try this version on your side?

@solbjorn
Copy link
Author

Yes, I confirm this fix works, thanks a lot!

Interestring though, why sometimes such annotations are needed in order to work properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants