
Race condition between parentPort.postMessage() and process.on("uncaughtException") - possibly out of spec? #59617

@juj

Description


Version

22.18.0 LTS

Platform

Linux h12dsi 6.8.0-51-generic #52-Ubuntu SMP PREEMPT_DYNAMIC Thu Dec  5 13:09:44 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

Consider the following program:

test_worker.js

const { Worker } = require("worker_threads");

// 1. Create a Worker
const worker = new Worker(`
  process.on("uncaughtException", (err) => {
    console.log(\`uncaughtException on worker thread: \${err.message}\`);
    throw err;
  });

  // 2. post a message from the Worker
  require('worker_threads').parentPort.postMessage('IGetSometimesLost');
  // 3. and back to back, throw an exception
  throw new Error("Test exception from worker");
`, { eval: true });

worker.on("message", (msg) => {
  console.log(`Worker.onmessage caught in main thread: ${msg}`);
});

worker.on("error", (err) => {
  console.log(`Worker.onerror caught in main thread: ${err.message}`);
  throw err;
});

process.on("uncaughtException", (err) => {
  console.log(`uncaughtException on main thread: ${err.message}`);
  process.exit(1);
});

This program:

  1. creates a Worker,
  2. posts a message 'IGetSometimesLost' from that Worker,
  3. right after posting the message, throws an exception in the Worker that nothing in the Worker handles (the Worker's uncaughtException handler only logs and rethrows it).

The main thread listens for both messages and uncaught exceptions from the Worker.

Almost always, the posted message arrives first, and the program prints:

Worker.onmessage caught in main thread: IGetSometimesLost
uncaughtException on worker thread: Test exception from worker
Worker.onerror caught in main thread: Test exception from worker
uncaughtException on main thread: Test exception from worker

which makes sense, since in sequential program order the postMessage() call happened before the exception was thrown.

But every once in a while, the program instead outputs

uncaughtException on worker thread: Test exception from worker
Worker.onerror caught in main thread: Test exception from worker
uncaughtException on main thread: Test exception from worker

i.e. the first line

Worker.onmessage caught in main thread: IGetSometimesLost

is not printed.

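What appears to be happening is that the main thread receives the postMessage() and uncaughtException events in the opposite order from the sequential program order in the Worker. Because the main thread's uncaughtException handler calls process.exit(1), a message that arrives after the error is never printed.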
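One way to check whether the message is reordered or dropped entirely is to stop exiting on the first error and record the order in which the main thread sees each event. The following is a minimal diagnostic sketch, not part of the original report: the helper name observeOrder is hypothetical, and the Worker body is the same as in test_worker.js except that it avoids the nested template literal.

```javascript
const { Worker } = require("worker_threads");

// Same Worker body as test_worker.js, written without a nested template literal.
const workerSource = `
  process.on("uncaughtException", (err) => {
    console.log("uncaughtException on worker thread: " + err.message);
    throw err;
  });
  require("worker_threads").parentPort.postMessage("IGetSometimesLost");
  throw new Error("Test exception from worker");
`;

// Hypothetical helper: resolves with the events in the order the main thread saw them.
// Unlike test_worker.js, it does not rethrow or exit on "error"; it waits for "exit".
function observeOrder() {
  return new Promise((resolve) => {
    const events = [];
    const worker = new Worker(workerSource, { eval: true });
    worker.on("message", (msg) => events.push(`message: ${msg}`));
    worker.on("error", (err) => events.push(`error: ${err.message}`));
    worker.on("exit", (code) => {
      events.push(`exit: ${code}`);
      resolve(events);
    });
  });
}

observeOrder().then((events) => console.log(events.join("\n")));
```

If a run logs the error line before the message line, the two events were reordered; if the message line is missing altogether, the message was dropped.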

The question here is: is this ordering left unspecified by Node.js, so that the race is expected behavior and the ordering should not be relied on?

Or does this observation indicate a bug in Node.js?

Thanks for considering.

How often does it reproduce? Is there a required condition?

The occurrence of the unexpected behavior is about 1 out of 50 runs on average.

I use this Python script to coax out the race condition:

import subprocess
import threading
import multiprocessing
import sys

COMMAND = ["node", "test_worker.js"]

stop_flag = threading.Event()

# Shared counter and lock
loop_counter = 0
counter_lock = threading.Lock()

def worker():
    global loop_counter
    while not stop_flag.is_set():
        with counter_lock:
            loop_counter += 1  # count each loop iteration

        try:
            result = subprocess.run(COMMAND, capture_output=True)
            stdout = result.stdout.decode(errors="ignore")

            expected = '''Worker.onmessage caught in main thread: IGetSometimesLost
uncaughtException on worker thread: Test exception from worker
Worker.onerror caught in main thread: Test exception from worker
uncaughtException on main thread: Test exception from worker'''
            if expected not in stdout and not stop_flag.is_set():
                stop_flag.set()
                print(stdout)
                break
        except Exception as e:
            print(f"Error running command: {e}")
            stop_flag.set()
            break

def main():
    n_threads = multiprocessing.cpu_count()
    threads = []

    for _ in range(n_threads):
        t = threading.Thread(target=worker, daemon=True)
        t.start()
        threads.append(t)

    for t in threads:
        t.join()

    # Report how many times the inner loop ran in total
    print(f"Inner loop executed {loop_counter} times")

if __name__ == "__main__":
    main()

On a 16-thread workstation, the issue typically reproduces within about 1-5 seconds.

What is the expected behavior? Why is that the expected behavior?

Expected print output:

Worker.onmessage caught in main thread: IGetSometimesLost
uncaughtException on worker thread: Test exception from worker
Worker.onerror caught in main thread: Test exception from worker
uncaughtException on main thread: Test exception from worker

What do you see instead?

Unexpected print output:

uncaughtException on worker thread: Test exception from worker
Worker.onerror caught in main thread: Test exception from worker
uncaughtException on main thread: Test exception from worker

Additional information

This issue is causing emscripten-core/emscripten#15014.
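If the ordering turns out not to be guaranteed, one possible workaround is an explicit handshake: the Worker throws only after the main thread has acknowledged the message. The sketch below is hypothetical and is not taken from Node.js or Emscripten; the "ack" reply and the helper name runWithAck are assumptions made for illustration.

```javascript
const { Worker } = require("worker_threads");

// Hypothetical protocol: the Worker posts its message, then waits for an "ack"
// from the main thread before throwing.
const workerSource = `
  const { parentPort } = require("worker_threads");
  parentPort.postMessage("IGetSometimesLost");
  parentPort.once("message", (reply) => {
    if (reply === "ack") throw new Error("Test exception from worker");
  });
`;

// Hypothetical helper: resolves with the events in the order the main thread saw them.
function runWithAck() {
  return new Promise((resolve) => {
    const events = [];
    const worker = new Worker(workerSource, { eval: true });
    worker.on("message", (msg) => {
      events.push(`message: ${msg}`);
      worker.postMessage("ack"); // only now is the Worker allowed to throw
    });
    worker.on("error", (err) => events.push(`error: ${err.message}`));
    worker.on("exit", () => resolve(events));
  });
}

runWithAck().then((events) => console.log(events.join("\n")));
```

The exception cannot be thrown before the message handler has run on the main thread, so the message is always recorded first, at the cost of an extra round trip.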

Metadata

Labels: worker
