fix: race condition with interruptibles #566

Open: wants to merge 1 commit into base: main

@elasticspoon elasticspoon commented May 19, 2025

Note: this blocking / nonblocking threaded / forked code is hella confusing to me so I might have gotten some stuff wrong here.

The Problem

When running TARGET_DB=sqlite bin/rails test --seed 38423 I consistently had failures like:

Failure:
LifecycleHooksTest#test_run_lifecycle_hooks [test/integration/lifecycle_hooks_test.rb:90]:
Expected: 15
  Actual: 13

I believe these failures are not exclusive to sqlite, but they are more noticeable there because that run was not happening in Docker.

The test:

  • sets a bunch of lifecycle hooks, e.g. SolidQueue.on_scheduler_start {...}
  • runs the supervisor: pid = run_supervisor_as_fork(...)
  • terminates the supervisor: terminate_process(pid)
  • then checks that the JobResults from the callbacks were created

Instead of receiving the TERM signal, shutting down gracefully, and running its callbacks, the Scheduler would do nothing in response to the signal, hit the timeout, receive a KILL signal, and never run the callbacks.

Race Condition

It seems like the issue was the lazily initialized self_pipe being slow to create, which leaves a window for a second thread to get in:

def self_pipe
  @self_pipe ||= create_self_pipe
end

def create_self_pipe
  reader, writer = IO.pipe
  { reader: reader, writer: writer }
end

One thread would call run, and while it was still creating the pipe, another thread would call stop.

This is a race condition. Both threads end up creating their own pipes, and when interrupt writes to the pipe, the write does not go anywhere because the other thread is reading from a different pipe.

Thread A: #run
Thread A: #self_pipe
Thread B: #stop
Thread B: #self_pipe
Thread B: #self_pipe => {reader: #<IO:fd 65>, writer: #<IO:fd 66>}
Thread B: #interrupt => #<IO:fd 66>
Thread A: #self_pipe => {reader: #<IO:fd 63>, writer: #<IO:fd 64>}
Thread B: done => killing thread
Thread A: #interruptible_sleep => #<IO:fd 63>

If this race condition happens, Thread A will sleep until it gets killed.
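The interleaving above can be forced deterministically in a small standalone script. This is only a sketch: LazyPipeDemo and its two Queue handshakes are hypothetical names made up for illustration, not Solid Queue code, and the stall inside create_self_pipe stands in for IO.pipe being slow.

```ruby
# Hypothetical stand-in for an object with a lazily created self-pipe.
class LazyPipeDemo
  attr_reader :entered, :release

  def initialize
    @entered = Queue.new      # signals "first caller is inside create_self_pipe"
    @release = Queue.new      # lets the first caller continue
    @first_caller = true
  end

  def self_pipe
    @self_pipe ||= create_self_pipe
  end

  def interrupt
    self_pipe[:writer].write_nonblock(".")
  end

  # Returns true if woken by a write, false if the timeout elapsed.
  def interruptible_sleep(timeout)
    !IO.select([self_pipe[:reader]], nil, nil, timeout).nil?
  end

  private

  def create_self_pipe
    if @first_caller
      @first_caller = false
      @entered << true        # we are mid-creation, @self_pipe is still nil
      @release.pop            # stall here, as if IO.pipe were slow
    end
    reader, writer = IO.pipe
    { reader: reader, writer: writer }
  end
end

demo = LazyPipeDemo.new
thread_a = Thread.new { demo.interruptible_sleep(0.2) } # plays the role of #run
demo.entered.pop       # wait until Thread A is stuck creating its pipe
demo.interrupt         # plays the role of #stop: creates and writes to a second pipe
demo.release << true   # Thread A now finishes and overwrites @self_pipe
woken = thread_a.value # false: the wakeup went to the other pipe
```

Here the sleeper only waits 0.2 seconds so the script finishes quickly; in the failing test the equivalent wait lasts until the KILL signal arrives.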

Solution

I ended up fixing this by getting rid of the lazy initialization of the pipe. That way any thread that is accessing the same scheduler will have the same pipe.
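A minimal sketch of the eager approach, assuming a plain class rather than the actual Solid Queue code (EagerInterruptible is a hypothetical name, and the rescue and drain details are my own assumptions): the pipe is created in initialize, before any second thread can see the object, so interrupt and interruptible_sleep always share it.

```ruby
# Hypothetical class: the pipe exists before any other thread can touch the object.
class EagerInterruptible
  def initialize
    reader, writer = IO.pipe
    @self_pipe = { reader: reader, writer: writer }.freeze
  end

  def interrupt
    @self_pipe[:writer].write_nonblock(".")
  rescue Errno::EAGAIN, Errno::EINTR
    # Pipe is full or the write was interrupted; a wakeup is already pending.
  end

  # Returns true if woken by interrupt, false if the timeout elapsed.
  def interruptible_sleep(timeout)
    ready = IO.select([@self_pipe[:reader]], nil, nil, timeout)
    return false unless ready

    # Drain the pending byte(s) so the next sleep blocks again.
    @self_pipe[:reader].read_nonblock(64, exception: false)
    true
  end
end

worker = EagerInterruptible.new
stopper = Thread.new { worker.interrupt }
woken = worker.interruptible_sleep(1)
stopper.join
```

Because the byte stays in the pipe until it is read, the wakeup is delivered whether interrupt runs before or after the sleeper reaches IO.select.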

Alternatively, maybe there should be a Mutex around the lazy initialization in self_pipe.
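A rough sketch of what the Mutex variant could look like, again using a hypothetical class (MutexGuardedPipe) rather than the real code. Note that the Mutex itself still has to be created eagerly, otherwise its creation would have the same race.

```ruby
# Hypothetical class: lazy pipe creation serialized by a lock.
class MutexGuardedPipe
  def initialize
    @self_pipe_lock = Mutex.new   # created eagerly, before threads share the object
  end

  def self_pipe
    # Only one thread at a time can run the check-then-create step.
    @self_pipe_lock.synchronize { @self_pipe ||= create_self_pipe }
  end

  private

  def create_self_pipe
    reader, writer = IO.pipe
    { reader: reader, writer: writer }
  end
end

owner = MutexGuardedPipe.new
pipes = 8.times.map { Thread.new { owner.self_pipe } }.map(&:value)
same_pipe = pipes.uniq(&:object_id).size == 1
```

This keeps the pipe lazy, at the cost of taking a lock on every self_pipe call; the eager version avoids the lock entirely.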

@elasticspoon elasticspoon marked this pull request as ready for review May 19, 2025 05:05