fix: race condition with interruptibles #566
Open
Note: this blocking / nonblocking threaded / forked code is hella confusing to me so I might have gotten some stuff wrong here.
The Problem
When running
TARGET_DB=sqlite bin/rails test --seed 38423
I consistently had failures like:

I believe these failures are not exclusive to SQLite but are more noticeable there because it was not running in Docker.
The test:
SolidQueue.on_scheduler_start {...}
pid = run_supervisor_as_fork(...)
terminate_process(pid)
It then expects that `JobResults` from the callbacks were created.

Instead of the Scheduler receiving the `TERM` signal, gracefully shutting down, and running the callbacks, it would not do anything in response to the signal, hit the timeout, receive a `KILL` signal, and never run the callbacks.
It seems like the issue was the lazily initialized `self_pipe` being slow to create:

One thread would call `run` and, while it was still creating the pipe, another thread would call `stop`.

This is a race condition. Both threads end up creating their own pipes, and when `interrupt` writes to the pipe, the write does not go anywhere because the other thread has a different pipe.

If this race condition happens, Thread A (the one that called `run`) will sleep until it gets killed.
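
To make the interleaving concrete, here is a minimal sketch of the race using a stand-in class. It is not Solid Queue's actual code: `LazyPipeProcess`, the `pipe_delay` option, and the `sleep` calls are assumptions added only to widen the race window so the lost wake-up is easy to reproduce.

```ruby
# Hypothetical stand-in for a process with a lazily created self-pipe.
# The class name and pipe_delay option are illustrative assumptions.
class LazyPipeProcess
  def initialize(pipe_delay: 0)
    @pipe_delay = pipe_delay # artificial delay to widen the race window
  end

  # Wake up a thread blocked in interruptible_sleep.
  def interrupt
    self_pipe[:writer].write_nonblock(".")
  rescue IO::WaitWritable, Errno::EINTR
    # Pipe is full or the call was interrupted; nothing more to do here.
  end

  # Returns true if woken by interrupt, false if the timeout elapsed.
  def interruptible_sleep(seconds)
    reader = self_pipe[:reader]
    return false unless IO.select([reader], nil, nil, seconds)
    reader.read_nonblock(16)
    true
  end

  private

  def self_pipe
    # Not atomic: two threads can both see nil and both create a pipe.
    @self_pipe ||= create_self_pipe
  end

  def create_self_pipe
    sleep @pipe_delay
    reader, writer = IO.pipe
    { reader: reader, writer: writer }
  end
end

process = LazyPipeProcess.new(pipe_delay: 0.2)
sleeper = Thread.new { process.interruptible_sleep(0.5) } # plays the role of run
sleep 0.05                                                # let it start creating its pipe
stopper = Thread.new { process.interrupt }                # plays the role of stop
stopper.join
puts "woken by interrupt: #{sleeper.value}"
```

With these timings the sleeper normally waits on the first pipe while the interrupt is written to the second one, so the sleeper times out instead of waking up.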
Solution
I ended up fixing this by getting rid of the lazy initialization of the pipe. That way any thread that is accessing the same scheduler will have the same pipe.
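
As a rough illustration of eager initialization (a sketch with a hypothetical `EagerPipeProcess` class, not the actual diff in this PR), the pipe can be created in the constructor so it exists before any thread can call `interrupt` or sleep on it:

```ruby
# Hypothetical stand-in: the self-pipe is created once, up front.
class EagerPipeProcess
  def initialize
    reader, writer = IO.pipe
    @self_pipe = { reader: reader, writer: writer }
  end

  # Wake up a thread blocked in interruptible_sleep.
  def interrupt
    @self_pipe[:writer].write_nonblock(".")
  rescue IO::WaitWritable, Errno::EINTR
    # Pipe is full or the call was interrupted; a wake-up is already pending.
  end

  # Returns true if woken by interrupt, false if the timeout elapsed.
  def interruptible_sleep(seconds)
    reader = @self_pipe[:reader]
    return false unless IO.select([reader], nil, nil, seconds)
    reader.read_nonblock(16)
    true
  end
end

process = EagerPipeProcess.new
sleeper = Thread.new { process.interruptible_sleep(5) }
stopper = Thread.new { process.interrupt }
stopper.join
puts "woken by interrupt: #{sleeper.value}"
```

Because both threads share the one pipe, the order in which they run no longer matters: a byte written before the sleeper starts waiting simply stays in the pipe until it is read.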
Alternatively, maybe there should be a Mutex around the setter for `self_pipe`.
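
A sketch of that alternative, again with a hypothetical class (`MutexPipeProcess`) and an artificial `pipe_delay` that are assumptions for illustration only. The lazy initialization stays, but the check-and-assign runs inside `Mutex#synchronize`, so a second thread waits for the first pipe instead of creating its own. Note that the Mutex itself still has to be created eagerly.

```ruby
# Hypothetical stand-in: lazy self-pipe guarded by a Mutex.
class MutexPipeProcess
  def initialize(pipe_delay: 0)
    @pipe_delay = pipe_delay # artificial delay to widen the race window
    @pipe_mutex = Mutex.new  # must exist before any thread touches the pipe
  end

  # Wake up a thread blocked in interruptible_sleep.
  def interrupt
    self_pipe[:writer].write_nonblock(".")
  rescue IO::WaitWritable, Errno::EINTR
    # Pipe is full or the call was interrupted; a wake-up is already pending.
  end

  # Returns true if woken by interrupt, false if the timeout elapsed.
  def interruptible_sleep(seconds)
    reader = self_pipe[:reader]
    return false unless IO.select([reader], nil, nil, seconds)
    reader.read_nonblock(16)
    true
  end

  private

  def self_pipe
    # Only one thread at a time can run the nil check and the assignment.
    @pipe_mutex.synchronize { @self_pipe ||= create_self_pipe }
  end

  def create_self_pipe
    sleep @pipe_delay
    reader, writer = IO.pipe
    { reader: reader, writer: writer }
  end
end

process = MutexPipeProcess.new(pipe_delay: 0.2)
sleeper = Thread.new { process.interruptible_sleep(2) }
sleep 0.05 # let the sleeper start creating the pipe
stopper = Thread.new { process.interrupt }
stopper.join
puts "woken by interrupt: #{sleeper.value}"
```

The trade-off compared with eager initialization is that every access to the pipe now takes the lock, which is one reason simply creating the pipe up front may be the simpler fix.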