Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kill all children on SIGTERM #106

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

bonzini
Copy link
Contributor

@bonzini bonzini commented May 3, 2024

Handle SIGTERM by forwarding it to all children and waiting for them to stop. This is a better behavior than letting the children continue in the background.

The price to pay is that if a program does not respond to SIGTERM, samurai will have to be killed with SIGKILL. This however is consistent with many other programs that invoke and manage child processes, and the reason will be apparent from e.g. ps or top output, so overall I think this is in improvement.

@bonzini
Copy link
Contributor Author

bonzini commented May 21, 2024

any news?

Copy link
Owner

@michaelforney michaelforney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you confirm whether or not ninja has a similar behavior when it receives SIGTERM?

Extract it from jobwork() so that build() can call it on a signal.

Signed-off-by: Paolo Bonzini <[email protected]>
Keep the system clean by propagating SIGTERM to all children,
and by not starting new jobs on both SIGTERM and SIGINT.

The only tricky bit is that previously fd[i].revents was used
to skip both jobs that are not in use and jobs that did not
have output; that's because negative file descriptors
do not cause POLLNVAL and therefore fd[i].revents is zero for
inactive jobs as well.  But because all jobs must be killed,
build() now has to check fd[i].fd == -1 explicitly.

While at it, also clean up jobdone() by clearing job[i].edge;
it's not nice to leave a dangling pointer in the jobs array,
even if it's harmless.

Signed-off-by: Paolo Bonzini <[email protected]>
@michaelforney
Copy link
Owner

Thinking about this some more, I'm worried a race condition where the signal arrives outside of the poll. If this happens, then we won't forward the signal to the subprocesses until the next one produces output or finishes. I think writing to a self-pipe in the handler and adding that to the pollfd array is probably the simplest way to solve this.

I also did some digging into ninja to see how it deals with these signals and found a few things:

This leaves me with a few questions. Since ninja doesn't forward the signal to any foreground job, doesn't it have the same issue you're fixing here? If SIGTERM is sent to ninja only, what happens? Does the subprocess remain running? I also wonder if whoever sent the SIGTERM ought to have sent it to samurai's process group.

Currently, samurai doesn't make new process groups for jobs. In this PR, we're making the assumption that SIGINT is usually sent to the whole process group due to a Ctrl-C. However, if SIGINT is sent to only samurai, then I believe it will just stop starting new jobs and wait until any active jobs finish naturally. Similarly, if SIGTERM is sent to samurai's process group, then I think the subprocesses will end up seeing SIGTERM twice (once from the initial signal, once from samurai).

@bonzini
Copy link
Contributor Author

bonzini commented Dec 17, 2024

I'm worried a race condition where the signal arrives outside of the poll. If this happens, then we won't forward the signal to the subprocesses until the next one produces output or finishes. I think writing to a self-pipe in the handler and adding that to the pollfd array is probably the simplest way to solve this.

That is caught by:

                if (!have_work() && !numjobs)
                        break;

but indeed there is a microscopic window between this line and the poll() right below. I can fix it once we agree on what to do.

If SIGTERM is sent to ninja only, what happens? Does the subprocess remain running? I also wonder if whoever sent the SIGTERM ought to have sent it to samurai's process group.

ninja forwards the signal to the non-console processes (using process groups):

    if (!(*i)->use_console_)
      kill(-(*i)->pid_, interrupted_);

But I think for SIGTERM it should send it to console processes as well. Unlike SIGINT and SIGHUP, which are sent by the OS, SIGTERM is usually sent by the user with kill, and you cannot assume that the user sent it to the process group. In fact, I'd argue that because the idea of SIGTERM is to let the process clean up after itself, 1) it should not be sent to a process group, 2) it is a bug to not trap it if you spawn processes.

The behavior of moving the children in their process group was implemented for ninja-build/ninja#110 with no particular explanation; then it was changed to move the process into a session (ninja-build/ninja#909) and reverted (ninja-build/ninja#1097, but see also ninja-build/ninja#1001). Frankly I wouldn't take it as a good example even if samurai is a "ninja clone".

Make instead does the same as my implementation: it does not place processes in separate process groups, and forwards SIGTERM.

Similarly, if SIGTERM is sent to samurai's process group, then I think the subprocesses will end up seeing SIGTERM twice (once from the initial signal, once from samurai).

That's a problem of the user that sent it to the process group; it's not samurai's problem. Generally I don't think that it would be an issue, because SIGTERM will either exit on the first or trigger orderly cleanup in the child. In the latter case it would be triggered twice but, because signal handlers in general don't do much work themselves, it should be safe to consider SIGTERM idempotent; and if they're not, that should be considered a bug in the program.

@bonzini
Copy link
Contributor Author

bonzini commented Jan 31, 2025

Any news?

@michaelforney
Copy link
Owner

Thanks for the ping. I'm happy with this, but would like to fix the signal race before merging. Sorry for the delay in reviewing. I appreciate your patience.

I'm worried a race condition where the signal arrives outside of the poll. If this happens, then we won't forward the signal to the subprocesses until the next one produces output or finishes. I think writing to a self-pipe in the handler and adding that to the pollfd array is probably the simplest way to solve this.

That is caught by:

                if (!have_work() && !numjobs)
                        break;

but indeed there is a microscopic window between this line and the poll() right below. I can fix it once we agree on what to do.

I think the window is bigger than you suggest, but due to the poll timeout, it hangs for at most 5 seconds before sending the signal to jobs and exiting itself.

For example, say we are reading some input for the last job in the array when the signal arrives. We finish the jobwork loop and continue in the outer loop. have_work() returns false, so we don't start any more jobs, but there are still jobs running so we start the poll. If the jobs don't produce more output or exit, then the poll only returns when it times out, and only then do we kill the running jobs.

I was able to reproduce this pretty reliably with a job that wrote to stdout and then sent SIGTERM to samu:

[1/1] exec ./testjob
strace: Process 6572 attached
[pid  6571] 01:34:45 read(7, "", 4)     = 0
[pid  6571] 01:34:45 poll([{fd=5, events=POLLIN}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}], 8, 5000 <unfinished ...>
[pid  6572] 01:34:45 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  6571] 01:34:45 <... poll resumed>) = 1 ([{fd=5, revents=POLLIN}])
[pid  6572] 01:34:45 kill(6571, SIGTERM <unfinished ...>
[pid  6571] 01:34:45 read(5,  <unfinished ...>
[pid  6572] 01:34:45 <... kill resumed>) = 0
[pid  6571] 01:34:45 <... read resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  6571] 01:34:45 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=6572, si_uid=1000} ---
[pid  6571] 01:34:45 poll([{fd=5, events=POLLIN}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}], 8, 5000) = 0 (Timeout)
[pid  6571] 01:34:50 kill(6572, SIGTERM) = 0
[pid  6572] 01:34:50 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=6571, si_uid=1000} ---
[pid  6572] 01:34:50 +++ killed by SIGTERM +++
01:34:50 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=6572, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
samu: interrupted by signal 15
01:34:50 +++ exited with 1 +++

I think the cleanest solution is to write a byte to a pipe in the handler, and add the read end to the pollfd array.

If SIGTERM is sent to ninja only, what happens? Does the subprocess remain running? I also wonder if whoever sent the SIGTERM ought to have sent it to samurai's process group.

ninja forwards the signal to the non-console processes (using process groups):

    if (!(*i)->use_console_)
      kill(-(*i)->pid_, interrupted_);

But I think for SIGTERM it should send it to console processes as well. Unlike SIGINT and SIGHUP, which are sent by the OS, SIGTERM is usually sent by the user with kill, and you cannot assume that the user sent it to the process group. In fact, I'd argue that because the idea of SIGTERM is to let the process clean up after itself, 1) it should not be sent to a process group, 2) it is a bug to not trap it if you spawn processes.

The behavior of moving the children in their process group was implemented for ninja-build/ninja#110 with no particular explanation; then it was changed to move the process into a session (ninja-build/ninja#909) and reverted (ninja-build/ninja#1097, but see also ninja-build/ninja#1001). Frankly I wouldn't take it as a good example even if samurai is a "ninja clone".

Make instead does the same as my implementation: it does not place processes in separate process groups, and forwards SIGTERM.

Thanks for doing this investigation. I agree with your analysis.

Similarly, if SIGTERM is sent to samurai's process group, then I think the subprocesses will end up seeing SIGTERM twice (once from the initial signal, once from samurai).

That's a problem of the user that sent it to the process group; it's not samurai's problem. Generally I don't think that it would be an issue, because SIGTERM will either exit on the first or trigger orderly cleanup in the child. In the latter case it would be triggered twice but, because signal handlers in general don't do much work themselves, it should be safe to consider SIGTERM idempotent; and if they're not, that should be considered a bug in the program.

Makes sense to me.

@michaelforney
Copy link
Owner

Regarding job[i].edge, my preference is just to leave it be. I'm not sure it makes sense to partially reinitialize the job struct (why edge and not cmd, pid, and fd)? In fact, I think j->buf.len = 0; probably should be moved to jobstart() where the rest of the struct is initialized.

@bonzini
Copy link
Contributor Author

bonzini commented Feb 2, 2025

I think the window is bigger than you suggest, but due to the poll timeout, it hangs for at most 5 seconds before sending the signal to jobs and exiting itself.

Yeah, you're right. I will fix it (in fact most of the logic is already in the jobserver patches...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants