kill all children on SIGTERM #106

bonzini · 2024-05-03T10:22:27Z

Handle SIGTERM by forwarding it to all children and waiting for them to stop. This is a better behavior than letting the children continue in the background.

The price to pay is that if a program does not respond to SIGTERM, samurai will have to be killed with SIGKILL. This however is consistent with many other programs that invoke and manage child processes, and the reason will be apparent from e.g. ps or top output, so overall I think this is in improvement.

bonzini · 2024-05-21T09:58:57Z

any news?

michaelforney

Could you confirm whether or not ninja has a similar behavior when it receives SIGTERM?

build.c

Extract it from jobwork() so that build() can call it on a signal. Signed-off-by: Paolo Bonzini <[email protected]>

Keep the system clean by propagating SIGTERM to all children, and by not starting new jobs on both SIGTERM and SIGINT. The only tricky bit is that previously fd[i].revents was used to skip both jobs that are not in use and jobs that did not have output; that's because negative file descriptors do not cause POLLNVAL and therefore fd[i].revents is zero for inactive jobs as well. But because all jobs must be killed, build() now has to check fd[i].fd == -1 explicitly. While at it, also clean up jobdone() by clearing job[i].edge; it's not nice to leave a dangling pointer in the jobs array, even if it's harmless. Signed-off-by: Paolo Bonzini <[email protected]>

michaelforney · 2024-12-17T07:46:07Z

Thinking about this some more, I'm worried a race condition where the signal arrives outside of the poll. If this happens, then we won't forward the signal to the subprocesses until the next one produces output or finishes. I think writing to a self-pipe in the handler and adding that to the pollfd array is probably the simplest way to solve this.

I also did some digging into ninja to see how it deals with these signals and found a few things:

It seems that ninja starts all jobs except those with console = 1 with their own process group. When killing subprocesses, it skips the job running in the foreground under the assumption that whoever sent the signal sent it to the process group, not just the ninja process: https://github.com/ninja-build/ninja/blob/9d31ebba91515097e64184ca5bd460f6a7d6dfe5/src/subprocess-posix.cc#L360-L361
When ninja kills subprocesses (not under the foreground), since it starts them with their own process group, it forwards the signal to the whole group.

This leaves me with a few questions. Since ninja doesn't forward the signal to any foreground job, doesn't it have the same issue you're fixing here? If SIGTERM is sent to ninja only, what happens? Does the subprocess remain running? I also wonder if whoever sent the SIGTERM ought to have sent it to samurai's process group.

Currently, samurai doesn't make new process groups for jobs. In this PR, we're making the assumption that SIGINT is usually sent to the whole process group due to a Ctrl-C. However, if SIGINT is sent to only samurai, then I believe it will just stop starting new jobs and wait until any active jobs finish naturally. Similarly, if SIGTERM is sent to samurai's process group, then I think the subprocesses will end up seeing SIGTERM twice (once from the initial signal, once from samurai).

bonzini · 2024-12-17T10:38:59Z

I'm worried a race condition where the signal arrives outside of the poll. If this happens, then we won't forward the signal to the subprocesses until the next one produces output or finishes. I think writing to a self-pipe in the handler and adding that to the pollfd array is probably the simplest way to solve this.

That is caught by:

                if (!have_work() && !numjobs)
                        break;

but indeed there is a microscopic window between this line and the poll() right below. I can fix it once we agree on what to do.

If SIGTERM is sent to ninja only, what happens? Does the subprocess remain running? I also wonder if whoever sent the SIGTERM ought to have sent it to samurai's process group.

ninja forwards the signal to the non-console processes (using process groups):

    if (!(*i)->use_console_)
      kill(-(*i)->pid_, interrupted_);

But I think for SIGTERM it should send it to console processes as well. Unlike SIGINT and SIGHUP, which are sent by the OS, SIGTERM is usually sent by the user with kill, and you cannot assume that the user sent it to the process group. In fact, I'd argue that because the idea of SIGTERM is to let the process clean up after itself, 1) it should not be sent to a process group, 2) it is a bug to not trap it if you spawn processes.

The behavior of moving the children in their process group was implemented for ninja-build/ninja#110 with no particular explanation; then it was changed to move the process into a session (ninja-build/ninja#909) and reverted (ninja-build/ninja#1097, but see also ninja-build/ninja#1001). Frankly I wouldn't take it as a good example even if samurai is a "ninja clone".

Make instead does the same as my implementation: it does not place processes in separate process groups, and forwards SIGTERM.

Similarly, if SIGTERM is sent to samurai's process group, then I think the subprocesses will end up seeing SIGTERM twice (once from the initial signal, once from samurai).

That's a problem of the user that sent it to the process group; it's not samurai's problem. Generally I don't think that it would be an issue, because SIGTERM will either exit on the first or trigger orderly cleanup in the child. In the latter case it would be triggered twice but, because signal handlers in general don't do much work themselves, it should be safe to consider SIGTERM idempotent; and if they're not, that should be considered a bug in the program.

bonzini · 2025-01-31T08:53:30Z

Any news?

michaelforney · 2025-02-01T09:43:39Z

Thanks for the ping. I'm happy with this, but would like to fix the signal race before merging. Sorry for the delay in reviewing. I appreciate your patience.

I'm worried a race condition where the signal arrives outside of the poll. If this happens, then we won't forward the signal to the subprocesses until the next one produces output or finishes. I think writing to a self-pipe in the handler and adding that to the pollfd array is probably the simplest way to solve this.

That is caught by:
                if (!have_work() && !numjobs)
                        break;
but indeed there is a microscopic window between this line and the poll() right below. I can fix it once we agree on what to do.

I think the window is bigger than you suggest, but due to the poll timeout, it hangs for at most 5 seconds before sending the signal to jobs and exiting itself.

For example, say we are reading some input for the last job in the array when the signal arrives. We finish the jobwork loop and continue in the outer loop. have_work() returns false, so we don't start any more jobs, but there are still jobs running so we start the poll. If the jobs don't produce more output or exit, then the poll only returns when it times out, and only then do we kill the running jobs.

I was able to reproduce this pretty reliably with a job that wrote to stdout and then sent SIGTERM to samu:

[1/1] exec ./testjob
strace: Process 6572 attached
[pid  6571] 01:34:45 read(7, "", 4)     = 0
[pid  6571] 01:34:45 poll([{fd=5, events=POLLIN}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}], 8, 5000 <unfinished ...>
[pid  6572] 01:34:45 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  6571] 01:34:45 <... poll resumed>) = 1 ([{fd=5, revents=POLLIN}])
[pid  6572] 01:34:45 kill(6571, SIGTERM <unfinished ...>
[pid  6571] 01:34:45 read(5,  <unfinished ...>
[pid  6572] 01:34:45 <... kill resumed>) = 0
[pid  6571] 01:34:45 <... read resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  6571] 01:34:45 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=6572, si_uid=1000} ---
[pid  6571] 01:34:45 poll([{fd=5, events=POLLIN}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}], 8, 5000) = 0 (Timeout)
[pid  6571] 01:34:50 kill(6572, SIGTERM) = 0
[pid  6572] 01:34:50 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=6571, si_uid=1000} ---
[pid  6572] 01:34:50 +++ killed by SIGTERM +++
01:34:50 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=6572, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
samu: interrupted by signal 15
01:34:50 +++ exited with 1 +++

I think the cleanest solution is to write a byte to a pipe in the handler, and add the read end to the pollfd array.

If SIGTERM is sent to ninja only, what happens? Does the subprocess remain running? I also wonder if whoever sent the SIGTERM ought to have sent it to samurai's process group.

ninja forwards the signal to the non-console processes (using process groups):
    if (!(*i)->use_console_)
      kill(-(*i)->pid_, interrupted_);
But I think for SIGTERM it should send it to console processes as well. Unlike SIGINT and SIGHUP, which are sent by the OS, SIGTERM is usually sent by the user with kill, and you cannot assume that the user sent it to the process group. In fact, I'd argue that because the idea of SIGTERM is to let the process clean up after itself, 1) it should not be sent to a process group, 2) it is a bug to not trap it if you spawn processes.

The behavior of moving the children in their process group was implemented for ninja-build/ninja#110 with no particular explanation; then it was changed to move the process into a session (ninja-build/ninja#909) and reverted (ninja-build/ninja#1097, but see also ninja-build/ninja#1001). Frankly I wouldn't take it as a good example even if samurai is a "ninja clone".

Make instead does the same as my implementation: it does not place processes in separate process groups, and forwards SIGTERM.

Thanks for doing this investigation. I agree with your analysis.

Similarly, if SIGTERM is sent to samurai's process group, then I think the subprocesses will end up seeing SIGTERM twice (once from the initial signal, once from samurai).

That's a problem of the user that sent it to the process group; it's not samurai's problem. Generally I don't think that it would be an issue, because SIGTERM will either exit on the first or trigger orderly cleanup in the child. In the latter case it would be triggered twice but, because signal handlers in general don't do much work themselves, it should be safe to consider SIGTERM idempotent; and if they're not, that should be considered a bug in the program.

Makes sense to me.

build.c

michaelforney · 2025-02-01T10:14:13Z

Regarding job[i].edge, my preference is just to leave it be. I'm not sure it makes sense to partially reinitialize the job struct (why edge and not cmd, pid, and fd)? In fact, I think j->buf.len = 0; probably should be moved to jobstart() where the rest of the struct is initialized.

bonzini · 2025-02-02T09:13:35Z

I think the window is bigger than you suggest, but due to the poll timeout, it hangs for at most 5 seconds before sending the signal to jobs and exiting itself.

Yeah, you're right. I will fix it (in fact most of the logic is already in the jobserver patches...)

This was referenced May 3, 2024

Support for GNU Make jobserver (alternative implementation) #104

Open

Support for GNU Make jobserver #94

Closed

bonzini force-pushed the sigterm branch from 29708c1 to 188b399 Compare May 3, 2024 10:37

michaelforney reviewed Dec 6, 2024

View reviewed changes

build.c Show resolved Hide resolved

michaelforney reviewed Dec 6, 2024

View reviewed changes

build.c Show resolved Hide resolved

bonzini added 2 commits December 7, 2024 01:36

introduce jobkill()

dcd77bd

Extract it from jobwork() so that build() can call it on a signal. Signed-off-by: Paolo Bonzini <[email protected]>

bonzini force-pushed the sigterm branch from 188b399 to abf0636 Compare December 7, 2024 14:37

michaelforney reviewed Feb 1, 2025

View reviewed changes

build.c Show resolved Hide resolved

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kill all children on SIGTERM #106

kill all children on SIGTERM #106

bonzini commented May 3, 2024

bonzini commented May 21, 2024

michaelforney left a comment

michaelforney commented Dec 17, 2024

bonzini commented Dec 17, 2024

bonzini commented Jan 31, 2025

michaelforney commented Feb 1, 2025

michaelforney commented Feb 1, 2025

bonzini commented Feb 2, 2025

kill all children on SIGTERM #106

Are you sure you want to change the base?

kill all children on SIGTERM #106

Conversation

bonzini commented May 3, 2024

bonzini commented May 21, 2024

michaelforney left a comment

Choose a reason for hiding this comment

michaelforney commented Dec 17, 2024

bonzini commented Dec 17, 2024

bonzini commented Jan 31, 2025

michaelforney commented Feb 1, 2025

michaelforney commented Feb 1, 2025

bonzini commented Feb 2, 2025