-
Notifications
You must be signed in to change notification settings - Fork 124
blockdev: Fix loopback device resource leak on signal interruption #1402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a signal-safe cleanup mechanism for loopback devices to prevent resource leaks on signal interruption. It uses an out-of-process helper to clean up leaked loopback devices. I've added comments to enhance error logging for better debugging.
blockdev/src/blockdev.rs
Outdated
if libc::pthread_sigmask(libc::SIG_SETMASK, &sigset, std::ptr::null_mut()) != 0 { | ||
std::process::exit(1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pthread_sigmask
function can fail, but the error is not being handled beyond exiting the process. It would be better to log the error message using tracing::error!
before exiting, to aid in debugging.
if libc::pthread_sigmask(libc::SIG_SETMASK, &sigset, std::ptr::null_mut()) != 0 {
let err = std::io::Error::last_os_error();
tracing::error!("pthread_sigmask failed: {}", err);
std::process::exit(1);
}
blockdev/src/blockdev.rs
Outdated
let result = unsafe { | ||
libc::sigwaitinfo(&sigset, &mut siginfo) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sigwaitinfo
function can fail. It would be better to log the error message using tracing::error!
before exiting, to aid in debugging.
let result = unsafe {
let result = libc::sigwaitinfo(&sigset, &mut siginfo);
if result == -1 {
let err = std::io::Error::last_os_error();
tracing::error!("sigwaitinfo failed: {}", err);
std::process::exit(1);
}
result
};
blockdev/src/blockdev.rs
Outdated
unsafe { | ||
libc::kill(cleanup_handle.helper_pid as i32, libc::SIGTERM); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blockdev/src/blockdev.rs
Outdated
// Wait for it to exit (non-blocking) | ||
unsafe { | ||
let mut status = 0; | ||
libc::waitpid(cleanup_handle.helper_pid as i32, &mut status, libc::WNOHANG); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The waitpid
function can fail. It would be better to log the error message using tracing::warn!
if waitpid
fails, to aid in debugging.
unsafe {
let mut status = 0;
if libc::waitpid(cleanup_handle.helper_pid as i32, &mut status, libc::WNOHANG) == -1 {
let err = std::io::Error::last_os_error();
tracing::warn!("waitpid failed: {}", err);
}
}
70c2b77
to
a4ab303
Compare
Thanks for working on this! While it will be a bit more awkward can you try doing it this way #799 (comment) - that should 100% avoid all the unsafe code. Basically instead of a raw |
Add fork+exec based cleanup helper to prevent loopback device leaks when bootc install --via-loopback is interrupted by signals like SIGINT. - Add loopback-cleanup-helper CLI subcommand - Implement run_loopback_cleanup_helper() with PR_SET_PDEATHSIG - Update LoopbackDevice to spawn cleanup helper process - Add tests for spawn mechanism
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looking closer
anyhow::bail!("This function should only be called as a cleanup helper"); | ||
} | ||
|
||
// Close stdin, stdout, stderr and redirect to /dev/null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better done in the parent process setup above
.context("Failed to read /proc/self/exe")?; | ||
|
||
// Create the helper process using exec | ||
let mut cmd = Command::new(self_exe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set up std{in,out,err} here via https://doc.rust-lang.org/std/process/struct.Stdio.html#method.null
} | ||
|
||
// Set up death signal notification - we want to be notified when parent dies | ||
unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a safe version of this in https://docs.rs/rustix/latest/rustix/process/fn.set_parent_process_death_signal.html
|
||
// Set up death signal notification - we want to be notified when parent dies | ||
unsafe { | ||
if libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGUSR1) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems cleaner to me to use SIGTERM
and react to that
} | ||
} | ||
|
||
// Mask most signals to avoid being killed accidentally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So https://docs.rs/tokio/latest/tokio/signal/index.html is one way to handle this in a safe way (will require making the function async
)
I think the tokio API will replace about 50 lines of unsafe code with 5 lines of safe code.
|
||
match status { | ||
Ok(exit_status) if exit_status.success() => { | ||
// Write to stderr since we closed stdout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm my vote here is probably to not inherit stderr at all; if we write to stderr then we have the possibility to intermix the child process writes with the parent's.
One option is to explicitly log to the systemd journal.
I guess speaking of systemd...a whole possibility I hadn't considered until just now is that we fork off via systemd-run
. That would have some nice advantages but will be trickier to get right the lifecycle binding, so let's leave that for the future.
(There's this whole giant topic in bootc overall defaulting to running via systemd in some cases, which would similarly have a lot of advantages but be a big nontrivial change)
20961247 | ||
); | ||
Ok(()) | ||
let data = fs::read_to_string("tests/fixtures/lsblk.json").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks unrelated? I mean it's probably fine to do but let's break it into a separate commit
This commit implements issue #799 by creating a signal-safe cleanup helper for loopback devices to prevent resource leaks when bootc install --via-loopback is interrupted by signals like SIGINT (Ctrl-C).
The solution uses an 'out-of-process drop' helper that:
This prevents the common issue where interrupting bootc install --via-loopback with Ctrl-C would leave /dev/loopN devices allocated on the system.
Fixes: #799