Skip to content

blockdev: Fix loopback device resource leak on signal interruption #1402

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gursewak1997
Copy link
Contributor

This commit implements issue #799 by creating a signal-safe cleanup helper for loopback devices to prevent resource leaks when bootc install --via-loopback is interrupted by signals like SIGINT (Ctrl-C).

The solution uses an 'out-of-process drop' helper that:

  • Forks a cleanup helper process when creating a loopback device
  • Uses PR_SET_PDEATHSIG to detect when the parent process dies
  • Masks most signals to avoid being killed accidentally
  • Automatically cleans up leaked loopback devices if the parent dies
  • Gracefully terminates when the parent performs normal cleanup

This prevents the common issue where interrupting bootc install --via-loopback with Ctrl-C would leave /dev/loopN devices allocated on the system.

Fixes: #799

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a signal-safe cleanup mechanism for loopback devices to prevent resource leaks on signal interruption. It uses an out-of-process helper to clean up leaked loopback devices. I've added comments to enhance error logging for better debugging.

Comment on lines 299 to 303
if libc::pthread_sigmask(libc::SIG_SETMASK, &sigset, std::ptr::null_mut()) != 0 {
std::process::exit(1);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pthread_sigmask function can fail, but the error is not being handled beyond exiting the process. It would be better to log the error message using tracing::error! before exiting, to aid in debugging.

            if libc::pthread_sigmask(libc::SIG_SETMASK, &sigset, std::ptr::null_mut()) != 0 {
                let err = std::io::Error::last_os_error();
                tracing::error!("pthread_sigmask failed: {}", err);
                std::process::exit(1);
            }

Comment on lines 317 to 327
let result = unsafe {
libc::sigwaitinfo(&sigset, &mut siginfo)
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The sigwaitinfo function can fail. It would be better to log the error message using tracing::error! before exiting, to aid in debugging.

        let result = unsafe { 
            let result = libc::sigwaitinfo(&sigset, &mut siginfo);
            if result == -1 {
                let err = std::io::Error::last_os_error();
                tracing::error!("sigwaitinfo failed: {}", err);
                std::process::exit(1);
            }
            result
        };

Comment on lines 373 to 395
unsafe {
libc::kill(cleanup_handle.helper_pid as i32, libc::SIGTERM);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The kill function can fail. It would be better to log the error message using tracing::warn! if kill fails, to aid in debugging.

            unsafe {
                if libc::kill(cleanup_handle.helper_pid as i32, libc::SIGTERM) != 0 {
                    let err = std::io::Error::last_os_error();
                    tracing::warn!("kill failed: {}", err);
                }
            }

Comment on lines 376 to 404
// Wait for it to exit (non-blocking)
unsafe {
let mut status = 0;
libc::waitpid(cleanup_handle.helper_pid as i32, &mut status, libc::WNOHANG);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The waitpid function can fail. It would be better to log the error message using tracing::warn! if waitpid fails, to aid in debugging.

            unsafe {
                let mut status = 0;
                if libc::waitpid(cleanup_handle.helper_pid as i32, &mut status, libc::WNOHANG) == -1 {
                    let err = std::io::Error::last_os_error();
                    tracing::warn!("waitpid failed: {}", err);
                }
            }

@gursewak1997 gursewak1997 force-pushed the bootc-799 branch 6 times, most recently from 70c2b77 to a4ab303 Compare July 11, 2025 05:59
@cgwalters
Copy link
Collaborator

Thanks for working on this! While it will be a bit more awkward can you try doing it this way #799 (comment) - that should 100% avoid all the unsafe code.

Basically instead of a raw fork() (source of basically all the unsafe in general) we fork+exec our own binary /proc/self/exe. Look at e.g. reexec_with_guardenv.

Add fork+exec based cleanup helper to prevent loopback device leaks when
bootc install --via-loopback is interrupted by signals like SIGINT.

- Add loopback-cleanup-helper CLI subcommand
- Implement run_loopback_cleanup_helper() with PR_SET_PDEATHSIG
- Update LoopbackDevice to spawn cleanup helper process
- Add tests for spawn mechanism
Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looking closer

anyhow::bail!("This function should only be called as a cleanup helper");
}

// Close stdin, stdout, stderr and redirect to /dev/null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better done in the parent process setup above

.context("Failed to read /proc/self/exe")?;

// Create the helper process using exec
let mut cmd = Command::new(self_exe);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

// Set up death signal notification - we want to be notified when parent dies
unsafe {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


// Set up death signal notification - we want to be notified when parent dies
unsafe {
if libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGUSR1) != 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems cleaner to me to use SIGTERM and react to that

}
}

// Mask most signals to avoid being killed accidentally
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So https://docs.rs/tokio/latest/tokio/signal/index.html is one way to handle this in a safe way (will require making the function async)

I think the tokio API will replace about 50 lines of unsafe code with 5 lines of safe code.


match status {
Ok(exit_status) if exit_status.success() => {
// Write to stderr since we closed stdout
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm my vote here is probably to not inherit stderr at all; if we write to stderr then we have the possibility to intermix the child process writes with the parent's.

One option is to explicitly log to the systemd journal.

I guess speaking of systemd...a whole possibility I hadn't considered until just now is that we fork off via systemd-run. That would have some nice advantages but will be trickier to get right the lifecycle binding, so let's leave that for the future.

(There's this whole giant topic in bootc overall defaulting to running via systemd in some cases, which would similarly have a lot of advantages but be a big nontrivial change)

20961247
);
Ok(())
let data = fs::read_to_string("tests/fixtures/lsblk.json").unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks unrelated? I mean it's probably fine to do but let's break it into a separate commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

drop loopback out-of-process
2 participants