7773: io: always cleanup AsyncFd registration list on deregister#47
7773: io: always cleanup AsyncFd registration list on deregister#47martin-augment wants to merge 34 commits intomasterfrom
Conversation
Fixes memory leak when fd is closed before AsyncFd drop. Fixes: tokio-rs#7563
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Instead of checking absolute RSS growth (which varies with allocator behavior), this test now runs multiple phases and checks if memory stabilizes. A real leak causes unbounded growth across all phases; fixed code stabilizes as memory is reused. This approach is more robust across different CI environments where allocator behavior may differ.
Co-authored-by: Alice Ryhl <aliceryhl@google.com>
WalkthroughThis PR modifies the IO driver's deregistration logic and introduces a memory leak regression test. In the IO driver, the ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Pull Request ReviewSummaryThis PR fixes a critical memory leak in AsyncFd (issue tokio-rs#7563) by ensuring that the internal registration list cleanup always occurs during deregistration, even when the OS-level deregister operation fails. Code Quality & Best Practices ✅driver.rs changes (tokio/src/runtime/io/driver.rs:298-311):
The change from: self.registry.deregister(source)?; // Early return on error - BAD!
// cleanup code hereTo: let os_result = self.registry.deregister(source); // Continue even on error
// cleanup code here
os_result // Return error after cleanupThis is the correct pattern for cleanup operations and follows established best practices in systems programming. Bug Analysis ✅Root Cause: When a file descriptor was closed before AsyncFd::drop() was called, the OS-level deregister would fail (EBADF), causing an early return via the
Impact: Each leaked registration is ~256 bytes of ScheduledIo metadata, causing unbounded memory growth in applications that repeatedly close FDs before dropping AsyncFd instances. Fix correctness: The fix ensures cleanup happens regardless of OS errors, which is correct because:
Test Coverage ✅The test file Strengths:
Test methodology:
Minor suggestions:
Example: // Allow 64KB tolerance for normal allocation variance.
// A real leak would add ~250KB per phase (1000 * ~256 bytes per ScheduledIo)
let threshold = 64 * 1024;Performance Considerations ✅No negative impact: The change actually improves performance slightly:
Metrics accuracy: The fix ensures Security Concerns ✅Resource exhaustion: This fix resolves a security concern. The memory leak could be exploited for DoS attacks:
No new vulnerabilities introduced: The change is minimal and only affects cleanup ordering. Additional ObservationsGood practices demonstrated:
Questions:
RecommendationAPPROVE ✅ This is a well-crafted fix for a serious bug:
The only minor suggestion is to clarify the issue number reference and consider adding a brief comment in the test about the threshold calculation. Great work on both the fix and the comprehensive test! |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tokio/tests/io_async_fd_memory_leak.rs (1)
76-81: Add error checking forfcntlcalls.The function ignores potential
fcntlfailures. IfF_GETFLfails,flagswill be-1, and the subsequentF_SETFLwill set garbage flags. Consider adding error checks or using the nix-based approach fromio_async_fd.rswhich includes proper error handling.🔎 Proposed fix with error checking
fn set_nonblocking(fd: RawFd) { unsafe { let flags = libc::fcntl(fd, libc::F_GETFL); + assert!(flags != -1, "fcntl(F_GETFL) failed"); - libc::fcntl(fd, libc::F_SETFL, flags | libc::O_NONBLOCK); + let ret = libc::fcntl(fd, libc::F_SETFL, flags | libc::O_NONBLOCK); + assert!(ret != -1, "fcntl(F_SETFL) failed"); } }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tokio/src/runtime/io/driver.rstokio/tests/io_async_fd_memory_leak.rs
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T09:48:44.541Z
Learnt from: martin-augment
Repo: martin-augment/tokio PR: 11
File: tokio/src/util/usdt/stapsdt_aarch64.rs:1-3
Timestamp: 2025-11-04T09:48:44.541Z
Learning: In USDT (User Statically-Defined Tracing) probe implementations in Rust, semaphores must be declared as `static mut` primitive types (e.g., `static mut __usdt_sema_tokio_task__details: u16`), not atomic types. This is required for ABI compatibility with external tracing tools like DTrace and SystemTap, which modify these semaphores directly. The inline assembly code references the raw address of these semaphores, and using Rust atomic types would break the USDT mechanism.
Applied to files:
tokio/tests/io_async_fd_memory_leak.rs
🧬 Code graph analysis (1)
tokio/tests/io_async_fd_memory_leak.rs (1)
tokio/tests/io_async_fd.rs (6)
new(41-48)as_raw_fd(65-67)as_raw_fd(244-246)as_raw_fd(935-937)set_nonblocking(102-118)socketpair(120-136)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: claude-review
🔇 Additional comments (5)
tokio/src/runtime/io/driver.rs (1)
298-311: LGTM! Proper fix for the memory leak.The change correctly ensures that internal cleanup (removing registration from the set, updating metrics, and potentially unparking) always executes regardless of whether the OS-level deregistration succeeds. This is essential because when a file descriptor is closed before
AsyncFdis dropped,registry.deregister()fails withEBADF, but theScheduledIoregistration still needs to be cleaned up to prevent leaking memory.tokio/tests/io_async_fd_memory_leak.rs (4)
16-42: LGTM! Allocator tracking implementation is correct.The
alloc,dealloc, andreallocimplementations correctly track the allocation delta. Notably, whenreallocfails (returns null), not updating the counter is correct since the original allocation remains valid.Ordering::Relaxedis acceptable here since exact precision isn't required for the 64KB threshold check.
100-112: LGTM! The fd lifecycle pattern correctly simulates the bug scenario.The pattern of
forget(fd_a)followed by manuallibc::close()before droppingAsyncFdcorrectly reproduces the issue tokio-rs#7563 scenario where an external entity closes the file descriptor beforeAsyncFdis dropped. This is the exact condition that triggers the memory leak this test is designed to catch.
173-187: LGTM! Sound test logic for detecting memory leaks.The assertion correctly identifies leaks by checking that at least one phase shows minimal growth (below 64KB). If memory is being properly reused, growth should stabilize. The 64KB threshold is appropriate given the ~250KB growth expected per 1000 iterations when leaking (~256 bytes per
ScheduledIo).
58-74: LGTM!The wrapper structs follow the established pattern from
io_async_fd.rsand correctly implementAsRawFddelegation for use withAsyncFd.
🤖 Augment PR SummarySummary: This PR fixes an internal I/O-driver cleanup path to prevent Changes:
Technical Notes: The test avoids RSS-based checks (which can be distorted by allocator retention) by measuring outstanding allocation sizes directly via a custom 🤖 Was this summary useful? React with 👍 or 👎 |
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The F_GETFL return value should be checked for not being a negative one before using it to set the new flags. Prevents ignoring a syscall failure and continuing. |
7773: To review by AI