7773: io: always cleanup AsyncFd registration list on deregister by martin-augment · Pull Request #47 · martin-augment/tokio

martin-augment · 2026-01-05T07:17:02Z

7773: To review by AI

Fixes memory leak when fd is closed before AsyncFd drop. Fixes: tokio-rs#7563

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

…c exposure)

Instead of checking absolute RSS growth (which varies with allocator behavior), this test now runs multiple phases and checks if memory stabilizes. A real leak causes unbounded growth across all phases; fixed code stabilizes as memory is reused. This approach is more robust across different CI environments where allocator behavior may differ.

Co-authored-by: Alice Ryhl <aliceryhl@google.com>

coderabbitai · 2026-01-05T07:17:16Z

Walkthrough

This PR modifies the IO driver's deregistration logic and introduces a memory leak regression test. In the IO driver, the deregister_source function changes its error handling pattern to always perform cleanup operations (removing registration, updating metrics, unparking) regardless of OS deregistration success, then return any error. A new test file adds a regression test using a custom global allocator to detect memory leaks when AsyncFd drops after its file descriptor is already closed, measuring heap allocation growth across multiple phases with a 64KB threshold.

claude · 2026-01-05T07:19:05Z

Pull Request Review

Summary

This PR fixes a critical memory leak in AsyncFd (issue tokio-rs#7563) by ensuring that the internal registration list cleanup always occurs during deregistration, even when the OS-level deregister operation fails.

Code Quality & Best Practices ✅

driver.rs changes (tokio/src/runtime/io/driver.rs:298-311):
The fix is well implemented and follows a standard rule for cleanup paths: internal state is released even when the underlying OS call fails.

Separation of concerns: The change correctly separates OS-level cleanup from internal state cleanup
Clear intent: The comment "Cleanup ALWAYS happens" clearly documents the critical behavior
Proper error handling: The fix maintains the original error semantics by returning the OS result after cleanup

The change from:

self.registry.deregister(source)?;  // Early return on error - BAD!
// cleanup code here

To:

let os_result = self.registry.deregister(source);  // Continue even on error
// cleanup code here
os_result  // Return error after cleanup

This is the correct pattern for cleanup operations and follows established best practices in systems programming.
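The pattern above can be sketched in isolation. This is a minimal illustration, not Tokio's actual driver code: `Driver`, its fields, and the `os_deregister` closure are hypothetical stand-ins for the registration set, the FD metric, and the `registry.deregister` call.

```rust
use std::collections::HashSet;
use std::io;

/// Hypothetical stand-in for the driver's internal bookkeeping (not Tokio's real types).
#[derive(Default)]
struct Driver {
    registrations: HashSet<u64>,
    fd_count: usize,
}

impl Driver {
    /// Records a registration and bumps the FD counter.
    fn register(&mut self, token: u64) {
        if self.registrations.insert(token) {
            self.fd_count += 1;
        }
    }

    /// `os_deregister` models the OS-level call, which may fail (for example with EBADF).
    fn deregister(
        &mut self,
        token: u64,
        os_deregister: impl FnOnce() -> io::Result<()>,
    ) -> io::Result<()> {
        // Attempt the OS-level deregistration first, but do not return early on error.
        let os_result = os_deregister();

        // Internal cleanup runs regardless of the OS result.
        if self.registrations.remove(&token) {
            self.fd_count -= 1;
        }

        // Surface the OS error (if any) only after cleanup is done.
        os_result
    }
}
```

Calling `deregister` with a closure that returns `Err(..)` still empties `registrations` and resets `fd_count`, while the caller receives the original error.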

Bug Analysis ✅

Root Cause: When a file descriptor was closed before AsyncFd::drop() was called, the OS-level deregister would fail (EBADF), causing an early return via the ? operator. This prevented:

Removal from the internal registration list (self.registrations.deregister())
Decrementing the FD count metric (self.metrics.dec_fd_count())

Impact: Each leaked registration is ~256 bytes of ScheduledIo metadata, causing unbounded memory growth in applications that repeatedly close FDs before dropping AsyncFd instances.

Fix correctness: The fix ensures cleanup happens regardless of OS errors, which is correct because:

Internal state should be cleaned up even if the OS resource is already gone
The registration list holds Arc references that need to be released
Metrics need to stay accurate

Test Coverage ✅

The test file io_async_fd_memory_leak.rs is excellent:

Strengths:

Custom allocator tracking: Uses a global allocator wrapper to track actual allocated bytes, avoiding RSS measurement issues
Three-phase approach: Warmup phase, then two measurement phases to detect ongoing growth vs stabilization
Realistic reproduction: Accurately reproduces the bug scenario (closing FD before AsyncFd drop)
Platform-specific: Properly gated with #[cfg(all(unix, target_os = "linux"))]
Well-documented: Clear comments explain the purpose and expected behavior

Test methodology:

Runs 1000 iterations per phase (3000 total)
Compares growth between phases to detect leaks
Allows 64KB tolerance for allocator variance
Fails if memory keeps growing between phases
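The phase comparison described above could look roughly like the sketch below. It assumes that allocation totals are sampled at each phase boundary and that the check passes when at least one phase grows by less than the threshold; the function names are hypothetical and the real test may structure this differently.

```rust
/// 64KB tolerance for allocator variance, as described in the review.
const THRESHOLD: usize = 64 * 1024;

/// Growth in bytes for each phase, given samples taken at phase boundaries.
fn phase_growth(samples: &[usize]) -> Vec<usize> {
    samples
        .windows(2)
        .map(|w| w[1].saturating_sub(w[0]))
        .collect()
}

/// Returns true when at least one phase stayed under the threshold,
/// i.e. memory stopped growing at some point.
fn looks_leak_free(samples: &[usize], threshold: usize) -> bool {
    phase_growth(samples).iter().any(|&g| g < threshold)
}
```

With a leak of about 256 bytes per iteration and 1000 iterations per phase, every phase grows by roughly 256,000 bytes, so no phase falls under `THRESHOLD` and the check fails.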

Minor suggestions:

Test isolation: This is the only test in tokio using #[global_allocator], which is good (avoids conflicts), but consider adding a comment noting this limitation
Const documentation: Consider adding a comment explaining why 64KB threshold was chosen (roughly 250 leaked ScheduledIo structs)

Example:

// Allow 64KB tolerance for normal allocation variance.
// A real leak would add ~250KB per phase (1000 * ~256 bytes per ScheduledIo)
let threshold = 64 * 1024;

Performance Considerations ✅

No negative impact: the success path does the same work as before. On the error path, the driver now performs the internal cleanup that the early return used to skip, so all cleanup happens in a single code path.

Metrics accuracy: The fix ensures dec_fd_count() is always called, keeping metrics accurate for monitoring and debugging.

Security Concerns ✅

Resource exhaustion: This fix resolves a security concern. The memory leak could be exploited for DoS attacks:

An attacker could trigger the leak pattern repeatedly
Unbounded memory growth leads to OOM
The fix prevents this attack vector

No new vulnerabilities introduced: The change is minimal and only affects cleanup ordering.

Additional Observations

Good practices demonstrated:

Minimal change: The fix touches only the necessary lines
Preserves behavior: Error semantics remain unchanged for callers
Comments: Added helpful inline comments
Issue reference: The title references tokio-rs/tokio#7773 ("io: always cleanup AsyncFd registration list on deregister"), while the description mentions tokio-rs/tokio#7563 ("The AsyncFd in Tokio has a serious memory leak issue."); ensure consistency.

Questions:

Issue number mismatch: The PR title mentions tokio-rs/tokio#7773 ("io: always cleanup AsyncFd registration list on deregister"), but the description mentions tokio-rs/tokio#7563 ("The AsyncFd in Tokio has a serious memory leak issue."). Should the title reference tokio-rs/tokio#7563 instead?
Backport consideration: Given this is a memory leak fix, should this be backported to stable branches?

Recommendation

APPROVE ✅

This is a well-crafted fix for a serious bug:

The code change is correct and minimal
The test coverage is excellent and comprehensive
No security or performance concerns
Follows Tokio's patterns and conventions

The only minor suggestion is to clarify the issue number reference and consider adding a brief comment in the test about the threshold calculation.

Great work on both the fix and the comprehensive test!

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tokio/tests/io_async_fd_memory_leak.rs (1)
76-81: Add error checking for fcntl calls.

The function ignores potential fcntl failures. If F_GETFL fails, flags will be -1, and the subsequent F_SETFL will set garbage flags. Consider adding error checks or using the nix-based approach from io_async_fd.rs which includes proper error handling.
🔎 Proposed fix with error checking
 fn set_nonblocking(fd: RawFd) {
     unsafe {
         let flags = libc::fcntl(fd, libc::F_GETFL);
+        assert!(flags != -1, "fcntl(F_GETFL) failed");
-        libc::fcntl(fd, libc::F_SETFL, flags | libc::O_NONBLOCK);
+        let ret = libc::fcntl(fd, libc::F_SETFL, flags | libc::O_NONBLOCK);
+        assert!(ret != -1, "fcntl(F_SETFL) failed");
     }
 }

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between a80b4d3 and fdf771e.

📒 Files selected for processing (2)

tokio/src/runtime/io/driver.rs
tokio/tests/io_async_fd_memory_leak.rs

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-11-04T09:48:44.541Z

Learnt from: martin-augment
Repo: martin-augment/tokio PR: 11
File: tokio/src/util/usdt/stapsdt_aarch64.rs:1-3
Timestamp: 2025-11-04T09:48:44.541Z
Learning: In USDT (User Statically-Defined Tracing) probe implementations in Rust, semaphores must be declared as `static mut` primitive types (e.g., `static mut __usdt_sema_tokio_task__details: u16`), not atomic types. This is required for ABI compatibility with external tracing tools like DTrace and SystemTap, which modify these semaphores directly. The inline assembly code references the raw address of these semaphores, and using Rust atomic types would break the USDT mechanism.

Applied to files:

tokio/tests/io_async_fd_memory_leak.rs

🧬 Code graph analysis (1)

tokio/tests/io_async_fd_memory_leak.rs (1)

tokio/tests/io_async_fd.rs (6)

new (41-48)

as_raw_fd (65-67)

as_raw_fd (244-246)

as_raw_fd (935-937)

set_nonblocking (102-118)

socketpair (120-136)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: claude-review

🔇 Additional comments (5)

tokio/src/runtime/io/driver.rs (1)

298-311: LGTM! Proper fix for the memory leak.

The change correctly ensures that internal cleanup (removing registration from the set, updating metrics, and potentially unparking) always executes regardless of whether the OS-level deregistration succeeds. This is essential because when a file descriptor is closed before AsyncFd is dropped, registry.deregister() fails with EBADF, but the ScheduledIo registration still needs to be cleaned up to prevent leaking memory.

tokio/tests/io_async_fd_memory_leak.rs (4)

16-42: LGTM! Allocator tracking implementation is correct.

The alloc, dealloc, and realloc implementations correctly track the allocation delta. Notably, when realloc fails (returns null), not updating the counter is correct since the original allocation remains valid. Ordering::Relaxed is acceptable here since exact precision isn't required for the 64KB threshold check.
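For readers unfamiliar with the technique, a tracking allocator of the kind described here can be sketched as follows. This is an illustration built on `std::alloc::System`, not the test's actual code; `TrackingAllocator`, `ALLOCATED`, and `allocated_bytes` are names chosen for this example.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Bytes currently outstanding, updated on every allocator call.
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

struct TrackingAllocator;

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // SAFETY: the caller upholds the GlobalAlloc contract for `layout`.
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was allocated by this allocator with `layout`.
        unsafe { System.dealloc(ptr, layout) };
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        // SAFETY: forwarded unchanged to the system allocator.
        let new_ptr = unsafe { System.realloc(ptr, layout, new_size) };
        // On failure the original block stays valid, so the counter is left alone.
        if !new_ptr.is_null() {
            if new_size >= layout.size() {
                ALLOCATED.fetch_add(new_size - layout.size(), Ordering::Relaxed);
            } else {
                ALLOCATED.fetch_sub(layout.size() - new_size, Ordering::Relaxed);
            }
        }
        new_ptr
    }
}

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

/// Snapshot of the outstanding allocation total.
fn allocated_bytes() -> usize {
    ALLOCATED.load(Ordering::Relaxed)
}
```

Sampling `allocated_bytes()` before and after a batch of iterations gives the growth figure that a threshold check can compare against, independent of how much memory the system allocator retains.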

100-112: LGTM! The fd lifecycle pattern correctly simulates the bug scenario.

The pattern of forget(fd_a) followed by manual libc::close() before dropping AsyncFd correctly reproduces the issue tokio-rs#7563 scenario where an external entity closes the file descriptor before AsyncFd is dropped. This is the exact condition that triggers the memory leak this test is designed to catch.

173-187: LGTM! Sound test logic for detecting memory leaks.

The assertion correctly identifies leaks by checking that at least one phase shows minimal growth (below 64KB). If memory is being properly reused, growth should stabilize. The 64KB threshold is appropriate given the ~250KB growth expected per 1000 iterations when leaking (~256 bytes per ScheduledIo).

58-74: LGTM!

The wrapper structs follow the established pattern from io_async_fd.rs and correctly implement AsRawFd delegation for use with AsyncFd.

augmentcode · 2026-01-05T07:20:51Z

🤖 Augment PR Summary

Summary: This PR fixes an internal I/O-driver cleanup path to prevent AsyncFd-related leaks when deregistration fails.

Changes:

Updates Handle::deregister_source to always remove the ScheduledIo from Tokio’s internal RegistrationSet and decrement FD metrics, even if the OS-level registry.deregister call returns an error.
Preserves the OS deregistration attempt ordering (still invoked first), but defers returning the error until after internal cleanup has run.
Adds a Linux-only regression test that reproduces the historical leak scenario by closing the raw FD before dropping AsyncFd, and verifies allocations stabilize using a tracking global allocator.

Technical Notes: The test avoids RSS-based checks (which can be distorted by allocator retention) by measuring outstanding allocation sizes directly via a custom #[global_allocator].

augmentcode

Review completed. No suggestions at this time.

martin-augment · 2026-01-05T07:43:36Z

76-81: Add error checking for fcntl calls.

The function ignores potential fcntl failures. If F_GETFL fails, flags will be -1, and the subsequent F_SETFL will set garbage flags. Consider adding error checks or using the nix-based approach from io_async_fd.rs which includes proper error handling.

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct. The F_GETFL return value should be checked against -1 before it is used to set the new flags. This prevents the code from ignoring a syscall failure and continuing with invalid flags.

F4RAN and others added 30 commits December 12, 2025 22:28

io: always cleanup AsyncFd registration list on deregister

931635d

Fixes memory leak when fd is closed before AsyncFd drop. Fixes: tokio-rs#7563

fix: formatter issues

3d7d87d

test: in linux environment

b6452c7

test: linux test with fix

bf5c706

chore: remove additional method

e659d60

chore: remove additional debug comments

b44e56d

fix:formatter

94f46b9

fix: style: fix clippy warnings and format code

0a0f94e

Update tokio/src/runtime/io/registration_set.rs

c4141e3

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

Update tokio/src/runtime/io/driver.rs

125dc5d

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

Merge branch 'master' into 7563-fix-asyncfd-leak

4d30241

fix: remove additional imports

45d3da2

restore AsyncFd::try_with_interest()

ab770fa

style: run formatter

e36a489

test(internals): gate test-only APIs behind __internal_test (no publi…

0b0495f

…c exposure)

fix: spelling error is solved using backticks

08d5645

fix: rename __internal_test to integration_test

e3969ed

Merge branch 'master' into 7563-fix-asyncfd-leak

580f197

fix: turn from integration_test to tokio_unstable

872097a

Merge branch 'master' into 7563-fix-asyncfd-leak

0fbd65d

test: revert to buggy code

a7895ac

test: lsan only test

87864bf

test: heap profiling

23d5e04

test: revert fix to check the test again

1c1cdd7

fix: rss test is applied and works

23344ce

fix: resolve clippy format

abd4d91

Merge branch 'master' into 7563-fix-asyncfd-leak

3c18df5

Update tokio/src/runtime/io/driver.rs

3eba842

Co-authored-by: Alice Ryhl <aliceryhl@google.com>

fix: additional line in Cargo file

ffe88ce

F4RAN added 4 commits January 5, 2026 09:13

test: add custom allocator memory leak test for issue tokio-rs#7563

b8ef07c

test: revert to check in linux machine

d84af1c

test: fix test

6fa3271

fix: inline format args to satisfy clippy

fdf771e

coderabbitai bot reviewed Jan 5, 2026

augmentcode bot reviewed Jan 5, 2026

martin-augment wants to merge 34 commits into master from pr-7773-2026-01-05-07-16-58

2 participants