netmon: init #214

Open: npry wants to merge 2 commits into main from npry/netmon


npry (Collaborator) commented May 28, 2026

Write crate ts_netmon, which is organized around the Netmon trait, whose primary job is to return a Stream<Item = ts_netmon::Event>. The events are upserts and removals for routes, addresses, and interfaces, as well as the selection of a new "default route interface" (the interface that has the best-metric default route). The default route interface is the one we expect to use with STUN and NAT traversal.
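For orientation, a minimal sketch of the kind of event type and default-route selection described above. The names here (Route, Event, IfId, default_route_interface) are hypothetical and are not the crate's actual definitions, and "best metric" is assumed to mean the lowest metric value:

```rust
use std::net::IpAddr;

/// Hypothetical interface identifier (e.g. an ifindex on Linux, a LUID on Windows).
pub type IfId = u64;

/// Illustrative route record; the real `ts_netmon` type is likely richer.
#[derive(Debug, Clone, PartialEq)]
pub struct Route {
    pub interface: IfId,
    pub dest: IpAddr,
    pub prefix_len: u8,
    pub metric: u32,
}

/// Assumed shape of the stream's items: upserts, removals, and
/// default-route-interface selection (addresses and interfaces omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    RouteUpsert(Route),
    RouteRemove(Route),
    DefaultRouteInterface(Option<IfId>),
}

impl Route {
    /// A default route covers everything: 0.0.0.0/0 or ::/0.
    pub fn is_default(&self) -> bool {
        self.prefix_len == 0 && self.dest.is_unspecified()
    }
}

/// Pick the interface owning the lowest-metric default route, if any.
/// Assumption: a lower metric is preferred.
pub fn default_route_interface(routes: &[Route]) -> Option<IfId> {
    routes
        .iter()
        .filter(|r| r.is_default())
        .min_by_key(|r| r.metric)
        .map(|r| r.interface)
}
```

A consumer would re-run the selection whenever a route upsert or removal arrives and emit a new DefaultRouteInterface event only when the answer changes.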

Implementations are provided for Windows and Linux (macOS is stubbed pending me getting shipped a laptop to work on from IT). There is an actor that runs Netmon instances configured to run the platform Netmon if one exists, and aggregates the incoming Events to produce a State. Nothing subscribes to this actor right now; the direct UDP actor will do this to interpret what address(es) an unspecified (0.0.0.0/::) socket bind actually means (and then the port it bound on can be combined to report endpoints to control).
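To illustrate the intended use by the direct UDP actor, here is a rough sketch of how an unspecified bind could be expanded into concrete endpoints. The function expand_bind is hypothetical, and it assumes a simple family match (a :: socket is not treated as dual-stack here):

```rust
use std::net::{IpAddr, SocketAddr};

/// Expand a bound socket address into the concrete endpoints it implies.
///
/// If the socket is bound to a specific address, that is the only endpoint.
/// If it is bound to 0.0.0.0 or ::, pair the bound port with every known
/// local address of the same family.
pub fn expand_bind(bound: SocketAddr, local_addrs: &[IpAddr]) -> Vec<SocketAddr> {
    if !bound.ip().is_unspecified() {
        return vec![bound];
    }
    local_addrs
        .iter()
        // Assumption: only same-family addresses are reported.
        .filter(|addr| addr.is_ipv4() == bound.is_ipv4())
        .map(|addr| SocketAddr::new(*addr, bound.port()))
        .collect()
}
```

In this picture local_addrs would come from the aggregated State, so the endpoint list changes as address events arrive.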

Closes #219

Windows quirks

consistency

The Windows Netmon impl aggregates 3 separate streams of events (for routes, addresses, and interfaces respectively) into the one Event stream. These are not synchronized with each other by the Win32 platform; they're serviced by callbacks in automatically spawned threads behind the veil of the platform layer. As a result, there are no ordering guarantees (see ts_netmon/examples/win32_raw_monitor.rs for an example that demonstrates this), but we do expect to get all events for all resources (in contrast to rtnetlink, which skips deleting some link-associated resources when it deletes the link).

This behavior is indicated by a method on Netmon and resolved by accepting that our view of interface state may be briefly inconsistent (which is true anyway, since AF_NETLINK and AF_ROUTE sockets deliver messages serially, even if they arrive in a batch). This is optimistically mitigated by time-bucketing Events in the actor in increments of 100ms. Empirically, this tends to catch the whole batch of updates from interface creation, deletion, or state change unless you're doing something like joining or leaving the corp tailnet.
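The bucketing idea can be sketched as follows. This is not the actor's code: it is a simplified, synchronous stand-in that groups already-timestamped items into fixed 100ms windows, whereas the real actor presumably drives this from a timer. The function name bucket is hypothetical:

```rust
use std::time::Duration;

/// Group timestamped items into fixed-width time buckets.
///
/// Assumes `items` is sorted by timestamp and `width` is non-zero.
/// Windows in which nothing arrived produce no bucket.
pub fn bucket<T>(items: Vec<(Duration, T)>, width: Duration) -> Vec<Vec<T>> {
    let mut out: Vec<Vec<T>> = Vec::new();
    let mut current: Option<u128> = None;
    for (at, item) in items {
        // Index of the window this timestamp falls into.
        let idx = at.as_nanos() / width.as_nanos();
        if current != Some(idx) {
            out.push(Vec::new());
            current = Some(idx);
        }
        out.last_mut().expect("a bucket exists").push(item);
    }
    out
}
```

With a 100ms width, events at 5ms, 40ms, and 99ms land in one batch and an event at 130ms starts the next, which is the effect the mitigation relies on.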

interface address uniqueness

Windows also treats the address as unique per network interface: the address bits for 1.2.3.4 can appear only once on a given interface, even if you want multiple addresses with different netmasks. Linux permits this, i.e. you can ip a add 1.2.3.4/24 and then ip a add 1.2.3.4/20, and both addresses appear distinctly and can be deleted independently. On Windows, by contrast, you can mutate the prefix in place because the address bits are the unique identifier. So we need different tracking behavior between the two platforms; this is implemented in the NetlinkActor state aggregation logic and indicated by a Netmon method.
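A sketch of what the two tracking behaviors amount to, with the identity rule chosen per platform. AddrIdentity and IfaceAddrs are hypothetical names for illustration, not the types in this PR:

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Which fields identify an address entry on an interface.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AddrIdentity {
    /// Windows-style: the address bits alone are the key.
    AddressOnly,
    /// Linux-style: address plus prefix length is the key.
    AddressAndPrefix,
}

/// Addresses tracked for a single interface.
#[derive(Debug, Default)]
pub struct IfaceAddrs {
    // Key is (address, Some(prefix)) when the prefix is part of the identity,
    // (address, None) otherwise; the value is the current prefix length.
    entries: HashMap<(IpAddr, Option<u8>), u8>,
}

impl IfaceAddrs {
    /// Insert or update an address under the given identity rule.
    pub fn upsert(&mut self, mode: AddrIdentity, addr: IpAddr, prefix_len: u8) {
        let key = match mode {
            AddrIdentity::AddressOnly => (addr, None),
            AddrIdentity::AddressAndPrefix => (addr, Some(prefix_len)),
        };
        self.entries.insert(key, prefix_len);
    }

    /// All tracked (address, prefix length) pairs, sorted for stable output.
    pub fn prefixes(&self) -> Vec<(IpAddr, u8)> {
        let mut v: Vec<_> = self
            .entries
            .iter()
            .map(|((addr, _), len)| (*addr, *len))
            .collect();
        v.sort();
        v
    }
}
```

Upserting 1.2.3.4/24 and then 1.2.3.4/20 yields two entries under AddressAndPrefix and a single, updated entry under AddressOnly.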

npry force-pushed the npry/netmon branch 8 times, most recently from 64e62b0 to 97e4a8e (June 1, 2026 19:40)
npry marked this pull request as ready for review (June 1, 2026 20:21)
npry (Collaborator, Author) commented Jun 1, 2026

@nrc I'd appreciate your review on this one, mostly re: the windows module for its use of unsafe. A bunch of the usage is just that the windows crate exposes most functions as unsafe without documentation or qualification, which I don't think I need commentary on unless you think I'm obviously holding it wrong — more what I'm looking for is another set of eyes on the lifetimes, unions, casting, etc. that have non-"yes I am holding this correctly" SAFETY comments

npry force-pushed the npry/netmon branch 2 times, most recently from fe071ce to 8da9164 (June 2, 2026 17:09)
npry added 2 commits June 2, 2026 13:37
Signed-off-by: Nathan Perry <nathan@tailscale.com>
Change-Id: I9bad19f2165bf277c78df5127a00aa426a6a6964
Signed-off-by: Nathan Perry <nathan@tailscale.com>
Change-Id: I92d5ec0dd9d445752ece4808df09124a6a6a6964
nrc (Contributor) left a comment

Some comments, about how the safety may be slightly better expressed, but otherwise the unsafe code seems fine. I didn't review for functionality, just safety.

type Target = [IpHelper::MIB_IPFORWARD_ROW2];

fn deref(&self) -> &Self::Target {
    // SAFETY: can only get here if the kernel told us NO_ERROR, checked non-null.

Presumably this is due to the call to IpHelper::GetIpForwardTable2 in get and the possible early return? Does IpHelper::GetIpForwardTable2 guarantee tab will be non-null? I couldn't find that in the Rust docs. Given the check in the dtor, it seems like there should be a similar check here. If not, it would be better to use NonNull for the type of tab.


I would perhaps add a comment about the NO_ERROR part being an invariant of RouteTable and a postcondition of get
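A rough sketch of what the NonNull suggestion could look like, with the invariant stated once on the type. To stay self-contained it uses stand-ins: Row replaces MIB_IPFORWARD_ROW2, Table replaces RouteTable, and a boxed slice replaces the kernel allocation, so none of this is the code under review:

```rust
use std::ops::Deref;
use std::ptr::NonNull;

/// Stand-in for the platform row type; hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Row {
    pub metric: u32,
}

/// Invariant: `rows` points to `len` initialized rows owned by this value.
/// This is established by `get` and relied on by `deref` and `drop`.
pub struct Table {
    rows: NonNull<Row>,
    len: usize,
}

impl Table {
    /// Stand-in for the fallible platform call. Postcondition: if this
    /// returns `Some`, the pointer was checked non-null exactly once, here.
    pub fn get(source: Vec<Row>) -> Option<Self> {
        let len = source.len();
        let raw = Box::into_raw(source.into_boxed_slice()) as *mut Row;
        let rows = NonNull::new(raw)?;
        Some(Table { rows, len })
    }
}

impl Deref for Table {
    type Target = [Row];

    fn deref(&self) -> &[Row] {
        // SAFETY: type invariant (postcondition of `get`): `rows` is
        // non-null, aligned, and valid for `len` rows while `self` lives.
        unsafe { std::slice::from_raw_parts(self.rows.as_ptr(), self.len) }
    }
}

impl Drop for Table {
    fn drop(&mut self) {
        let slice = std::ptr::slice_from_raw_parts_mut(self.rows.as_ptr(), self.len);
        // SAFETY: `rows`/`len` came from a `Box<[Row]>` in `get` and are
        // released exactly once. The real type would instead call the
        // platform's free function here.
        unsafe { drop(Box::from_raw(slice)) };
    }
}
```

With the pointer typed as NonNull, neither deref nor the destructor needs its own null check, and the SAFETY comments can simply cite the invariant.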


impl DerefMut for RouteTable {
    fn deref_mut(&mut self) -> &mut Self::Target {
        unsafe {

Missing safety comment and possibly null check


/// Iterator over a Windows linked-list-of-addresses type.
pub struct AddrIter<'a> {
    item: Option<*const dyn WinAddr>,

Why not make this a &'a reference? I think that would localise the unsafety a bit better

crate::Interface {
    id: value.Luid.into(),
    // SAFETY: kernel handed us this value
    name: unsafe { value.FriendlyName.to_string() }.unwrap(),

Is it impossible to get a utf8 error here? I don't trust Windows, but I guess it should be valid, but then I don't see why to_string would return a Result rather than just a String
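One plausible reason for the Result, assuming FriendlyName is a wide (UTF-16) string as Win32 adapter names generally are: the conversion is from UTF-16, and nothing in the type prevents a buffer from holding an unpaired surrogate. The std-only sketch below shows the strict and lossy behaviors; decode_strict and decode_lossy are hypothetical helpers, not windows crate APIs:

```rust
use std::string::FromUtf16Error;

/// Strict decoding: fails on malformed UTF-16 such as a lone surrogate.
pub fn decode_strict(wide: &[u16]) -> Result<String, FromUtf16Error> {
    String::from_utf16(wide)
}

/// Lossy decoding: malformed code units become U+FFFD instead of an error.
pub fn decode_lossy(wide: &[u16]) -> String {
    String::from_utf16_lossy(wide)
}
```

If a panic on a malformed name is undesirable, a lossy conversion is one way to avoid the unwrap, at the cost of silently altering the name.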


/// Iterator over an [`InterfaceReport`].
pub struct ReportIter<'a> {
    item: Option<*const IpHelper::IP_ADAPTER_ADDRESSES_LH>,

If possible, I think using a reference would be better here

let result = Foundation::WIN32_ERROR(result).ok();

match result {
    Ok(()) => {}

should this break?

self.item = if ret.Next.is_null() {
    None
} else {
    Some(ret.Next as *const _)

For the lifetimes to be correct here, Next must always point into the same buffer. Is that guaranteed by the kernel? I would state that explicitly on line 98 (or here if you use a reference instead of a raw pointer)
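A minimal sketch of the reference-based shape suggested in these comments, with the same-buffer assumption written down where the raw pointer is converted. Node and NodeIter are hypothetical stand-ins for the platform struct and the iterators under review:

```rust
/// Stand-in for a platform linked-list node with a raw `next` pointer.
#[repr(C)]
pub struct Node {
    pub value: u32,
    pub next: *const Node,
}

/// Iterator that holds a reference rather than a raw pointer, so the
/// unsafety is confined to the two pointer-to-reference conversions.
pub struct NodeIter<'a> {
    item: Option<&'a Node>,
}

impl<'a> NodeIter<'a> {
    /// # Safety
    ///
    /// `head` must be null or point to a node whose entire chain of `next`
    /// pointers stays inside memory that is valid and unmodified for `'a`
    /// (for example, one buffer filled in by a single platform call).
    pub unsafe fn new(head: *const Node) -> Self {
        // SAFETY: guaranteed by the caller, per the contract above.
        NodeIter { item: unsafe { head.as_ref() } }
    }
}

impl<'a> Iterator for NodeIter<'a> {
    type Item = &'a Node;

    fn next(&mut self) -> Option<&'a Node> {
        let ret = self.item?;
        // SAFETY: by the contract of `new`, `next` is null or points to
        // another node in the same buffer, which is valid for `'a`.
        self.item = unsafe { ret.next.as_ref() };
        Some(ret)
    }
}
```

The explicit null check disappears because as_ref on a raw pointer returns None for null, and the lifetime argument lives in one documented contract instead of being implied at each use.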

Development

Successfully merging this pull request may close these issues.

network monitor

2 participants