Moving timeout handling in eBPF #17

@fracappa

Description

This framework currently cleans up stale sockets (those stuck in the CLOSE-WAIT state) after a user-configured timeout (60s by default).

The BPF side already removes a socket from the set of deletion candidates if it leaves the CLOSE-WAIT state in time, as shown in the code below:

if (oldstate == TCP_CLOSE_WAIT && newstate != TCP_CLOSE_WAIT) {
    bpf_map_delete_elem(&close_wait_tracker, &key);
}

At the same time, the user-space application destroys sockets whose timeout has expired, as shown in the code below:

if age > timeoutNs {
	log.Printf("Stale CLOSE_WAIT: %s:%d -> %s:%d (age=%v, netns=%s)",
		socket.FormatIP(key.SrcIp), socket.Ntohs(key.SrcPort),
		socket.FormatIP(key.DstIp), socket.Ntohs(key.DstPort),
		time.Duration(age), netns.GetNameByIno(info.NetnsIno))

	err := socket.DestroySocketNetnsIno(
		info.NetnsIno,
		key.Proto,
		key.SrcIp, key.SrcPort,
		key.DstIp, key.DstPort,
	)
	...
}

This two-sided cleanup can cause race conditions, which are hard to handle because there are no synchronization primitives shared between kernel and user space.
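To make the concern concrete, here is a small, deterministic Go simulation of one possible unlucky interleaving. It is only a sketch: the `tuple` and `host` types, the socket ids, and the assumption that a new connection reuses the same tuple right after the old one closes are all hypothetical and not taken from the project's code.

```go
package main

import "fmt"

// tuple identifies a connection; the real key also carries IPs and protocol.
type tuple struct{ srcPort, dstPort uint16 }

// host models the two pieces of state involved: live sockets (kernel) and the
// CLOSE_WAIT tracker map (written by BPF, read by user space). Hypothetical.
type host struct {
	sockets map[tuple]int    // tuple -> socket id
	tracker map[tuple]uint64 // tuple -> time the socket entered CLOSE_WAIT (ns)
}

// staleKeys is the user-space scan: it returns a snapshot of expired entries.
func (h *host) staleKeys(now, timeout uint64) []tuple {
	var out []tuple
	for k, since := range h.tracker {
		if now-since > timeout {
			out = append(out, k)
		}
	}
	return out
}

// destroy kills whatever socket currently owns the tuple and returns its id
// (0 if none), mimicking a tuple-based destroy call.
func (h *host) destroy(k tuple) int {
	id := h.sockets[k]
	delete(h.sockets, k)
	delete(h.tracker, k)
	return id
}

// simulate replays the interleaving and returns the id of the socket that
// user space ends up destroying.
func simulate() int {
	k := tuple{8080, 40000}
	h := &host{sockets: map[tuple]int{k: 1}, tracker: map[tuple]uint64{k: 0}}

	// 1. User space scans and decides socket 1 is stale (61s > 60s).
	stale := h.staleKeys(61e9, 60e9)

	// 2. Meanwhile socket 1 leaves CLOSE_WAIT (the BPF hook drops the entry)
	//    and a new connection, socket 2, reuses the same tuple.
	delete(h.tracker, k)
	h.sockets[k] = 2

	// 3. User space acts on its outdated snapshot.
	return h.destroy(stale[0])
}

func main() {
	fmt.Println("destroyed socket id:", simulate()) // prints "destroyed socket id: 2"
}
```

In this model user space destroys socket 2, a connection that was never stale. Re-checking the map right before the destroy call would narrow the window but not close it, since the check and the destroy are still two separate steps.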

For this reason, a different approach is worth exploring.

The idea is to move the timeout handling to the BPF side as well, and have it notify user space efficiently (e.g. using perf events) so that user space performs the actual cleanup.
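A rough sketch of what the user-space half of this proposal could look like in Go. Everything here is an assumption made for illustration: the `StaleEvent` layout, the `DestroyFunc` hook (which would wrap something like `socket.DestroySocketNetnsIno`), the `ErrGone` sentinel, and the use of a plain channel as a stand-in for a perf event reader. How the BPF program detects expiry is out of scope for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// SocketKey mirrors the fields used in the snippet above; the exact types
// are assumptions.
type SocketKey struct {
	Proto            uint8
	SrcIP, DstIP     uint32
	SrcPort, DstPort uint16
}

// StaleEvent is a hypothetical record the BPF side would emit once a tracked
// socket has stayed in CLOSE_WAIT longer than the timeout.
type StaleEvent struct {
	Key      SocketKey
	NetnsIno uint32
	AgeNs    uint64
}

// ErrGone is a hypothetical sentinel for "socket no longer exists".
var ErrGone = errors.New("socket already gone")

// DestroyFunc abstracts the actual cleanup call.
type DestroyFunc func(ev StaleEvent) error

// consume drains events until the channel is closed and returns how many
// sockets were destroyed. User space no longer scans the map or decides
// what is stale; it only reacts to what the BPF side reports.
func consume(events <-chan StaleEvent, destroy DestroyFunc) int {
	destroyed := 0
	for ev := range events {
		if err := destroy(ev); err != nil {
			fmt.Printf("destroy failed for %+v: %v\n", ev.Key, err)
			continue
		}
		destroyed++
	}
	return destroyed
}

func main() {
	events := make(chan StaleEvent, 1)
	events <- StaleEvent{
		Key:      SocketKey{Proto: 6, SrcPort: 8080, DstPort: 40000},
		NetnsIno: 1,
		AgeNs:    uint64(61 * time.Second),
	}
	close(events)

	n := consume(events, func(ev StaleEvent) error {
		fmt.Printf("would destroy socket in netns %d (age=%v)\n",
			ev.NetnsIno, time.Duration(ev.AgeNs))
		return nil
	})
	fmt.Println("destroyed:", n)
}
```

With this split, the decision about which sockets are stale is made in one place (the BPF side), and user space only carries out the destroy call for the events it receives.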

Labels

bug
