pageserver: set SO_KEEPALIVE on the page service socket (#10992)
## Problem

If a client connection dies without an explicit close (e.g. because
network infrastructure drops it), we currently won't detect it for a
long time. This can, for example, block GetPage flushes and keep the
task running.

Touches neondatabase/cloud#23515.

## Summary of changes

Enable `SO_KEEPALIVE` on the page service socket, so that the kernel
sends periodic TCP keepalive probes. The probe timing is configured via
the Linux `net.ipv4.tcp_keepalive_*` sysctls, which will be deployed
separately. By default, the first probe is sent after 2 hours of
idleness, so this has no practical effect until we change the sysctls.
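
To illustrate why the sysctls matter, here is a minimal sketch (not part of this commit) that estimates how long an idle dead connection can linger once `SO_KEEPALIVE` is on. The `KeepaliveSysctls` type, its methods, and `read_sysctl` are hypothetical names introduced for this example. The constants are the stock Linux defaults, and the estimate assumes an idle connection with no unacknowledged data in flight.

```rust
use std::fs;
use std::io;
use std::time::Duration;

/// The three Linux sysctls that govern probing once SO_KEEPALIVE is set.
/// (Hypothetical helper type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KeepaliveSysctls {
    /// net.ipv4.tcp_keepalive_time: idle seconds before the first probe.
    pub time_secs: u64,
    /// net.ipv4.tcp_keepalive_intvl: seconds between unanswered probes.
    pub intvl_secs: u64,
    /// net.ipv4.tcp_keepalive_probes: unanswered probes before giving up.
    pub probes: u64,
}

/// Stock Linux defaults: first probe after 2 hours, then 9 probes 75s apart.
pub const LINUX_DEFAULTS: KeepaliveSysctls = KeepaliveSysctls {
    time_secs: 7200,
    intvl_secs: 75,
    probes: 9,
};

impl KeepaliveSysctls {
    /// Read the values currently in effect from /proc (Linux only).
    pub fn read_from_proc() -> io::Result<Self> {
        Ok(Self {
            time_secs: read_sysctl("tcp_keepalive_time")?,
            intvl_secs: read_sysctl("tcp_keepalive_intvl")?,
            probes: read_sysctl("tcp_keepalive_probes")?,
        })
    }

    /// Idle time before the kernel sends the first keepalive probe.
    pub fn first_probe_after(&self) -> Duration {
        Duration::from_secs(self.time_secs)
    }

    /// Approximate time until an idle, dead peer is detected:
    /// the idle period plus one interval per unanswered probe.
    pub fn worst_case_detection(&self) -> Duration {
        Duration::from_secs(self.time_secs + self.intvl_secs * self.probes)
    }
}

/// Parse a single integer sysctl under /proc/sys/net/ipv4.
fn read_sysctl(name: &str) -> io::Result<u64> {
    let raw = fs::read_to_string(format!("/proc/sys/net/ipv4/{name}"))?;
    raw.trim()
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

With the defaults, `LINUX_DEFAULTS.worst_case_detection()` comes to 7875 seconds (about 2 hours 11 minutes), which is why the socket option alone changes little. Calling `KeepaliveSysctls::read_from_proc()` on a host is one way to check what is in effect after the sysctl rollout.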
erikgrinaker authored Feb 26, 2025
1 parent 01581f3 commit 86b9703
Showing 1 changed file with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions pageserver/src/bin/pageserver.rs
@@ -14,6 +14,7 @@ use camino::Utf8Path;
use clap::{Arg, ArgAction, Command};
use metrics::launch_timestamp::{LaunchTimestamp, set_launch_timestamp_metric};
use metrics::set_build_info_metric;
use nix::sys::socket::{setsockopt, sockopt};
use pageserver::config::{PageServerConf, PageserverIdentity};
use pageserver::controller_upcall_client::ControllerUpcallClient;
use pageserver::deletion_queue::DeletionQueue;
@@ -347,6 +348,13 @@ fn start_pageserver(
info!("Starting pageserver pg protocol handler on {pg_addr}");
let pageserver_listener = tcp_listener::bind(pg_addr)?;

// Enable SO_KEEPALIVE on the socket, to detect dead connections faster.
// These are configured via net.ipv4.tcp_keepalive_* sysctls.
//
// TODO: also set this on the walreceiver socket, but tokio-postgres doesn't
// support enabling keepalives while using the default OS sysctls.
setsockopt(&pageserver_listener, sockopt::KeepAlive, &true)?;

// Launch broker client
// The storage_broker::connect call needs to happen inside a tokio runtime thread.
let broker_client = WALRECEIVER_RUNTIME

1 comment on commit 86b9703

@github-actions

7896 tests run: 7506 passed, 0 failed, 390 skipped (full report)


Flaky tests (3)

Postgres 17

Code coverage* (full report)

  • functions: 32.8% (8638 of 26347 functions)
  • lines: 48.6% (73111 of 150346 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
86b9703 at 2025-02-26T17:47:14.717Z :recycle:
