Peer rtt unreasonably large #17837

freedge · 2024-04-22T07:19:25Z

Bug report criteria

This bug report is not security related; security issues should be disclosed privately via etcd maintainers.
This is not a support request or question; support requests or questions should be raised in the etcd discussion forums.
You have read the etcd bug reporting guidelines.
Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.

What happened?

This is a reopening of #11100.

The ROUND_TRIPPER_RAFT_MESSAGE probing opens a new connection for every probe, so the measured time includes connection setup and does not reflect the actual peer RTT.

What did you expect to happen?

The ROUND_TRIPPER_RAFT_MESSAGE probing should reuse an existing connection, so that the etcd_network_peer_round_trip_time_seconds metric reflects the actual RTT.

As per https://etcd.io/docs/v3.5/op-guide/performance/:

The RTT within a datacenter may be as long as several hundred microseconds.

The values reported by etcd_network_peer_round_trip_time_seconds are far larger than this.

How can we reproduce it (as minimally and precisely as possible)?

Run a cluster and capture TCP SYN packets on the peer port (2380):

tcpdump port 2380 and 'tcp[tcpflags] & tcp-syn == tcp-syn'

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.6.0-alpha.0
Git SHA: 2674f94c
Go Version: go1.22.1 (Red Hat 1.22.1-1.el9)
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.6.0-alpha.0
API version: 3.6

Etcd configuration (command line flags or environment variables)

Locally:

```
etcd --name infra0 --initial-advertise-peer-urls http://127.0.0.10:2380 \
  --listen-peer-urls http://127.0.0.10:2380 \
  --listen-client-urls http://127.0.0.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://127.0.0.10:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://127.0.0.10:2380,infra1=http://127.0.0.11:2380,infra2=http://127.0.0.12:2380 \
  --initial-cluster-state new \
  --log-level debug --log-outputs stdout
```

Also reproduced on OpenShift 4.14.

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response

freedge added the type/bug label Apr 22, 2024

jmhbnz added the area/raft, priority/important-longterm, and area/observability labels May 9, 2024
