Peer rtt unreasonably large #17837
Labels
area/observability
area/raft
priority/important-longterm
type/bug
Bug report criteria
What happened?
This is a reopening of #11100.
The ROUND_TRIPPER_RAFT_MESSAGE probe opens a new connection for every probe, so the measured time includes connection setup and is not really the RTT.
What did you expect to happen?
The ROUND_TRIPPER_RAFT_MESSAGE probe should run over an existing connection.
The etcd_network_peer_round_trip_time_seconds metric should reflect the actual RTT between peers, as per
https://etcd.io/docs/v3.5/op-guide/performance/
The values currently read from etcd_network_peer_round_trip_time_seconds do not.
How can we reproduce it (as minimally and precisely as possible)?
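Run a cluster, then capture SYN packets on the peer port:
tcpdump port 2380 and 'tcp[tcpflags] & tcp-syn == tcp-syn'
New SYNs keep showing up, which means the probes are opening new connections.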
Anything else we need to know?
No response
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
Also reproduced on OpenShift 4.14.
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output
No response