Description
We’re investigating a recurring TCP RST observed ~2.5 seconds after a gRPC client sends application data (PSH, ACK) on a bidirectional stream, and we’re trying to confirm whether this behavior is expected or a side-effect of the keepalive configuration.
Environment
gRPC-Java version: [1.64.0]
Transport: Netty
Channel configured with:
.keepAliveTime(3, TimeUnit.MINUTES)
.keepAliveTimeout(2, TimeUnit.SECONDS)
.keepAliveWithoutCalls(true)
Server side allows keepalive and does not appear to terminate connections.
Observed behavior
The client sends an HTTP/2 DATA frame (visible as TCP PSH, ACK).
No further packets are received from the server.
Approximately 2.5 seconds later, the client issues a TCP RST.
This occurs consistently when the server does not reply or acknowledge within that interval.
However, we do not see a ping explicitly sent at the time the RST occurs.
It appears that a timeout due to lack of any inbound data (not necessarily a PING-ACK) may trigger shutdown().
Questions
- Does KeepAliveManager start the keepalive timeout only for unacknowledged PINGs, or for any period of read inactivity (including outstanding DATA frames)?
- If no PING was sent yet (because keepAliveTime >> 2 s), can the timeout still trigger a shutdown purely due to read inactivity?
- Could the RST behavior stem from the Netty transport closing the channel immediately when shutdown() fires (e.g., via Channel.close() with SO_LINGER=0)?
- Are there known differences between gRPC-Java and gRPC-C/C++ regarding this shutdown trigger?
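To make the first two questions concrete, here is a minimal sketch of the two readings we are trying to distinguish, using our configured values. It is only arithmetic on the settings, not gRPC-Java code: the class and method names are hypothetical, and neither reading is confirmed as what KeepAliveManager actually does.

```java
import java.time.Duration;

// Hypothetical helper: compares two possible interpretations of the keepalive
// settings. It does not call into gRPC-Java and does not model its internals.
class KeepaliveTimeline {

    // Reading A (assumption): the timeout only starts once a PING has been sent,
    // and a PING is only sent after keepAliveTime without inbound data.
    static Duration earliestCloseIfPingBased(Duration keepAliveTime, Duration keepAliveTimeout) {
        return keepAliveTime.plus(keepAliveTimeout);
    }

    // Reading B (assumption): the timeout applies to any period of read
    // inactivity, regardless of whether a PING is outstanding.
    static Duration earliestCloseIfReadInactivityBased(Duration keepAliveTimeout) {
        return keepAliveTimeout;
    }

    public static void main(String[] args) {
        Duration keepAliveTime = Duration.ofMinutes(3);    // .keepAliveTime(3, TimeUnit.MINUTES)
        Duration keepAliveTimeout = Duration.ofSeconds(2); // .keepAliveTimeout(2, TimeUnit.SECONDS)

        System.out.println("Reading A, seconds after last inbound data: "
                + earliestCloseIfPingBased(keepAliveTime, keepAliveTimeout).getSeconds());
        System.out.println("Reading B, seconds after last inbound data: "
                + earliestCloseIfReadInactivityBased(keepAliveTimeout).getSeconds());
    }
}
```

The RST we observe arrives about 2.5 seconds after the last DATA frame, which is why we are asking which of these readings (if either) matches the actual implementation.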
Additional context
We’re analyzing this in the context of a long-lived bidirectional streaming RPC.
tcpdump shows the client’s last sent frame is application DATA, not a PING.
We suspect the combination of .keepAliveTimeout(2s) and .keepAliveTime(3min) may result in a “false positive” closure if the server doesn’t respond quickly enough after the last DATA frame.
channel =
NettyChannelBuilder.forAddress(serverHost, serverPort)
.keepAliveTime(180, TimeUnit.SECONDS)
.keepAliveTimeout(2, TimeUnit.SECONDS)
.keepAliveWithoutCalls(true)
We have since changed keepAliveTimeout to the default (20 seconds), and that does seem to affect when the TCP retransmissions occur and when the RST is sent.
We’d appreciate clarification or a reference to where in the codebase this distinction (PING ACK vs generic read inactivity) is definitively made.
Thanks for your time and for maintaining gRPC-Java.