Upticks of S3 timeout errors after SDK upgrade #1130
Comments
Hey @xizhem, thanks for submitting this issue. I've added it to our backlog.
To clarify, the SDK does not alter the keep-alive in the header, so this must've been happening somewhere else.
Does the SDK use HTTP/1 or HTTP/2? In the hyper documentation, keep-alive looks like it is enabled by default? https://docs.rs/hyper/latest/hyper/server/conn/http1/struct.Builder.html#method.keep_alive
The SDK uses hyper defaults, which means HTTP/1.1, negotiating up to HTTP/2 if the server wishes to do so.
It is.
Asked the reproducer offline on 4/17 and waiting for a reply. Adding a label as such.
Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.
Describe the bug
Recently, after upgrading the SDK to aws-s3-sdk (>1.14), we noticed an upward trend of S3 GET timeout errors in production. We have already ruled out the issue from #1118. In our case, the error is a TransientError caused by hitting the attempt timeout. The number of errors we see correlates with the connection timeout setting, and also with the load we send to S3.
Expected Behavior
Our timeout settings are as follows:
connection timeout: default (3.1s)
attempt timeout: 800ms
operation timeout: 2.6s
total attempts: 3
We expect the S3 request to succeed within this 2.6s.
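To make the budget above concrete, here is a minimal std-only sketch of the arithmetic. It assumes the worst case where every attempt runs until its attempt timeout, and it does not model how the SDK actually schedules retries or backoff. The `TimeoutBudget` type and its methods are hypothetical names for illustration, not part of the SDK.

```rust
use std::time::Duration;

/// Timeout settings as reported in this issue (hypothetical helper type).
struct TimeoutBudget {
    attempt_timeout: Duration,
    operation_timeout: Duration,
    total_attempts: u32,
}

impl TimeoutBudget {
    /// Worst-case time spent inside attempts if every attempt hits its timeout.
    fn worst_case_attempt_time(&self) -> Duration {
        self.attempt_timeout * self.total_attempts
    }

    /// Time left in the operation budget for everything outside the attempts
    /// (for example retry backoff), or `None` if the attempts alone exceed it.
    fn slack(&self) -> Option<Duration> {
        self.operation_timeout
            .checked_sub(self.worst_case_attempt_time())
    }
}

fn main() {
    let budget = TimeoutBudget {
        attempt_timeout: Duration::from_millis(800),
        operation_timeout: Duration::from_millis(2600),
        total_attempts: 3,
    };

    println!(
        "worst-case attempt time: {:?}",
        budget.worst_case_attempt_time()
    );
    match budget.slack() {
        Some(slack) => println!("slack left for backoff: {:?}", slack),
        None => println!("attempts alone exceed the operation timeout"),
    }
}
```

With these settings, three timed-out attempts consume 2.4s of the 2.6s operation timeout, leaving about 200ms for anything that happens between attempts. So if every attempt hits the 800ms limit, the operation is expected to fail at roughly the point where the overall budget also runs out.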
Current Behavior
The SDK did retry 3 times, as we checked, but we still time out after all 3 attempts are exhausted.
The Smithy orchestrator typically emits a halting line before the TransientError. We couldn't tell whether the connection was established successfully within these 800ms or not, as there is no identifier linking hyper logs to SDK logs.
Reproduction Steps
We ran a load test to benchmark the S3 client and found a correlation between the connection timeout and timeout errors. The load test runs at a maximum concurrency of 200 S3 GETs.
Possible Solution
Using the Linux ss command to get a socket overview, I found that connections created by the SDK client to S3 do not have the keep-alive header. Note that 3.5.87.213:https is the S3 host, as I checked from here.
By contrast, a typical connection could look like this; notice the keep-alive header:
I suspect this issue is due to inefficient connection reuse down at the hyper layer, i.e. previously active connections are closed by S3 randomly due to the lack of the header. But I could be wrong.
We also observe that this log line appears consistently before the transient error:
Additional Information/Context
No response
Version