Call to KDS 'put_records' fails intermittently with 'Connection reset by Peer' within lambda extension #1106
Comments
Is this happening when running the Lambda locally, or on a real Lambda function? There are apparently some issues with the local simulator.
Hey, thanks for the reply. This is happening with the extension deployed to AWS and attached to a Lambda running Node.js.
Update: I'm getting another error that seems potentially related:
After looking a bit, I found this issue, which seems to indicate that the problem may be related to connection pooling. I've tried to find whether anyone else has hit this with KDS / Lambda extensions, agnostic of the actual language, but some of these errors seem specific to the usage of. Are there any similar issues others have experienced that could lead to a resolution? I'm unsure what client-side logic would resolve this issue.
@rcoh hey there, just updating. I've added a bit more diagnostic information over on the other issue. There may be some nuances here between the lifecycle of the extension and the AWS SDK client connections used / reused for SDK calls. Do you see potential for conflicts there? Regarding the error from the above comment:
This was caused by some experimenting with the timeout_ms configuration on the extension's log buffer. I added a lot more technical info over on the issue I linked.
I had a similar issue in the past when pushing data to AWS S3. My findings:

Observed in Wireshark: both errors you mention (os error 104 "connection reset" and "incomplete message") are the same thing, caused by the server closing (resetting) the connection with an RST rather than a graceful FIN close. Hyper just throws a different error depending on a race over when and how it learns about the closure while attempting to flush data: os error 104 is thrown when the OS informs hyper of the closed connection while it is writing to the closed socket, while "incomplete message" is thrown when hyper learns of it while reading from the closed socket.

So the above is the normal workflow; the main task is to identify why the server is closing the connection. In my case I solved it as follows:

a) Connection pool: hyper's default idle-connection timeout is 90s, while for S3 the server-side timeout is actually around 20s. So when hyper picks an idle connection from the pool for the next request, it may already have been closed by the server, leading to those errors.

b) The server may also close connections when various policies are violated. I found that I was exceeding both the requests-per-second rate and the request-size-per-second rate. As soon as the rates were exceeded, the server started resetting connections at random, leading to these errors intermittently. I solved it by tuning the data in flight as well as the concurrent connection count (e.g. max idle pool size = 500), because even with data in flight limited, small requests in a high-bandwidth, low-latency scenario could still exceed the request-rate limit.

For me, this reduced the "incomplete message" error count from hundreds of thousands to almost zero.
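For context, the pool idle timeout and per-host idle cap described above are both exposed on hyper's client builder. Below is a minimal sketch, assuming hyper 0.14 and a plain-HTTP connector for brevity; how such a client gets wired into the AWS SDK depends on the SDK / smithy-client version, so this only illustrates the knobs, not the SDK integration.

```rust
use std::time::Duration;

use hyper::client::HttpConnector;

// Sketch only: a real client would wrap a TLS connector
// (hyper-rustls / hyper-tls) the same way.
fn build_pooled_client() -> hyper::Client<HttpConnector, hyper::Body> {
    hyper::Client::builder()
        // Drop idle connections before the server does (assumed ~20s server-side
        // idle timeout), so the pool never hands back an already-reset socket.
        .pool_idle_timeout(Duration::from_secs(15))
        // Cap idle connections kept per host; tune alongside request rate.
        .pool_max_idle_per_host(10)
        .build_http()
}
```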
@satyamagarwal249 thanks for providing some more data points! Do you have an example of how this is done? I have no results for
This ticket seems like a good analogous breakdown of the problem being experienced here. @rcoh, I can't quite narrow in on a fix for this problem. The other ticket in the Lambda extension repository has been closed because the issue seems to be related to the SDK library / the underlying hyper configuration for the client. That ticket is here, and it contains a lot of information that I'd rather not copy-paste into this one. This error pattern also seems to occur sometimes with the Node.js AWS SDK, and from research the fix there appears to be setting a. Is there any potential root cause / fix you're seeing with respect to the linked tickets or the input above?
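Not a root-cause fix, but one commonly used mitigation on the Rust side is to make sure the SDK's retry configuration can re-drive a request on a fresh connection after a transient reset. A hedged sketch, assuming aws-config's standard retry mode; the attempt count is arbitrary, and whether a given dispatch failure is classified as retryable depends on the SDK's retry classifier:

```rust
use aws_config::retry::RetryConfig;

#[tokio::main]
async fn main() {
    // Allow a few attempts so an intermittent "connection reset by peer" on
    // one pooled connection can be retried on a new one.
    let config = aws_config::from_env()
        .retry_config(RetryConfig::standard().with_max_attempts(5))
        .load()
        .await;

    let _kinesis = aws_sdk_kinesis::Client::new(&config);
    // ... put_records calls as in the linked example ...
}
```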
Describe the bug
Hey!
I've derived an example from this repository: HERE
But instead of pushing to Firehose, it pushes to KDS. See the minimal example.
I've added some extra logic to my version of the above code, where I provide custom credentials to the KDS client that's instantiated, but otherwise my implementation is mostly the same. Is there a common reason for the Connection reset by peer error? It seems like the extension doesn't spin up the logs processor unless I invoke my Lambda again, but this could just be because the async processing means any logs made in the Processor's call method aren't emitted until they're resolved. I've seen some calls to Kinesis succeed, but others fail unexpectedly with this error:
The above error is logged during a match on the result of the future that is pinned inside a Box in the example, expanded from this value HERE.
Please note that the error is intermittent: sometimes the call to KDS works, and other times it fails randomly.
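For reference, the call pattern in question is roughly the following. This is a hedged sketch, not the code from the linked repository: it assumes a recent aws-sdk-kinesis (module paths and builder fallibility differ across SDK versions), and the partition key and helper name are placeholders.

```rust
use aws_sdk_kinesis::{primitives::Blob, types::PutRecordsRequestEntry, Client};

// Hypothetical helper: push one log payload to the stream and surface the
// intermittent failure by matching on the send() result.
async fn push_log(
    client: &Client,
    stream_name: &str,
    payload: Vec<u8>,
) -> Result<(), Box<dyn std::error::Error>> {
    let entry = PutRecordsRequestEntry::builder()
        .data(Blob::new(payload))
        .partition_key("extension-logs") // placeholder partition key
        .build()?; // fallible builder in recent SDK versions

    match client
        .put_records()
        .stream_name(stream_name)
        .records(entry)
        .send()
        .await
    {
        Ok(_output) => {
            println!("put_records succeeded");
            Ok(())
        }
        // Intermittently this surfaces as SdkError::DispatchFailure with
        // "Connection reset by peer" (os error 104).
        Err(err) => Err(Box::new(err)),
    }
}
```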
I created an issue here in the lambda extension repository, but one of the maintainers mentioned this could be an issue with the SDK. I am thinking it may be a result of the lifecycle of the extension causing connection interference with the requests to KDS.
Any guidance would be much appreciated!
Expected Behavior
Lambda extension pushes logs to KDS with no issues
Current Behavior
Lambda extension fails to push logs to KDS on an intermittent / irregular basis
Reproduction Steps
https://github.com/dgarcia-collegeboard/aws-rust-lambda-extension-kinesis-example/blob/main/src/main.rs
The above code pushes to a KDS based on set env var
Possible Solution
No response
Additional Information/Context
Relevant issue link:
awslabs/aws-lambda-rust-runtime#837
Version
Environment details (OS name and version, etc.)
AWS NodeJS runtime for lambda
Logs
DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, Custom { kind: Other, error: Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" } }), connection: Unknown } })