Failed to run Fargate tasks. #223

Open
guyShindel opened this issue Feb 3, 2025 · 7 comments
Assignees
Labels
bug Something isn't working

Comments

@guyShindel

When running distributed load tests using the solution, executions fail with a “Failed to run Fargate tasks.” error. The CloudWatch logs show an error during a Lambda “invoke” call with the message:
"errorType":"LimitExceededException","errorMessage":"Resource limit exceeded."
This suggests that one of the service quotas—likely Lambda’s concurrent executions or invoke API rate limit—is being exceeded.

Expected behavior
The load test should either complete successfully or fail with a clear indication of which service quota is exceeded. Ideally, the solution would provide more detailed diagnostics or mitigation advice when such limits are approached.

Despite reviewing CloudWatch metrics, I am unsure which quota is the problem for me and would appreciate any help specifying which quotas it could be.

@guyShindel guyShindel added the bug Something isn't working label Feb 3, 2025
@kamyarz-aws kamyarz-aws self-assigned this Feb 3, 2025
@kamyarz-aws
Member

Can you provide more detail on how I can reproduce the issue on my end? A script that you can share would help.

What version of DLT are you using?

"errorType":"LimitExceededException","errorMessage":"Resource limit exceeded." Where do you see this in the CloudWatch logs for the ECS tasks?

Can you check the Step Functions state machine associated with the solution to see which Lambda invocation is actually failing?

@guyShindel
Author

Hi,
Thank you for the fast response.

I’m currently using DLT version 3.2.5.
The error only appears in the Lambda invocation logs within the Step Functions execution.

Here is the full cause field:

"cause": "{\"errorType\":\"LimitExceededException\",\"errorMessage\":\"Resource limit exceeded.\",\"trace\":[\"LimitExceededException: Resource limit exceeded.\",\"    at Request.extractError (/var/task/node_modules/aws-sdk/lib/protocol/json.js:80:27)\",\"    at Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:106:20)\",\"    at Request.emit (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:78:10)\",\"    at Request.emit (/var/task/node_modules/aws-sdk/lib/request.js:686:14)\",\"    at Request.transition (/var/task/node_modules/aws-sdk/lib/request.js:22:10)\",\"    at AcceptorStateMachine.runTo (/var/task/node_modules/aws-sdk/lib/state_machine.js:14:12)\",\"    at /var/task/node_modules/aws-sdk/lib/state_machine.js:26:10\",\"    at Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:38:9)\",\"    at Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:688:12)\",\"    at Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:116:18)\"]}",
        "error": "LimitExceededException"
    },
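The cause field is a JSON document escaped inside a string, which makes it hard to read in the Step Functions console. As a minimal sketch, assuming the string has been copied out as-is, it can be decoded like this. The helper name `summarize_cause` is made up for illustration, and the sample is a shortened copy of the cause above. Note that the stack frames only show the aws-sdk internals; they do not name the AWS service that returned the error.

```python
import json


def summarize_cause(cause: str) -> dict:
    """Decode a Step Functions `cause` string into its main parts."""
    detail = json.loads(cause)
    trace = detail.get("trace", [])
    # The first trace entry repeats the message; the rest are stack frames.
    frames = [line.strip() for line in trace[1:]]
    return {
        "error_type": detail.get("errorType"),
        "message": detail.get("errorMessage"),
        "top_frame": frames[0] if frames else None,
    }


# Shortened copy of the cause above, rebuilt with json.dumps to avoid hand-escaping.
sample_cause = json.dumps({
    "errorType": "LimitExceededException",
    "errorMessage": "Resource limit exceeded.",
    "trace": [
        "LimitExceededException: Resource limit exceeded.",
        "    at Request.extractError (/var/task/node_modules/aws-sdk/lib/protocol/json.js:80:27)",
    ],
})

summary = summarize_cause(sample_cause)
print(summary["error_type"], "-", summary["message"])
print(summary["top_frame"])
```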

@kamyarz-aws
Member

That is a relatively old version. Have you started seeing this error recently? What is the concurrent Lambda invocation quota in your account? Which Lambda function is getting this error?

Your test is not something the solution would fail on; I see tests like this running daily for other customers with no problem.

I will probably need more information about your account to give you more guidance. Right now this doesn't strike me as a solution bug; it looks more like an account limitation.

@kamyarz-aws
Member

Can you go to CloudTrail, set the event history to the past 30 minutes, and then run a DLT test? If you then look at the API actions recorded in CloudTrail, we can see which resource is hitting the limit.

I know the message is confusing, but that is the error AWS generates; it really is out of DLT's hands.
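For anyone who prefers a script to the console, the same check could be sketched with boto3 roughly as follows. This is an illustration under assumptions, not part of DLT: the function names are made up, the sample records are invented for the demo and are not output from the reporter's account, and `lookup_events` only covers management events, which can take several minutes to appear.

```python
import json
from datetime import datetime, timedelta, timezone


def failed_calls(events):
    """Return (eventSource, eventName, errorCode) for events that recorded an error."""
    failures = []
    for event in events:
        # Each event carries the full CloudTrail record as a JSON string.
        record = json.loads(event["CloudTrailEvent"])
        if "errorCode" in record:
            failures.append(
                (record.get("eventSource"), record.get("eventName"), record["errorCode"])
            )
    return failures


def recent_events(minutes=30):
    """Fetch CloudTrail event history for the last `minutes` minutes (needs AWS credentials)."""
    import boto3  # imported lazily so failed_calls works without boto3 installed

    client = boto3.client("cloudtrail")
    start = datetime.now(timezone.utc) - timedelta(minutes=minutes)
    events = []
    for page in client.get_paginator("lookup_events").paginate(StartTime=start):
        events.extend(page["Events"])
    return events


# Invented sample records, shaped like lookup_events output, for a dry run.
sample_events = [
    {"CloudTrailEvent": json.dumps({"eventSource": "ecs.amazonaws.com", "eventName": "RunTask"})},
    {"CloudTrailEvent": json.dumps({
        "eventSource": "logs.amazonaws.com",
        "eventName": "PutMetricFilter",
        "errorCode": "LimitExceededException",
    })},
]
sample_failures = failed_calls(sample_events)
print(sample_failures)
```

With credentials configured, `failed_calls(recent_events())` would list the API calls that returned an error during the test window.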

@guyShindel
Author

Got it
I’ll investigate and update here once I have more info.

@kamyarz-aws
Member

It is probably the metric filters that are hitting the limit. If you delete older tests, their metric filters will be removed as well, which should resolve your issue. We will update our documentation to mention this. We will also discuss internally whether we can resolve this or increase the limit, and address it in one of the future releases.
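To see how close a log group is to the metric filter limit, something like the sketch below could be used. Several things here are assumptions: the log group name is a placeholder (the thread does not say which log group DLT attaches its filters to), the function names are made up, and the default limit of 100 filters per log group should be checked against the current CloudWatch Logs quotas for your account.

```python
def count_metric_filters(pages):
    """Count metric filters across describe_metric_filters result pages."""
    return sum(len(page.get("metricFilters", [])) for page in pages)


def headroom(count, limit=100):
    """Filters that can still be created; the default limit is an assumption."""
    return max(limit - count, 0)


def metric_filter_pages(log_group_name):
    """Page through the metric filters of one log group (needs AWS credentials)."""
    import boto3  # imported lazily so the counting helpers work without boto3

    client = boto3.client("logs")
    paginator = client.get_paginator("describe_metric_filters")
    return paginator.paginate(logGroupName=log_group_name)


# Invented pages, shaped like describe_metric_filters output, for a dry run.
sample_pages = [
    {"metricFilters": [{"filterName": "test-a-avgRt"}, {"filterName": "test-a-numVu"}]},
    {"metricFilters": [{"filterName": "test-b-avgRt"}]},
]
used = count_metric_filters(sample_pages)
print(f"{used} filters in use, {headroom(used)} remaining")
```

Against a real account, `count_metric_filters(metric_filter_pages("<your-dlt-log-group>"))` would give the current count, which could be checked before starting a new test.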

@guyShindel
Author

It looks like it’s working now after deleting all the previous tests—thanks!

Just to confirm, the current versions still don’t address this issue, right?
Besides deleting older tests, how can I ensure I don’t hit this limit again?


2 participants