Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronize calls to DiscoverPollEndpoint #4504

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

isaac-400
Copy link

@isaac-400 isaac-400 commented Feb 15, 2025

Summary

Agents that share ECSClients can call DiscoverPollEndpoint (DPE) multiple times per task. Each routine that calls DPE will first check the cache before performing the actual API call over the network. The intention here is that only one actual API call is performed (by the first routine to call DPE).

However, it is possible for multiple routines to race and effectively make many actual API calls. This is because the pollEndpointCache is only updated when the first API call returns.

This change enforces the intended behavior by making subsequent goroutines wait for the cache to be updated (or not) by the first goroutine, eliminating simultaneous calls to DPE.

Implementation details

The implementation adds a lock to the ECSClient implementation of DiscoverPollEndpoint, thus concurrent calls to DPE will execute one at a time.

Since other DiscoverPollEndpointCalls will block, this change also adds a conservative timeout for the method (AWS Blog Reference)

Testing

make test

New tests cover the changes: yes

Description for the changelog

* Bug - Fixed a race condition with concurrent DiscoverPollEndpoint calls

Additional Information

Does this PR include breaking model changes? If so, Have you added transformation functions?

No
Does this PR include the addition of new environment variables in the README?

No

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@isaac-400 isaac-400 marked this pull request as ready for review February 17, 2025 20:57
@isaac-400 isaac-400 requested a review from a team as a code owner February 17, 2025 20:57
tshan2001
tshan2001 previously approved these changes Feb 17, 2025
@isaac-400 isaac-400 force-pushed the sync-dpe branch 3 times, most recently from 556f971 to 870c60e Compare February 19, 2025 17:39
@isaac-400 isaac-400 force-pushed the sync-dpe branch 2 times, most recently from d5aa82e to 105da09 Compare February 19, 2025 21:35
Isaac Feldman added 2 commits February 20, 2025 09:36
Agents that share ECSClients can call DiscoverPollEndpoint (DPE) multiple
times per task. Each routine that calls DPE will first check the cache
before performing the actual API call over the network. The intention
here is that only one actual API call is performed (by the first
routine to call DPE).

However, it is possible for multiple routines to race and effectively
make many actual API calls. This is because the `pollEndpointCache` is
only updated when the first API call _returns_.

This change enforces the intended behavior by making subsequent
routines wait for the cache to be updated (or not) by the first
thread, eliminating simultaneous calls to DPE.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants