Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws: async handling for aws_request_signing and aws_lambda extensions #38360

Draft
wants to merge 60 commits into
base: main
Choose a base branch
from

Conversation

nbaws
Copy link
Contributor

@nbaws nbaws commented Feb 9, 2025

Commit Message: aws: async handling for aws_request_signing and aws_lambda extensions

Additional Description:

Part 3 of 3 patches, to address customer reported bugs in the aws request signing extension when using the route discovery service.

This change ensures that requests to aws signing are correctly paused if credentials are still pending. A minor behaviour to the extensions will occur - instead of returning blank (anonymous) credentials if credential retrieval is still in flight, the extensions will pause until async fetch completion.

Part 2 - #38271

Risk Level: Low
Testing: Unit
Docs Changes: N/A
Release Notes: Update changelog
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

@nbaws nbaws requested a review from mattklein123 as a code owner February 9, 2025 09:50
@nbaws nbaws marked this pull request as draft February 9, 2025 09:50
@nbaws nbaws force-pushed the rds_refresh_3 branch 2 times, most recently from a6c3ddc to d78bc86 Compare February 10, 2025 22:27
@phlax
Copy link
Member

phlax commented Feb 12, 2025

merge main (or just pushing a change) should fix Docker issue (#38416 )

@nbaws nbaws marked this pull request as ready for review February 12, 2025 10:56
nbaws added 24 commits February 12, 2025 11:05
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
nbaws added 11 commits February 12, 2025 11:05
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
@nbaws
Copy link
Contributor Author

nbaws commented Feb 12, 2025

will wait on review from AWS folks for this one

@phlax
Copy link
Member

phlax commented Feb 12, 2025

/wait-any

Comment on lines +415 to +416
// mark credentials as pending while async completes
credentials_pending_.store(true);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this happen on every refresh, or just when the credentials are invalid (not yet fetched or expired).

I'm thinking that you end up with periodic spikes in latency as credentials are refreshed and all requests pending have to wait for this to finish up, even if there are valid credentials available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this part of the code will only get triggered when credentials are invalid as you mention (not yet fetched or expired)

as this is part of a chain, previous providers will have precedence. meaning if they have credentials we will not block requests. however if this is the first provider in the chain and it flags as pending, we will block requests waiting for it to either fail entirely, or return valid credentials.

if this provider is later in the chain, it will still flag as pending if asked, however previous providers in the chain can trump this by returning not pending and having valid credentials available.

Comment on lines 940 to 956
void CredentialsProviderChain::onCredentialUpdate() {
// Loop through all providers
// If no providers are pending then unblock
// If a provider is not pending and has credentials, then unblock
for (auto& provider : providers_) {
if (provider->credentialsPending()) {
ENVOY_LOG(debug, "Provider {} is still pending", provider->providerName());
return;
}
if (provider->getCredentials().hasCredentials()) {
ENVOY_LOG(debug, "Provider {} has credentials", provider->providerName());
break;
} else {
ENVOY_LOG(debug, "Provider {} has blank credentials, continuing through chain",
provider->providerName());
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can lead to some weirdness where multiple credential providers have available credentials, but the timing is such that one fetch sees a lower priority provider before the higher priority provider kicks in.

Perhaps it should wait for all providers to no longer be pending (either has credentials or be in an error state) before reporting completion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic should be as follows
for each provider

  • if pending (ie we're waiting for initial refresh or expiry refresh) then block
  • if we're not pending and we have no credentials (ie refresh failed entirely) then move to next provider

in this scenario the higher priority provider will always block us from further processing until it fails or has credentials available. we should not have a scenario in which a lower priority provider impacts processing unless a higher priority provider has failed.

see 'see here' comment for more detail

ENVOY_LOG_MISC(debug, "We have {} pending callbacks", credential_pending_callbacks_.size());
}
return pending;
} else {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"see here"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is called by our signer, to add a callback if processing could not be completed.
the logic here is now as follows:

  • for each provider
    -- if credentials are pending, then add a callback for later processing (block)
    -- if credentials are not pending, and we have credentials available (fetching has not failed) then do not block
    -- otherwise, go to the next provider

if we fall through all providers, we will return not pending, which effectively causes signing to fail. this is expected behaviour if we are entirely unable to retrieve credentials.

Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
@nbaws nbaws marked this pull request as draft February 14, 2025 05:29
Signed-off-by: Nigel Brittain <[email protected]>
mattklein123 pushed a commit that referenced this pull request Feb 14, 2025
Commit Message: aws: minor refactor of CredentialsProviderChain and its
usage
Additional Description: 
Supports work on #38360 .
CredentialProviderChain was being used interchangeably with
CredentialProvider, which was likely its original intended usage but
makes distinguishing them from each other impossible and forcing
needless messy overloads.
Scenarios where provider without a chain were implemented have been
replaced with chain instantiations.

Risk Level: Negligible
Testing: Unit
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional [API
Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):]

---------

Signed-off-by: Nigel Brittain <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants