-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aws: async handling for aws_request_signing and aws_lambda extensions #38360
base: main
Are you sure you want to change the base?
Conversation
a6c3ddc
to
d78bc86
Compare
merge main (or just pushing a change) should fix Docker issue (#38416 ) |
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
will wait on review from AWS folks for this one |
/wait-any |
// mark credentials as pending while async completes | ||
credentials_pending_.store(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this happen on every refresh, or just when the credentials are invalid (not yet fetched or expired).
I'm thinking that you end up with periodic spikes in latency as credentials are refreshed and all requests pending have to wait for this to finish up, even if there are valid credentials available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part of the code will only get triggered when credentials are invalid as you mention (not yet fetched or expired)
as this is part of a chain, previous providers will have precedence. meaning if they have credentials we will not block requests. however if this is the first provider in the chain and it flags as pending, we will block requests waiting for it to either fail entirely, or return valid credentials.
if this provider is later in the chain, it will still flag as pending if asked, however previous providers in the chain can trump this by returning not pending and having valid credentials available.
void CredentialsProviderChain::onCredentialUpdate() { | ||
// Loop through all providers | ||
// If no providers are pending then unblock | ||
// If a provider is not pending and has credentials, then unblock | ||
for (auto& provider : providers_) { | ||
if (provider->credentialsPending()) { | ||
ENVOY_LOG(debug, "Provider {} is still pending", provider->providerName()); | ||
return; | ||
} | ||
if (provider->getCredentials().hasCredentials()) { | ||
ENVOY_LOG(debug, "Provider {} has credentials", provider->providerName()); | ||
break; | ||
} else { | ||
ENVOY_LOG(debug, "Provider {} has blank credentials, continuing through chain", | ||
provider->providerName()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can lead to some weirdness where multiple credential providers have available credentials, but the timing is such that one fetch sees a lower priority provider before the higher priority provider kicks in.
Perhaps it should wait for all providers to no longer be pending (either has credentials or be in an error state) before reporting completion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic should be as follows
for each provider
- if pending (ie we're waiting for initial refresh or expiry refresh) then block
- if we're not pending and we have no credentials (ie refresh failed entirely) then move to next provider
in this scenario the higher priority provider will always block us from further processing until it fails or has credentials available. we should not have a scenario in which a lower priority provider impacts processing unless a higher priority provider has failed.
see 'see here' comment for more detail
Signed-off-by: Nigel Brittain <[email protected]>
ENVOY_LOG_MISC(debug, "We have {} pending callbacks", credential_pending_callbacks_.size()); | ||
} | ||
return pending; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"see here"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method is called by our signer, to add a callback if processing could not be completed.
the logic here is now as follows:
- for each provider
-- if credentials are pending, then add a callback for later processing (block)
-- if credentials are not pending, and we have credentials available (fetching has not failed) then do not block
-- otherwise, go to the next provider
if we fall through all providers, we will return not pending, which effectively causes signing to fail. this is expected behaviour if we are entirely unable to retrieve credentials.
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Signed-off-by: Nigel Brittain <[email protected]>
Commit Message: aws: minor refactor of CredentialsProviderChain and its usage Additional Description: Supports work on #38360 . CredentialProviderChain was being used interchangeably with CredentialProvider, which was likely its original intended usage but makes distinguishing them from each other impossible and forcing needless messy overloads. Scenarios where provider without a chain were implemented have been replaced with chain instantiations. Risk Level: Negligible Testing: Unit Docs Changes: Release Notes: Platform Specific Features: [Optional Runtime guard:] [Optional Fixes #Issue] [Optional Fixes commit #PR or SHA] [Optional Deprecated:] [Optional [API Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):] --------- Signed-off-by: Nigel Brittain <[email protected]>
Commit Message: aws: async handling for aws_request_signing and aws_lambda extensions
Additional Description:
Part 3 of 3 patches, to address customer reported bugs in the aws request signing extension when using the route discovery service.
This change ensures that requests to aws signing are correctly paused if credentials are still pending. A minor behaviour to the extensions will occur - instead of returning blank (anonymous) credentials if credential retrieval is still in flight, the extensions will pause until async fetch completion.
Part 2 - #38271
Risk Level: Low
Testing: Unit
Docs Changes: N/A
Release Notes: Update changelog
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]