-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Failover to replicas on 401&403 HTTP status codes from the SDK #568
Comments
Can you give one example of what else would be useful? |
@zhenlan The idea was to protect against a potential unexpected 401/403 error causing the provider to throw an exception. If there is a bug that causes failures for only a specific endpoint, then it would be unfortunate if the provider can't attempt to fail over to a replica. I'm not sure exactly how this 401/403 bug would manifest itself, but that's how it was described to me. |
All replicas share the same auth model. Switching from one replica to another won't help the 401/403 errors. |
That's the happy path. Replicas are meant to kick in during disaster scenarios. In such cases it's not out of the question to consider a scenario where an issue in the service causes a given region/replica to return 401/403. Yes, it would be some kind of bug or regional issue that would cause it, but in that case we should fail over to replicas. I think we missed on this one. We should never have a case that the server can do something that causes the client to not attempt a replica. Otherwise, that specific scenario happening due to a disaster could shut down a client that has replication enabled. |
Do we have a real scenario or is this more like "in case we have a bug" scenario? |
Bug/disaster. yes. |
@zhenlan do you still have concerns on this one? |
I'm still not convinced we should failover for potential bugs. Since bugs can be anything, that's like failover for everything (all 4xx status code... assuming we don't have bugs in the status code itself). |
Discussed offline, the plan is to failover to any available replicas on 401 and 403 status codes due to potential regional failures of other services App Configuration depends on. |
Particularly, the regional failures of the Microsoft Entra ID and RBAC services that App Configuration depends on. This is why we only add failovers for 401 and 403 (instead of any other arbitrary client errors) at this time. |
Currently, the provider only fails over to available replicas on 3 specific HTTP status codes when a request returns a
RequestFailedException
. However, we can fail over on all status codes to have better resiliency and prevent throwing an exception when there is an unexpected status code that could potentially be resolved by requesting a replica instead.The text was updated successfully, but these errors were encountered: