Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failover support #20

Merged
merged 18 commits into from
Apr 1, 2024
Merged

Conversation

linglingye001
Copy link
Contributor

No description provided.

Configuration AzureAppConfigurationKeyValueOptions `json:"configuration,omitempty"`
Secret *AzureKeyVaultReference `json:"secret,omitempty"`
FeatureFlag *AzureAppConfigurationFeatureFlagOptions `json:"featureFlag,omitempty"`
Endpoint *string `json:"endpoint,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation?

if err != nil {
return nil, err
}
manager.secret, err = parseConnectionString(manager.ConnectionString, SecretSection)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we keep both secret/id and connectionString? looks like one of them is enough.

return manager, nil
}

func SrvTargetHostQuery(host string) ([]string, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put the verb in the beginning sounds more natural as a method name.

return nil, err
}

originHost := strings.TrimSuffix(originRecords[0].Target, ".")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just want to make sure, the originRecords will never be an empty array. right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each replica has _origin._tcp.* SRV record been created to indicate the origin store of the replica. If there's no replica of user's appconfig store, the originRecoreds will be empty.

}
}

altHost := strings.TrimSuffix(altRecords[0].Target, ".")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only altRecords[0]?

currentTime := metav1.Now()
if currentTime.After(manager.lastFallbackClientsAttempt.Time.Add(MinimalClientRefreshInterval)) &&
(manager.DynamicClients == nil ||
currentTime.After(manager.lastFallbackRefresh.Time.Add(FallbackClientRefreshExpireInterval))) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is lastFallbackClientsAttempt time and lastFallbackRefresh time?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lastFallbackRefresh -> lastFallbackClientRefresh
lastFallbackClientsRefreshAttempt -> lastFallbackClientRefreshAttempt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lastFallbackClientsAttempt time is the moment provider try to query srv hosts. It may fail during query process, if the query succeed, lastFallbackRefresh time will be updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use them to ensure two things:

  1. Not perform the lookup too frequently, if we see the last attempt is within a certain time period (30s), we will not perform a new lookup
  2. To ensure discovered clients could be refreshed periodically (every 1 hour), if the last success time exceeds 1 hour, we perform a new lookup

}

func (manager *ConfigurationClientManager) GetClients() []*ConfigurationClientWrapper {
if manager.ReplicaDiscoveryEnabled != true {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!manager.ReplicaDiscoveryEnabled

@linglingye001 linglingye001 force-pushed the user/linglingye/replicaDiscovery branch from dd527b4 to 93775e6 Compare March 13, 2024 04:32
Endpoint *string `json:"endpoint,omitempty"`
ConnectionStringReference *string `json:"connectionStringReference,omitempty"`
// +kubebuilder:default=true
ReplicaDiscovery bool `json:"replicaDiscovery,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put it right after the Endpoint field

@@ -198,18 +199,28 @@ func (reconciler *AzureAppConfigurationProviderReconciler) Reconcile(ctx context
}
}

if reconciler.ProvidersReconcileState[req.NamespacedName].ClientManager == nil ||
reconciler.ProvidersReconcileState[req.NamespacedName].Generation != provider.Generation {
manager, err := loader.NewConfigurationClientManager(ctx, *provider)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

manager -> clientManager

/* Create ConfigurationSettingLoader to get the key-value settings from Azure AppConfiguration. */
configProvider, err := loader.NewConfigurationSettingLoader(ctx, *provider, nil)
manager := reconciler.ProvidersReconcileState[req.NamespacedName].ClientManager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this line before line 202

In line 202, just judge whether manager == nil

if reconciler.Retriever == nil {
retriever = configProvider
} else {
if reconciler.Retriever != nil {
Copy link
Contributor

@RichardChen820 RichardChen820 Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary change

@@ -180,6 +181,7 @@ var _ = Describe("AppConfiguationProvider controller", func() {
},
}

// mockConfigClientManager.EXPECT().ExecuteFailoverPolicy(gomock.Any(), gomock.Any(), gomock.Any()).Return(nil, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless

mockResolveSecretReference *MockResolveSecretReference
mockCtrl *gomock.Controller
mockCongiurationClientManager *MockClientManager
EndpointName string = "https://fake-endpoint"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All variables are using camel-style, please update it to be consistent

@@ -39,7 +35,7 @@ import (
type ConfigurationSettingLoader struct {
acpv1.AzureAppConfigurationProvider
getSettingsFunc GetSettingsFunc
Copy link
Contributor

@RichardChen820 RichardChen820 Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems getSettingsFunc need to be Pascal style, since all struct fields are Pascal Style


type ConfigurationClientManager struct {
ReplicaDiscoveryEnabled bool
SpecifiedClient *azappconfig.Client
Copy link
Contributor

@RichardChen820 RichardChen820 Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about call it StaticClientWrappers and make it to be an array []*ConfigurationClientWrapper ? To potentially support customer can pass more than one static clients.

DynamicClients []*ConfigurationClientWrapper
ValidDomain string
Endpoint string
ClientsFallbackStatus map[string]*ConfigurationClientWrapper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need this? Could use DynamicClients and StaticClients directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we don't need to update dynamicClients fallback status when autofailover enabled?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying we don't need to update fallback of dynamicClients, I mean why we don't update it in DynamicClientwrappers directly, not need to use another object which is only for managing the fallbacks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

type ConfigurationClientManager struct {
ReplicaDiscoveryEnabled bool
SpecifiedClient *azappconfig.Client
DynamicClients []*ConfigurationClientWrapper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DynamicClientWrappers

GetClients() []*ConfigurationClientWrapper
RefreshClients()
UpdateClientBackoffStatus(endpoint string, successful bool)
ExecuteFailoverPolicy(ctx context.Context, filters []acpv1.Selector, getSettingsFunc GetSettingsFunc) ([]azappconfig.Setting, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't expect the clientManager has the method ExecuteFailoverPolicy(), the clientManger just take the responsibility for managing the clients.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, will move it to configLoader.

@linglingye001 linglingye001 force-pushed the user/linglingye/replicaDiscovery branch from 5b94884 to da91ced Compare March 15, 2024 05:42

func parseConnectionString(connectionString string, token string) (string, error) {
if connectionString == "" {
return "", NewArgumentError("connectionString", fmt.Errorf("connectionString cannot be empty"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not return as ArugmentError in this method, should handle all the errors as runtime error.


func Jitter(duration time.Duration) time.Duration {
if JitterRatio < 0 || JitterRatio > 1 {
panic("jitterRatio must be between 0 and 1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need to use panic here?

currentTime.After(manager.lastFallbackClientAttempt.Time.Add(MinimalClientRefreshInterval)) {
manager.lastFallbackClientAttempt = currentTime
url, _ := url.Parse(manager.endpoint)
_ = manager.DiscoverFallbackClients(url.Host)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not ignore the error here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to log the error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DiscoverFallbackClients may return error in line 188, we should not ignore this error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging warning is fine in line 174

if _, ok := err.(*net.DNSError); ok {
break
} else {
return nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not return nil here, must be an empty array

_, altRecords, err := net.LookupSRV(currentAlt, TCP, originHost)
if err != nil {
// If the host does not have SRV records => no more replicas
if _, ok := err.(*net.DNSError); ok {
Copy link
Contributor

@RichardChen820 RichardChen820 Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what situation the error is not *net.DNSError ? We should differentiate the not existing error from other errors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DNSError has IsNotFound field, we should use that, when IsNotFound is true, we should not treat it as error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may return net error or something else. When there's no more alt records it will return dns error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see

Copy link
Contributor

@RichardChen820 RichardChen820 Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the error is net.DNSError and IsNotFound is true, just break the loop. Throw for any other errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When there's no alt recoreds, IsNotFound filed in net.DNSError is false.

results := make([]string, 0)

_, originRecords, err := net.LookupSRV(Origin, TCP, host)
if err != nil {
Copy link
Contributor

@RichardChen820 RichardChen820 Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error handling should same as line 231

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean check the error's type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes , judge type net.DNSError and IsNotFound

return results, err
}

originHost := strings.TrimSuffix(originRecords[0].Target, ".")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The originHost should also be added to the return results.


newDynamicClients := make([]*ConfigurationClientWrapper, 0)
for _, host := range srvTargetHosts {
if isValidEndpoint(host, manager.validDomain) {
Copy link
Contributor

@RichardChen820 RichardChen820 Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should judge whether the host is already in the staticClients, should not add it to dynamicClients if it is already a static client


// Failed to execute failover policy
csl.ClientManager.RefreshClients()
return nil, fmt.Errorf("failed to execute failover policy: all clients failed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all app configuration clients failed to get settings

Endpoint *string `json:"endpoint,omitempty"`
Endpoint *string `json:"endpoint,omitempty"`
// +kubebuilder:default=true
ReplicaDiscovery bool `json:"replicaDiscovery,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to ReplicaDiscoveryEnabled to keep consistent with other provdivers

}
}

originHost := strings.TrimSuffix(originRecords[0].Target, ".")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't assume the originRecords has at least one item in it. we should judge its length, if length is 0, return empty result

func QuerySrvTargetHost(host string) ([]string, error) {
results := make([]string, 0)

_, originRecords, err := net.LookupSRV(Origin, TCP, host)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use this method LookupSRV(ctx context.Context, service, proto, name string) (string, []*SRV, error) passing the context, which means all the calling chain should add context in the parameter

Copy link
Contributor

@RichardChen820 RichardChen820 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reset to request changes

@@ -30,7 +30,9 @@ import (
type AzureAppConfigurationProviderSpec struct {
// The endpoint url of AppConfiguration which should sync the configuration key-values from.
// +kubebuilder:validation:Format=uri
Endpoint *string `json:"endpoint,omitempty"`
Endpoint *string `json:"endpoint,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's kubebuilder notation between lines which influences go fmt result and the indention here.

@@ -37,6 +37,9 @@ spec:
description: AzureAppConfigurationProviderSpec defines the desired state
of AzureAppConfigurationProvider
properties:
ReplicaDiscoveryEnabled:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this customer-facting? should be camel case

Endpoint *string `json:"endpoint,omitempty"`
Endpoint *string `json:"endpoint,omitempty"`
// +kubebuilder:default=true
ReplicaDiscoveryEnabled bool `json:"ReplicaDiscoveryEnabled,omitempty"`
Copy link
Contributor

@RichardChen820 RichardChen820 Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json:"replicaDiscoveryEnabled,omitempty", it should be camel casing

//go:generate mockgen -destination=mocks/mock_configuration_client_manager.go -package mocks . ClientManager

type ConfigurationClientManager struct {
ReplicaDiscoveryEnabledEnabled bool

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why two Enabled in name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

ReplicaDiscoveryEnabledEnabled bool
StaticClientWrappers []*ConfigurationClientWrapper
DynamicClientWrappers []*ConfigurationClientWrapper
validDomain string

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the guidance of naming? inconsistent here: some fields are camel case, others are pascal case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type starting with an uppercase letter are exported to other packages. Those starting with a lowercase letter can be used only inside the package.

srvTargetHosts, err := QuerySrvTargetHost(ctx, host)
if err != nil {
klog.Warningf("Fail to build fall back clients %s", err.Error())
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not return or throw in case of error? It continues to execute below codes when something wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should return, updated.

return connectionString[startIndex+len(parseToken) : endIndex], nil
}

func verfityEndpointFromConnectionString(endpoint string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verfity -> verify

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method seems not being used.

@linglingye001 linglingye001 merged commit e9f2050 into release/v1 Apr 1, 2024
2 checks passed
@linglingye001 linglingye001 deleted the user/linglingye/replicaDiscovery branch April 1, 2024 03:18
RichardChen820 pushed a commit that referenced this pull request Apr 17, 2024
* failover support

* modify client manager

* resolve conflicts

* add net error to failoverable error

* resolve comments

* resove comments

* fix nil pointer

* resolve comments

* add failover test

* update getclients func

* resolve comments

* resolve comments

* upgrage go version

* resolve comments

* resolve comments

* fix typo

* use context with timeout
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants