Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added client-auto-sync-interval argument to the grpc-proxy #14354

Merged
merged 1 commit into from
Aug 25, 2022

Conversation

biosvs
Copy link
Contributor

@biosvs biosvs commented Aug 16, 2022

etcdmain: added client-auto-sync-interval argument to the grpc-proxy

Currently it's possible to enable auto sync in client, which connects to grpc-proxies (if --namespace is used). But grpc-proxy itself don't have an option to update endpoints list from etcd clusters.

This PR introduces an option client-auto-sync-interval for grpc-proxy, which allows to specify auto sync interval for grpc-proxy itself.

@biosvs biosvs force-pushed the add-client-auto-sync-to-proxy branch from 493290a to 885cd9e Compare August 16, 2022 14:01
@@ -130,6 +131,7 @@ func newGRPCProxyStartCommand() *cobra.Command {
cmd.Flags().StringVar(&grpcProxyMetricsListenAddr, "metrics-addr", "", "listen for endpoint /metrics requests on an additional interface")
cmd.Flags().BoolVar(&grpcProxyInsecureDiscovery, "insecure-discovery", false, "accept insecure SRV records")
cmd.Flags().StringSliceVar(&grpcProxyEndpoints, "endpoints", []string{"127.0.0.1:2379"}, "comma separated etcd cluster endpoints")
cmd.Flags().DurationVar(&grpcProxyClientAutoSyncInterval, "client-auto-sync-interval", 0, "etcd endpoints auto sync interval (disabled by default)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to endpoints_auto_sync_interval?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks better. Changed.

@ahrtr
Copy link
Member

ahrtr commented Aug 18, 2022

Thanks @biosvs for the PR, two comments:

  1. Please add a changelog item into CHANGELOG-3.6;
  2. Please try to add an integration/e2e test case.

@biosvs biosvs force-pushed the add-client-auto-sync-to-proxy branch from 885cd9e to 555da87 Compare August 18, 2022 17:00
@biosvs
Copy link
Contributor Author

biosvs commented Aug 18, 2022

@ahrtr thank you for the quick review!

  1. Appended changelog.
  2. Wrote e2e test - I used plain commands instead of wrappers, because looks like that's the only way to run independent grpc-proxy and to change cluster members list (without discovery, which I didn't want to use to avoid over complication). But I open for any suggestion, of course.

"go.etcd.io/etcd/tests/v3/framework/e2e"
)

func TestGrpcProxyAutoSync(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to add a test into gateway_test.go and reuse some existing code, and also reuse ctlV3MemberAdd or ctlV3MemberRemove

I believe it can greatly simplify the implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually a better approach is to add a common test, which can be executed on both integration and e2e test environment. FYI. #13637

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'm not sure why grpc-proxy tests should be in the same file with the gateway tests? They are different things, and therefore it's impossible to reuse startGateway function, for example.
  2. I tried to use ctl* function, but they accept ctlCtx which I don't have and can't create properly. It useless without setting up epc field of type e2e.EtcdProcessCluster, which in turn I can't use because I need to add/remove member, which is impossible with e2e.EtcdProcessCluster.
  3. I really love your idea to use common framework to write tests once and then run them for both integration and e2e env. But for this particular case I have two concerns:
    a. Neither e2e framework nor common framework have API for adding/removing members properly.
    b. GRPC-proxy is a standalone application, I'm not sure if we at can talk about integration tests at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought how to write integration test for grpc-proxy, but auto sync is part of client. What I want to check is the code inside etcdmain, which cannot be tested in integration tests way (without actually ports binding, running additional processes, etc.).

So in fact my changes can be tested only in e2e tests, I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why grpc-proxy tests should be in the same file with the gateway tests? They are different things, and therefore it's impossible to reuse startGateway function, for example.

Yes, you are right. Sorry I misread the filename and content. They are indeed different things.

So in fact my changes can be tested only in e2e tests, I guess.

Agreed.

I tried to use ctl* function, but they accept ctlCtx which I don't have and can't create properly. It useless without setting up epc field of type e2e.EtcdProcessCluster, which in turn I can't use because I need to add/remove member, which is impossible with e2e.EtcdProcessCluster.

There are lots of examples in the existing e2e test cases. Please try to reuse the existing utilities/methods/functions, such as testCtl, getMemberList, ctlV3MemberAdd, ctlV3MemberRemove, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used e2e.EtcdctlV3, now it looks better (without custom json parser).

As for ctlV3 - I'm afraid I can't use it. Let me explain myself:

  • gprc-proxy is not part of ctl utils, it's part of main etcd.
  • All ctl* test function requires ctlCtx, which could be created only with e2e.EtcdProcessCluster.
  • I can't use e2e.EtcdProcessCluster (neither with ctl*, nor directly), because it spawns all etcd nodes by its own. It's possible to change member list, but impossible to actually start new node or stop another.
  • I also can't use e2e.EtcdProcess, because it could be created only from e2e.EtcdServerProcessConfig, which has private field lg, which is set only by function that creates e2e.EtcdProcessCluster, which I can't use (previous paragraph).
  • At last, I found number of examples in e2e tests, where nodes are spawned explicitly. So I hope for now my test fits into other e2e tests (:

Copy link
Member

@ahrtr ahrtr Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It's possible to change member list, but impossible to actually start new node or stop another

It seems like a real gap. Please consider to improve the existing e2e test framework if you have bandwidth and interested, of course can be in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I found another place where adding new member would be useful: https://github.com/etcd-io/etcd/blob/main/tests/e2e/ctl_v3_snapshot_test.go#L258

Sure, I'll try to extend API.

@biosvs biosvs force-pushed the add-client-auto-sync-to-proxy branch 2 times, most recently from 380ffb5 to b17814d Compare August 22, 2022 08:56
Comment on lines 83 to 91
time.Sleep(autoSyncInterval + 5*time.Second)

memberList, err := memberCtl.MemberList()
require.NoError(t, err)
Copy link
Member

@ahrtr ahrtr Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard coded the 5 seconds sleep seems not good. Please consider to wait until the 2 members are ready, of course with a timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no any API event regarding endpoints update. But fortunately grpc-proxy dumps endpoints into log, so I used Expect to wait for the update.

memberList, err := memberCtl.MemberList()
require.NoError(t, err)

node1MemberID, err := findMemberIDByEndpoint(memberList.Members, node1ClientURL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check node2ClientURL as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary here, because later we're ensuring that gprc-proxy reads value from second node.

require.NoError(t, proc1.Stop())

// Wait for the full stop
time.Sleep(time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed all hardcoded timeouts to retries.

@biosvs biosvs force-pushed the add-client-auto-sync-to-proxy branch 2 times, most recently from 3e90b96 to b6303e4 Compare August 25, 2022 09:38
@biosvs
Copy link
Contributor Author

biosvs commented Aug 25, 2022

Used new Expect function with the context. And all checks finally have passed.

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thanks @biosvs

I believe majority of the test code should be integrated into the existing e2e test framework, and accordingly simplify this test case and benefit other similar e2e cases. Of course, it can be addressed in separate PR.

cc @serathius @spzala @ptabor to double check.

require.NoError(t, proxyProc.Stop())
}

func runEtcdNode(name, dataDir, clientURL, peerURL, clusterState, initialCluster string) (*expect.ExpectProcess, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use `tests/framework/e2e.NewEtcdServerProcess instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I can't, described reasons above: #14354 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, ctlV3 is terrible :(. Sorry for not noticing above comment.

// ExpectFunc reads existing lines, but we're expecting for the events from the future,
// that's why sometimes we have to repeat ExpectFunc calls.
for {
select {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for checking ctx.Done here. There is no other IO operations in this loop than ExpectFunc which already does this for you.

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// ExpectFunc reads existing lines, but we're expecting for the events from the future,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but I think this statement is not true. ExpectFunc has an infinite for loop that only ends if we match the line or there is an error reported (expected when program exit). So it should be able to read from future events, if it doesn't this is a bug.

Copy link
Contributor Author

@biosvs biosvs Aug 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. My mind somehow kept old interpretation of what this function does. Yesterday I changed my mind, but today forget it once again, lol.

Got rid of redundant code.

@biosvs biosvs force-pushed the add-client-auto-sync-to-proxy branch from b6303e4 to be58a25 Compare August 25, 2022 12:33
@biosvs
Copy link
Contributor Author

biosvs commented Aug 25, 2022

Is it possible to rerun check (https://github.com/etcd-io/etcd/actions/runs/2926560942)? Seems like it flaky.

@biosvs biosvs requested a review from serathius August 25, 2022 19:50
@ahrtr ahrtr merged commit e408285 into etcd-io:main Aug 25, 2022
@biosvs biosvs deleted the add-client-auto-sync-to-proxy branch August 26, 2022 08:35
@biosvs
Copy link
Contributor Author

biosvs commented Aug 26, 2022

@ahrtr how to you think, is it possible to port it to 3.5.5?

@ahrtr
Copy link
Member

ahrtr commented Aug 26, 2022

@ahrtr how to you think, is it possible to port it to 3.5.5?

Although it's an enhancement, but it's also a bug from another perspective, because the gRPC proxy may lose connection to the backend etcd cluster if there are member changes.

I am OK to backport this, but I have no strong opinion on this. What do you think? @serathius @spzala @ptabor

@biosvs
Copy link
Contributor Author

biosvs commented Aug 29, 2022

So @ahrtr seems like there are no objections. How it's better to do? Just cherry-pick the commit into release-3.5 branch? Or apply patch with another commit?

@ahrtr
Copy link
Member

ahrtr commented Aug 29, 2022

So @ahrtr seems like there are no objections. How it's better to do? Just cherry-pick the commit into release-3.5 branch? Or apply patch with another commit?

Yes, please feel free to deliver a PR for this. I think you need to manually take care of the cherry-pick instead of automatically applying the patch, because there is big difference between release-3.5 and main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants