Added client-auto-sync-interval argument to the grpc-proxy #14354
493290a to 885cd9e
server/etcdmain/grpc_proxy.go
Outdated
@@ -130,6 +131,7 @@ func newGRPCProxyStartCommand() *cobra.Command {
	cmd.Flags().StringVar(&grpcProxyMetricsListenAddr, "metrics-addr", "", "listen for endpoint /metrics requests on an additional interface")
	cmd.Flags().BoolVar(&grpcProxyInsecureDiscovery, "insecure-discovery", false, "accept insecure SRV records")
	cmd.Flags().StringSliceVar(&grpcProxyEndpoints, "endpoints", []string{"127.0.0.1:2379"}, "comma separated etcd cluster endpoints")
	cmd.Flags().DurationVar(&grpcProxyClientAutoSyncInterval, "client-auto-sync-interval", 0, "etcd endpoints auto sync interval (disabled by default)")
Rename to endpoints_auto_sync_interval?
Yep, looks better. Changed.
Thanks @biosvs for the PR, two comments:
885cd9e to 555da87
@ahrtr thank you for the quick review!

	"go.etcd.io/etcd/tests/v3/framework/e2e"
)

func TestGrpcProxyAutoSync(t *testing.T) {
I would suggest adding a test to gateway_test.go and reusing some existing code, including ctlV3MemberAdd or ctlV3MemberRemove.
I believe it can greatly simplify the implementation.
Actually, a better approach is to add a common test, which can be executed in both the integration and e2e test environments. FYI: #13637
- I'm not sure why grpc-proxy tests should be in the same file as the gateway tests. They are different things, and therefore it's impossible to reuse the startGateway function, for example.
- I tried to use the ctl* functions, but they accept ctlCtx, which I don't have and can't create properly. It is useless without setting up the epc field of type e2e.EtcdProcessCluster, which in turn I can't use because I need to add/remove members, which is impossible with e2e.EtcdProcessCluster.
- I really love your idea of using a common framework to write tests once and then run them in both the integration and e2e environments. But for this particular case I have two concerns:
a. Neither the e2e framework nor the common framework has an API for adding/removing members properly.
b. grpc-proxy is a standalone application, so I'm not sure we can talk about integration tests at all.
I also thought about how to write an integration test for grpc-proxy, but auto sync is part of the client. What I want to check is the code inside etcdmain, which cannot be tested the integration-test way (without actually binding ports, running additional processes, etc.).
So in fact my changes can be tested only in e2e tests, I guess.
> I'm not sure why grpc-proxy tests should be in the same file as the gateway tests. They are different things, and therefore it's impossible to reuse the startGateway function, for example.
Yes, you are right. Sorry, I misread the filename and content. They are indeed different things.
> So in fact my changes can be tested only in e2e tests, I guess.
Agreed.
> I tried to use the ctl* functions, but they accept ctlCtx, which I don't have and can't create properly. It is useless without setting up the epc field of type e2e.EtcdProcessCluster, which in turn I can't use because I need to add/remove members, which is impossible with e2e.EtcdProcessCluster.
There are lots of examples in the existing e2e test cases. Please try to reuse the existing utilities/methods/functions, such as testCtl, getMemberList, ctlV3MemberAdd, ctlV3MemberRemove, etc.
Used e2e.EtcdctlV3, now it looks better (without the custom JSON parser).
As for ctlV3, I'm afraid I can't use it. Let me explain:
- grpc-proxy is not part of the ctl utils, it's part of main etcd.
- All ctl* test functions require ctlCtx, which can be created only with e2e.EtcdProcessCluster.
- I can't use e2e.EtcdProcessCluster (neither with ctl* nor directly), because it spawns all etcd nodes on its own. It's possible to change the member list, but impossible to actually start a new node or stop another.
- I also can't use e2e.EtcdProcess, because it can be created only from e2e.EtcdServerProcessConfig, which has a private field lg that is set only by the function that creates e2e.EtcdProcessCluster, which I can't use (see the previous point).
- Finally, I found a number of examples in e2e tests where nodes are spawned explicitly. So I hope that for now my test fits in with the other e2e tests (:
> - It's possible to change the member list, but impossible to actually start a new node or stop another.
This seems like a real gap. Please consider improving the existing e2e test framework if you have the bandwidth and are interested; of course, that can be done in a separate PR.
BTW, I found another place where adding a new member would be useful: https://github.com/etcd-io/etcd/blob/main/tests/e2e/ctl_v3_snapshot_test.go#L258
Sure, I'll try to extend the API.
380ffb5 to b17814d
tests/e2e/etcd_grpcproxy_test.go
Outdated
	time.Sleep(autoSyncInterval + 5*time.Second)

	memberList, err := memberCtl.MemberList()
	require.NoError(t, err)
The hard-coded 5-second sleep doesn't look good. Please consider waiting until the 2 members are ready, with a timeout of course.
There is no API event for endpoint updates. Fortunately, grpc-proxy dumps its endpoints into the log, so I used Expect to wait for the update.
	memberList, err := memberCtl.MemberList()
	require.NoError(t, err)

	node1MemberID, err := findMemberIDByEndpoint(memberList.Members, node1ClientURL)
Check node2ClientURL as well?
It's not necessary here, because later we ensure that grpc-proxy reads the value from the second node.
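The body of findMemberIDByEndpoint is not shown in this thread. A minimal sketch of what such a lookup might look like follows, assuming a simplified `member` type with an ID and a list of client URLs; the real test works with the member type returned by the etcd client, so the type and the error value here are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// member is a hypothetical, simplified stand-in for an entry of a member list.
type member struct {
	ID         uint64
	ClientURLs []string
}

// errMemberNotFound is an illustrative sentinel error, not part of etcd.
var errMemberNotFound = errors.New("member not found")

// findMemberIDByEndpoint returns the ID of the first member that advertises
// the given client URL, or an error if no member does.
func findMemberIDByEndpoint(members []member, endpoint string) (uint64, error) {
	for _, m := range members {
		for _, u := range m.ClientURLs {
			if u == endpoint {
				return m.ID, nil
			}
		}
	}
	return 0, fmt.Errorf("%w: %s", errMemberNotFound, endpoint)
}

func main() {
	members := []member{
		{ID: 1, ClientURLs: []string{"http://localhost:2379"}},
		{ID: 2, ClientURLs: []string{"http://localhost:12379"}},
	}
	id, err := findMemberIDByEndpoint(members, "http://localhost:12379")
	fmt.Println(id, err)
}
```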
tests/e2e/etcd_grpcproxy_test.go
Outdated
	require.NoError(t, proc1.Stop())

	// Wait for the full stop
	time.Sleep(time.Second)
It seems unnecessary.
Changed all hardcoded timeouts to retries.
3e90b96 to b6303e4
Used the new Expect function with the context. And all checks have finally passed.
Great work! Thanks @biosvs
I believe the majority of the test code should be integrated into the existing e2e test framework, which would simplify this test case and benefit other similar e2e cases. Of course, it can be addressed in a separate PR.
cc @serathius @spzala @ptabor to double check.
	require.NoError(t, proxyProc.Stop())
}

func runEtcdNode(name, dataDir, clientURL, peerURL, clusterState, initialCluster string) (*expect.ExpectProcess, error) {
Could you use `tests/framework/e2e.NewEtcdServerProcess` instead?
Unfortunately I can't; I described the reasons above: #14354 (comment)
Heh, ctlV3 is terrible :(. Sorry for not noticing the above comment.
tests/e2e/etcd_grpcproxy_test.go
Outdated
	// ExpectFunc reads existing lines, but we're expecting for the events from the future,
	// that's why sometimes we have to repeat ExpectFunc calls.
	for {
		select {
No need to check ctx.Done here. There are no IO operations in this loop other than ExpectFunc, which already does this for you.
tests/e2e/etcd_grpcproxy_test.go
Outdated
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// ExpectFunc reads existing lines, but we're expecting for the events from the future,
Maybe I'm missing something, but I think this statement is not true. ExpectFunc has an infinite for loop that only ends if we match the line or an error is reported (expected when the program exits). So it should be able to read future events; if it doesn't, this is a bug.
Yes, you're right. My mind somehow kept the old interpretation of what this function does. Yesterday I changed my mind, but today I forgot it once again, lol.
Got rid of redundant code.
Signed-off-by: Vitalii Levitskii <[email protected]>
b6303e4 to be58a25
Is it possible to rerun the check (https://github.com/etcd-io/etcd/actions/runs/2926560942)? It seems to be flaky.
@ahrtr what do you think, is it possible to port it to 3.5.5?
Although it's an enhancement, it's also a bug from another perspective, because the gRPC proxy may lose its connection to the backend etcd cluster if there are member changes. I am OK with backporting this, but I have no strong opinion on it. What do you think? @serathius @spzala @ptabor
So @ahrtr, it seems there are no objections. What's the best way to do it? Just cherry-pick the commit into
Yes, please feel free to deliver a PR for this. I think you need to take care of the cherry-pick manually instead of automatically applying the patch, because there is a big difference between
etcdmain: added client-auto-sync-interval argument to the grpc-proxy
Currently it's possible to enable auto sync in a client that connects to grpc-proxies (if --namespace is used). But grpc-proxy itself doesn't have an option to update its endpoint list from etcd clusters.
This PR introduces the client-auto-sync-interval option for grpc-proxy, which allows specifying the auto sync interval for grpc-proxy itself.