
ETCD GRPC Proxy does not always forward ClusterId #19295

CallMeFoxie opened this issue Jan 28, 2025 · 2 comments
@CallMeFoxie

Bug report criteria

What happened?

It seems that in certain rare cases the ClusterId is missing from the reply sent by the gRPC proxy.

I managed to track it down (I think!) to https://github.com/etcd-io/etcd/blob/main/server/proxy/grpcproxy/watch_broadcast.go#L113, which carries the comment // todo: fill in ClusterId

I don't know what the correct solution is, but some software (namely Cilium and kvstoremesh) relies on checking this ID in each reply and gets stuck in a reconnect loop when it is missing.
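For illustration, here is a minimal Go sketch of the kind of consistency check such a client performs. This is an assumption about the client's behavior, not Cilium's actual code; all names here are invented:

```go
package main

import "fmt"

// responseHeader mirrors the fields of etcd's pb.ResponseHeader that
// matter for this issue (illustrative stand-in type).
type responseHeader struct {
	ClusterID uint64
	MemberID  uint64
	Revision  int64
}

// checkClusterID models a Cilium-style sanity check: every reply must
// carry the same non-zero ClusterId, otherwise the client assumes it is
// talking to the wrong cluster and tears down the connection.
func checkClusterID(expected uint64, h responseHeader) error {
	if h.ClusterID == 0 {
		return fmt.Errorf("reply carries no ClusterId, triggering reconnect")
	}
	if h.ClusterID != expected {
		return fmt.Errorf("ClusterId changed: want %d, got %d", expected, h.ClusterID)
	}
	return nil
}

func main() {
	const cluster = 42
	ok := responseHeader{ClusterID: cluster, Revision: 10}
	bad := responseHeader{ClusterID: 0, Revision: 11} // what the proxy emits today
	fmt.Println(checkClusterID(cluster, ok))
	fmt.Println(checkClusterID(cluster, bad))
}
```

A proxy-synthesized reply with a zeroed header fails this check on every delivery, which is consistent with the reconnect loop described above.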

I can try a dirty patch on either etcd's or Cilium's side, whichever is easier for us, but if somebody can point me to the proper solution, I am all ears.

My only idea is to grab the ClusterId/MemberId/... from the previous response header, like this:

		Header: &pb.ResponseHeader{
			ClusterId: w.lastHeader.ClusterId,
			MemberId:  w.lastHeader.MemberId,
			Revision:  w.nextrev,
			// todo: fill in RaftTerm - don't know if that one should be copied but it is not necessarily required in this case:
		},

but that looks dirty (though it should work, since the header is copied on every reply, including this one).

Cheers and thanks for looking into it! It has taken me 3 work days to figure it out :-)
Ashley

What did you expect to happen?

Every reply should carry ResponseHeader.ClusterId, but with the gRPC proxy it doesn't.

How can we reproduce it (as minimally and precisely as possible)?

Start the gRPC proxy and "spam" several instances doing a watch on the same key and revision.

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
v3.5.15

$ etcdctl version
v3.5.15

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output


CallMeFoxie commented Jan 29, 2025

No, that patch wouldn't actually work, as lastHeader is not always saved when a reply is forwarded :)

I have a PoC where I "stole" the clusterID from checkPermissionForWatch, since it is called during watch creation anyway, but that doesn't look nice as an upstream solution :-). I am at an "I need an adult" moment right now, because I have not touched the etcd codebase before and don't know what the best solution is here.

That is, if this is even a "we want to fix this" bug?

@moficodes
Member

@serathius Could you take a look?
