
ETCD GRPC Proxy does not always forward ClusterId #19295

CallMeFoxie opened this issue Jan 28, 2025 · 2 comments
@CallMeFoxie

Bug report criteria

What happened?

It seems that in certain rare cases the ClusterId is missing from the reply sent by the gRPC proxy.

I managed to track it down (I think!) to https://github.com/etcd-io/etcd/blob/main/server/proxy/grpcproxy/watch_broadcast.go#L113, which carries the comment // todo: fill in ClusterId

I don't know what the correct solution is, but some software (namely Cilium and kvstoremesh) relies on checking this ID in each reply and gets stuck in a reconnect loop when it is missing.
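For illustration, here is a minimal Go sketch of the kind of consistency check such a client performs. This is an assumption about the client's behavior, not Cilium's actual code; all names here are invented:

```go
package main

import "fmt"

// responseHeader mirrors the fields of etcd's pb.ResponseHeader that
// matter for this issue (illustrative stand-in type).
type responseHeader struct {
	ClusterID uint64
	MemberID  uint64
	Revision  int64
}

// checkClusterID models a Cilium-style sanity check: every reply must
// carry the same non-zero ClusterId, otherwise the client assumes it is
// talking to the wrong cluster and tears down the connection.
func checkClusterID(expected uint64, h responseHeader) error {
	if h.ClusterID == 0 {
		return fmt.Errorf("reply carries no ClusterId, triggering reconnect")
	}
	if h.ClusterID != expected {
		return fmt.Errorf("ClusterId changed: want %d, got %d", expected, h.ClusterID)
	}
	return nil
}

func main() {
	const cluster = 42
	ok := responseHeader{ClusterID: cluster, Revision: 10}
	bad := responseHeader{ClusterID: 0, Revision: 11} // what the proxy emits today
	fmt.Println(checkClusterID(cluster, ok))
	fmt.Println(checkClusterID(cluster, bad))
}
```

A proxy-synthesized reply with a zeroed header fails this check on every delivery, which is consistent with the reconnect loop described above.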

I can try a dirty patch on either etcd's or Cilium's side, whichever is easier for us, but if somebody can point me to the proper solution, I am all ears.

My only idea is to grab the ClusterId/MemberId/... from the previous response header, like this:

		Header: &pb.ResponseHeader{
			ClusterId: w.lastHeader.ClusterId,
			MemberId:  w.lastHeader.MemberId,
			Revision:  w.nextrev,
			// todo: fill in RaftTerm - don't know if that one should be copied but it is not necessarily required in this case:
		},

but that looks dirty (though it should work, since the header is copied on every reply, including this one).

Cheers and thanks for looking into it! It has taken me 3 work days to figure it out :-)
Ashley

What did you expect to happen?

Every reply should carry ResponseHeader.ClusterId, but with the gRPC proxy it doesn't.

How can we reproduce it (as minimally and precisely as possible)?

Start the gRPC proxy and "spam" several instances doing a watch on the same key and revision.

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
v3.5.15

$ etcdctl version
v3.5.15

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output


CallMeFoxie commented Jan 29, 2025

No, that patch wouldn't actually work, as lastHeader is not always saved when a reply is forwarded :)

I have a PoC where I "stole" the clusterID from checkPermissionForWatch, since it is called during watch creation anyway, but that doesn't look nice as an upstream solution :-). I am at an "I need an adult" moment right now, because I have not touched the etcd codebase before and don't know what the best solution is here.

That is, if this is even a "we want to fix this" bug?

@moficodes
Member

@serathius Could you take a look?
