ETCD continuously printing "transport is closing" in the logs , and etcdctl commands failing with context deadline exceeded (request timeout error) #17438
Comments
@ahrtr
Please provide the statefulset yaml.
Please provide the exact sequence of commands you run in order to create the issue once the statefulset has been applied to a cluster. @rahulbapumore please do not open multiple issues; this is already an active support discussion under #17394. Please remember, community support is provided by volunteers on a best-efforts basis and is not guaranteed. If you need more hands-on or timely support for etcd, you may be best served by engaging with a software vendor. If you would like this bug to be accepted, you need to give us a way to recreate the issue you are seeing; otherwise it will be closed as a duplicate of the support discussion.
Hi @ahrtr @jmhbnz
Then etcd goes into such a bad state that it's not able to recover at all; we tried everything but had no luck.
And this issue is always reproducible.
Thanks for providing the sequence. Can you please provide the statefulset output as yaml so I can attempt to recreate it in my own cluster.
Hi @jmhbnz, we have also observed the same issue when only a single replica is kept for a long time and this issue suddenly happens: all etcdctl commands stop working and the cluster is unrecoverable.
@ahrtr |
I'll try to repro this over the course of today; can you please give us the full config/helm chart for repro? The pod logs would also be helpful. It also seems odd that you would configure the service for peer url advertisement.
You can see from your continuous member listing that it first resolves the DNS to an IP that doesn't exist anymore, and then it does not resolve any IP at all. Which makes me believe your pod is not actually picked up by the service anymore.
/assign @tjungblu Thanks
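One way to check whether the pod is in fact still backing the Service, as suspected above, might be the following; this is a sketch, with the Service name `dced`, pod name `dced-0`, and namespace `namespace1` taken from the environment dump later in this report, not confirmed against the actual cluster:

```
# Does the client Service still list the pod as an endpoint?
kubectl -n namespace1 get endpoints dced -o wide

# Does the DNS name used by etcdctl (ETCDCTL_ENDPOINTS=dced.namespace1:2379)
# still resolve from inside the pod?
kubectl -n namespace1 exec dced-0 -- nslookup dced.namespace1
```

If the endpoints list is empty while the pod is Running, the pod has most likely failed its readiness probe and been dropped from the Service, which would explain the stale/missing DNS resolution described above.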
Hi @tjungblu,
@tjungblu
In short, this kind of state is not recoverable at all.
@tjungblu
Sorry @rahulbapumore, but this is far from being reproducible for me. What are all of those pieces?
Please give us a minimal reproducible example with actual released etcd images and tools from this repository.
Hi @tjungblu, for pod-0 it does the following ->
For pod-1 and pod-2, it does ->
entrypoint.sh is the startup script for the dced container, where we keep the etcd process running for all 3 of the pods.
Hi @tjungblu, "transport is closing" is a very generic message, which can appear after any network disturbance, or if etcd is not able to handle connections properly due to a large number of requests.
Okay, I believe we're conflating many different issues here now.
That's the purpose of etcd: you can only guarantee correct writes when you have quorum. It's not surprising that your PUT fails if the majority of your configured cluster is down. I also think that you can't just easily switch from a clustered etcd to a single-node setup by simply removing members; you would need to reconfigure the environment to match as well. If you look at other examples of how etcd is run as a StatefulSet, they seem to handle those scenarios more gracefully:
What is that bad / unrecoverable state? Looking at your PVC configs, I see
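The point above, that shrinking a cluster takes both membership changes and matching environment changes, could be sketched roughly as follows. The pod names follow the report's StatefulSet; the member IDs are placeholders:

```
# 1. Remove the members that are going away, running against a surviving member:
etcdctl member list -w table
etcdctl member remove <member-id-of-dced-1>
etcdctl member remove <member-id-of-dced-2>

# 2. Separately, reconfigure the surviving pod's environment so that it
#    describes a 1-node cluster: ETCD_INITIAL_CLUSTER should list only dced-0,
#    and the StatefulSet replica count, probes, and advertised URLs must agree.
```

Removing members alone leaves the startup environment still describing a 3-node cluster, which is one plausible way to end up in an inconsistent state on the next restart.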
The bad state is the same one for which I have given the reproduction steps ->
Hi @tjungblu Thanks |
Hi @tjungblu |
Hi @ahrtr |
I am facing the same issue. Could it be that the number of Pods per node limit has been reached?
Bug report criteria
What happened?
We have etcd 3.5.7 deployed inside a container of a pod managed by a StatefulSet controller, with only 1 replica.
The deployment runs for some days with key/values inserted constantly, and it works fine for a few days. But when the StatefulSet is scaled down to zero and then scaled back up to 1, and some etcdctl commands are entered while the pod is coming up, before etcd has completely started, that is the trigger point: all etcdctl commands stop working and etcd goes into such a bad state that it cannot be recovered unless the db file is deleted.
Firstly, etcd gives the latest balancer error below:
{"attempt":0,"caller":"[email protected]/retry_interceptor.go:62","error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.107.143.247:2379: connect: connection refused\"","logger":"etcd-client","message":"retrying of unary invoker failed","metadata":{"container_name":"dced","namespace":"namespace1","pod_name":"dced-0"},"service_id":"dced","severity":"warning","target":"etcd-endpoints://0xc00002c000/dced.namespace1:2379","timestamp":"2024-02-16T11:19:41.244+00:00","version":"1.2.0"}
So etcd will be up and running, but it will give "context deadline exceeded" (request timeout) errors, and it keeps printing the log line below:
{"message":"WARNING: 2024/02/16 16:04:22 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\"","metadata":{"container_name":"dced","namespace":"namespace1","pod_name":"dced-0"},"service_id":"dced","severity":"warning","timestamp":"2024-02-16T16:04:22.567+00:00","version":"1.2.0"}
{"message":"WARNING: 2024/02/16 16:11:46 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\"","metadata":{"container_name":"dced","namespace":"namespace1","pod_name":"dced-0"},"service_id":"dced","severity":"warning","timestamp":"2024-02-16T16:11:46.177+00:00","version":"1.2.0"}
{"message":"WARNING: 2024/02/16 16:24:42 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\"","metadata":{"container_name":"dced","namespace":"namespace1","pod_name":"dced-0"},"service_id":"dced","severity":"warning","timestamp":"2024-02-16T16:24:42.662+00:00","version":"1.2.0"}
{"message":"WARNING: 2024/02/16 16:34:52 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\"","metadata":{"container_name":"dced","namespace":"namespace1","pod_name":"dced-0"},"service_id":"dced","severity":"warning","timestamp":"2024-02-16T16:34:52.710+00:00","version":"1.2.0"}
{"message":"WARNING: 2024/02/16 16:41:46 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\"","metadata":{"container_name":"dced","namespace":"namespace1","pod_name":"dced-0"},"service_id":"dced","severity":"warning","timestamp":"2024-02-16T16:41:46.176+00:00","version":"1.2.0"}
{"message":"WARNING: 2024/02/16 16:46:46 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\"","metadata":{"container_name":"dced","namespace":"namespace1","pod_name":"dced-0"},"service_id":"dced","severity":"warning","timestamp":"2024-02-16T16:46:46.178+00:00","version":"1.2.0"}
{"message":"WARNING: 2024/02/16 16:51:46 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\"","metadata":{"container_name":"dced","namespace":"namespace1","pod_name":"dced-0"},"service_id":"dced","severity":"warning","timestamp":"2024-02-16T16:51:46.177+00:00","version":"1.2.0"}
{"message":"WARNING: 2024/02/16 16:55:12 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\"","metadata":{"container_name":"dced","namespace":"namespace1","pod_name":"dced-0"},"service_id":"dced","severity":"warning","timestamp":"2024-02-16T16:55:12.803+00:00","version":"1.2.0"}
{"message":"WARNING: 2024/02/16 16:56:46 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\"","metadata":{"container_name":"dced","namespace":"namespace1","pod_name":"dced-0"},"service_id":"dced","severity":"warning","timestamp":"2024-02-16T16:56:46.178+00:00","version":"1.2.0"}
{"message":"WARNING: 2024/02/16 17:00:17 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\"","metadata":{"container_name":"dced","namespace":"namespace1","pod_name":"dced-0"},"service_id":"dced","severity":"warning","timestamp":"2024-02-16T17:00:17.825+00:00","version":"1.2.0"}
{"message":"WARNING: 2024/02/16 17:01:46 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\"","metadata":{"container_name":"dced","namespace":"namespace1","pod_name":"dced-0"},"service_id":"dced","severity":"warning","timestamp":"2024-02-16T17:01:46.177+00:00","version":"1.2.0"}
logs.txt
What did you expect to happen?
etcd should not have gone into such a bad state that recovery required deleting the db file and losing all data.
How can we reproduce it (as minimally and precisely as possible)?
We have etcd 3.5.7 deployed inside a container of a pod managed by a StatefulSet controller, with only 1 replica.
The deployment runs for some days with key/values inserted constantly. Then we suddenly take the pod down by scaling the StatefulSet down and scaling it back up, and before etcd gets configured, we run some etcdctl commands, which is the trigger point according to us.
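A minimal sketch of the sequence above, assuming the StatefulSet is named `dced` in namespace `namespace1` (names taken from the environment dump below); the exact etcdctl commands issued during startup are not specified in the report, so the ones shown are examples:

```
# Scale the single-replica StatefulSet down and back up:
kubectl -n namespace1 scale statefulset dced --replicas=0
kubectl -n namespace1 scale statefulset dced --replicas=1

# While the pod is still starting, before etcd is fully up,
# issue client commands against it -- this is the reported trigger:
kubectl -n namespace1 exec dced-0 -- etcdctl endpoint status -w table
kubectl -n namespace1 exec dced-0 -- etcdctl put somekey somevalue
```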
Anything else we need to know?
No response
Etcd version (please run commands below)
bash-4.4$ etcd --version
etcd Version: 3.5.7
Git SHA: 215b53c
Go Version: go1.17.13
Go OS/Arch: linux/amd64
bash-4.4$ etcdctl version
etcdctl version: 3.5.7
API version: 3.5
bash-4.4$
Etcd configuration (command line flags or environment variables)
bash-4.4$ clear
bash-4.4$ env
BOOTSTRAP_ENABLED=false
SIP_SERVICE_PORT_HTTP_METRIC_TLS=8889
VALID_PARAMETERS=valid
ETCD_INITIAL_CLUSTER_TOKEN=dced
TLS_ENABLED=true
ETCD_MAX_SNAPSHOTS=3
CLIENT_PORTS=2379
SIP_SERVICE_PORT=8889
TZ=UTC
HOSTNAME=dced-0
SIP_PORT_8889_TCP_PORT=8889
COMPONENT_VERSION=v3.5.7
HTTP_PROBE_CMD_DIR=/usr/local/bin/health
HTTP_PROBE_READINESS_CMD_TIMEOUT_SEC=15
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
ETCD_HEARTBEAT_INTERVAL=100
ETCD_AUTO_COMPACTION_RETENTION=100
DISARM_ALARM_PEER_INTERVAL=6
NAMESPACE=namespace1
ETCD_TRUSTED_CA_FILE=/data/combinedca/cacertbundle.pem
DB_THRESHOLD_PERCENTAGE=70
MONITOR_ALARM_INTERVAL=5
PEER_CERT_AUTH_ENABLED=true
KMS_SERVICE_HOST=10.107.175.120
SIP_PORT_8889_TCP_PROTO=tcp
KMS_PORT_8200_TCP_PROTO=tcp
TRUSTED_CA=/data/combinedca/cacertbundle.pem
PEER_CLIENTS_CERTS=/run/sec/certs/peer/srvcert.pem
FIFO_DIR=/fifo
KUBERNETES_PORT_443_TCP_PROTO=tcp
ENTRYPOINT_RESTART_ETCD=true
HTTP_PROBE_NAMESPACE=namespace1
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
ETCDCTL_CERT=/run/sec/certs/client/clicert.pem
ENTRYPOINT_DCED_PROCESS_INTERVAL=5
DEFRAGMENT_ENABLE=true
DCED_SERVICE_HOST=10.107.143.247
ETCD_LOG_LEVEL=info
ENTRYPOINT_CHECKSNUMBER=60
SIP_PORT=tcp://10.111.183.137:8889
KUBERNETES_PORT=tcp://10.96.0.1:443
POD_NAME=dced-0
DCED_SERVICE_PORT=2379
SIP_SERVICE_HOST=10.111.183.137
PWD=/
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
HOME=/home/dced
DCED_SERVICE_PORT_CLIENT_PORT_TLS=2379
ETCD_AUTO_COMPACTION_MODE=revision
KUBERNETES_SERVICE_PORT_HTTPS=443
DCED_PORT_2379_TCP_ADDR=10.107.143.247
KUBERNETES_PORT_443_TCP_PORT=443
ETCD_LOGGER=zap
PEER_AUTO_TLS_ENABLED=true
KMS_SERVICE_PORT_HTTPS_KMS=8200
ETCD_CERT_FILE=/run/sec/certs/server/srvcert.pem
ETCD_PEER_AUTO_TLS=true
DCED_PORT_2379_TCP_PORT=2379
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
DCED_PORT_2379_TCP=tcp://10.107.143.247:2379
LISTEN_PEER_URLS=https://0.0.0.0:2380
DEFRAGMENT_PERIODIC_INTERVAL=60
CONTAINER_NAME=dced
COMPONENT=etcd
ETCD_DATA_DIR=/data
ETCD_CLIENT_CERT_AUTH=true
TERM=xterm
KMS_PORT=tcp://10.107.175.120:8200
ETCDCTL_ENDPOINTS=dced.namespace1:2379
HTTP_PROBE_LIVENESS_CMD_TIMEOUT_SEC=15
ETCD_METRICS=basic
PEER_CLIENT_KEY_FILE=/run/sec/certs/peer/srvprivkey.pem
HTTP_PROBE_CONTAINER_NAME=dced
SIP_PORT_8889_TCP_ADDR=10.111.183.137
GODEBUG=tls13=1
ETCDCTL_API=3
DCED_PORT=tcp://10.107.143.247:2379
ETCD_SNAPSHOT_COUNT=5000
ETCD_MAX_WALS=3
SHLVL=1
KMS_PORT_8200_TCP_ADDR=10.107.175.120
HTTP_PROBE_POD_NAME=dced-0
KUBERNETES_SERVICE_PORT=443
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://dced-0.dced-peer.namespace1.svc.cluster.local:2380
HTTP_PROBE_STARTUP_CMD_TIMEOUT_SEC=15
ETCD_KEY_FILE=/run/sec/certs/server/srvprivkey.pem
ETCD_ELECTION_TIMEOUT=1000
HTTP_PROBE_SERVICE_NAME=dced
ETCDCTL_CACERT=/data/combinedca/cacertbundle.pem
ETCD_NAME=dced-0
ETCD_QUOTA_BACKEND_BYTES=268435456
SIP_PORT_8889_TCP=tcp://10.111.183.137:8889
ENTRYPOINT_PIPE_TIMEOUT=5
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
ETCD_ADVERTISE_CLIENT_URLS=https://dced-0.dced.namespace1:2379
DCED_PORT=2379
KMS_SERVICE_PORT=8200
KUBERNETES_SERVICE_HOST=10.96.0.1
FLAVOUR=etcd-v3.5.7-linux-amd64
KMS_PORT_8200_TCP=tcp://10.107.175.120:8200
DCED_PORT_2379_TCP_PROTO=tcp
KMS_PORT_8200_TCP_PORT=8200
ETCDCTL_KEY=/run/sec/certs/client/cliprivkey.pem
_=/usr/bin/env
bash-4.4$
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
bash-4.4$ etcdctl member list -w table
+------------------+---------+----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+------------+
| 7928e6047223afac | started | eric-data-distributed-coordinator-ed-0 | https://eric-data-distributed-coordinator-ed-0.eric-data-distributed-coordinator-ed-peer.zmorrah1.svc.cluster.local:2380 | https://eric-data-distributed-coordinator-ed-0.eric-data-distributed-coordinator-ed.zmorrah1:2379 | false |
+------------------+---------+----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+------------+
bash-4.4$ etcdctl endpoint status -w table
+----------------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| eric-data-distributed-coordinator-ed.zmorrah1:2379 | 7928e6047223afac | 3.5.7 | 2.6 MB | true | false | 3 | 231 | 231 | |
+----------------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
bash-4.4$
Relevant log output
No response