etcdserver: read-only range request ... took too long (...) to execute #2692

erichorwath · 2022-03-24T08:13:34Z

What happened:
Slow etcd in the Kind control-plane is slowing down the whole helm install / kubectl apply.
Etcd logs contain lots of errors of such type:

2022-03-16 12:20:25.445202 W | etcdserver: read-only range request "key:\"/registry/helm.toolkit.fluxcd.io/helmreleases/istio-system/istio\" " with result "range_response_count:0 size:5" took too long (1.088797682s) to execute
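To get a feel for how often and how badly this happens, the warnings can be pulled out of the etcd pod log and turned into numbers. The following is only a rough sketch: the regular expression is an assumption based on the single log line above, the function name `parse_slow_range` is made up for illustration, and newer etcd releases that emit JSON logs would need a different parser.

```python
import re

# Matches the plain-text warning format shown in the log line above.
# Assumption: the duration is printed in seconds ("s") or milliseconds ("ms").
SLOW_RANGE = re.compile(
    r'read-only range request "(?P<request>.*)" with result '
    r'"(?P<result>.*)" took too long '
    r'\((?P<duration>[0-9.]+)(?P<unit>ms|s)\) to execute'
)


def parse_slow_range(line):
    """Return (request, seconds) for a slow range-request warning, else None."""
    match = SLOW_RANGE.search(line)
    if match is None:
        return None
    seconds = float(match.group("duration"))
    if match.group("unit") == "ms":
        seconds /= 1000.0
    return match.group("request"), seconds
```

Feeding every line of the etcd pod log through this function and collecting the second tuple element gives a list of durations that can be compared between a disk-backed and a tmpfs-backed cluster.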

What you expected to happen:
No etcd errors

How to reproduce it (as minimally and precisely as possible):
Startup Kind, check the etcd pod logs.

Anything else we need to know?:
We could reproduce this behavior in all our scenarios, which are mainly two:

running on my 32GB laptop (in my case WSL2, see below)
running inside a Kubernetes cluster, in a Docker-in-Docker pod (see also "document how to run kind in a kubernetes pod" #303). Even when using the fastest available VMs and the biggest Premium SSDs the hyperscaler offers, we still see the same error. To me, it doesn't look like a DinD issue, but a problem related to Kind.

Environment:

kind version: tested on v0.12.0 and v0.11.1
Kubernetes version: node version: kindest/node:v1.21.1@sha256:69860bda5563ac81e3c0057d654b5253219618a22ec3a346306239bba8cfa1a6 and kindest/node:v1.21.10@sha256:84709f09756ba4f863769bdcabe5edafc2ada72d3c8c44d6515fc581b66b029c
Docker version: (use docker info): Docker 20.10.11 (WSL2) and docker:20.10.12-dind (DinD)
OS (e.g. from /etc/os-release): tested on WSL2-Ubuntu20.04LTS and Alpine Linux v3.15 (DinD)

Workaround:
Running etcd in memory (tmpfs) resolves the etcd error messages and the helm install timeouts:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
kubeadmConfigPatches:
- |
  apiVersion: kubeadm.k8s.io/v1beta2
  kind: ClusterConfiguration
  etcd:
    local:
      dataDir: /tmp/etcd

See #845 (comment) or knative-extensions/net-contour#444
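The workaround only helps if /tmp inside the node really is a tmpfs. One way to double-check is to look up the filesystem type of the data directory in the node's mount table. This is a minimal sketch, assuming the text comes from /proc/mounts inside the node container; the helper name `fs_type_for` is hypothetical.

```python
def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing `path`.

    `mounts_text` is expected in /proc/mounts format:
    "<device> <mount point> <fs type> <options> ...".
    Mount points with escaped characters (e.g. \\040 for a space) are not decoded.
    """
    best_mount, best_type = "", None
    wanted = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later entry for the same mount point shadow an earlier one.
        if wanted.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

With the contents of /proc/mounts from the control-plane node, `fs_type_for("/tmp/etcd", mounts_text)` should report `tmpfs` when the workaround is active.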


aojea · 2022-03-24T08:23:59Z

Why do you say it is kind?
etcd needs to fsync to disk and is very IOPS-intensive. Now think about all the filesystem layers between etcd and the disk in a setup with docker, dind and wsl2 ... I see this as a "setup" issue.
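To compare setups on the fsync point, a small probe can time individual fsync calls on the filesystem that backs the etcd data directory. This is only a sketch under stated assumptions: it uses `os.fsync` on small appended writes as a stand-in for etcd's write-ahead-log syncs, it is not etcd's own code path, and the function names and default sizes are made up for illustration.

```python
import os
import tempfile
import time


def fsync_latencies(directory, writes=50, payload_size=2048):
    """Append small records to a temp file in `directory`, fsync after each
    one, and return the per-fsync latencies in seconds."""
    payload = b"x" * payload_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # force the data to stable storage, then time it
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def percentile(samples, fraction):
    """Simple rank-based percentile: the sample at index floor(n * fraction)."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(len(ordered) * fraction))
    return ordered[index]
```

Running `percentile(fsync_latencies("/var/lib/etcd"), 0.99)` inside the node, and again against a tmpfs path, would show how much of the latency comes from the storage stack. etcd's guidance is commonly cited as a p99 WAL fsync of roughly 10 ms or less, so values far above that would point at the disk path rather than at kind.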

erichorwath · 2022-03-24T09:02:57Z

Let me phrase it differently: I haven't found a (hardware/OS) "setup" where kind does not have those etcd issues.

Maybe Kind itself is not the problem, but rather how Kind is using Docker. I'm not an expert here, though...

aojea · 2022-03-24T10:26:19Z

> let me phrase it differently: I haven't found a (hardware/os) "setup" where kind is not having those etcd issues.
>
> Maybe Kind itself is not the problem but maybe of how Kind is using Docker, but I'm not an expert here...

It is a setup problem: docker uses overlay2 as its default storage driver. These slides illustrate the problem better:
https://www.storageconference.us/2017/Presentations/PerformanceAnalysisOfContainerizedApplications-slides.pdf

erichorwath · 2022-03-24T10:34:18Z

And why not create a data volume for etcd when kind creates nodes?

aojea · 2022-03-24T10:49:11Z

heh, it seems it already runs in a volume, I was assuming it wasn't (EDIT: lol, my memory does not work well, I can see in the issue I reported about etcd performance I mentioned it XD)

  58213 ?        Ssl   42:08  \_ etcd --advertise-client-urls=https://172.18.0.3:2379 --cert-file=/etc/kubernetes/pki/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/etcd --initial-advertise-peer-urls=https://172.18.0.3:2380 --initial-clu

                "Type": "volume",
                "Name": "3a17927a7e69565f6dd30372f1211892cb1fc12e85ab91bc59f18924a6657244",
                "Source": "/home/var/docker/volumes/3a17927a7e69565f6dd30372f1211892cb1fc12e85ab91bc59f18924a6657244/_data",
                "Destination": "/var",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }

I think that this may require some benchmarks and debugging then, to understand the bottleneck

BenTheElder · 2022-03-24T20:19:32Z

> let me phrase it differently: I haven't found a (hardware/os) "setup" where kind is not having those etcd issues.

We generally have the opposite problem, a few users have reported issues, but only on slow spinning disks ...

There's only so much this project can do, kind ships upstream kubernetes/etcd, the performance of etcd + how kubernetes uses it largely come from those projects.

> And why not creating a data volume for etcd when kind creates nodes?

kind is.

Regarding tmpfs:

#845 discusses an option users can leverage with tmpfs, but doing so is not something we can do by default (due to lack of data durability ...), though possible to configure with existing features as you found.

erichorwath · 2022-03-25T06:43:22Z

JFYI, during kind create cluster && helm install we saw a maximum load of about 1k IOPS.

In this example we used 1TB Azure PremiumSSD with 5k IOPS (30k burst).

BenTheElder · 2023-04-18T04:11:47Z

Some of that may be docker pulling / unpacking the image.

I don't think there's much more we can do here. It's already possible to put etcd on a tmpfs with kind, with the significant trade-offs that entails. We use a docker volume otherwise, and I don't think there's a faster option we could implement.

erichorwath added the kind/bug Categorizes issue or PR as related to a bug. label Mar 24, 2022
