Operator scale-up loop with recreate strategy #1105
Comments
Hi @ddellarocca, thanks for the feedback. I'm on Chinese New Year holidays; I plan to check this after 5th Feb.
Hi @Rory-Z, no worries. We encountered it in production, but we still managed to do what we needed to do, so it's okay to wait.
Hi @ddellarocca, I'm sorry, I could not reproduce this issue in my local cluster. In `emqx-operator/controllers/apps/v2beta1/update_emqx_status.go` (lines 82 to 102 at 59bb002), `CurrentReplicas` is correct. Also, checking your log file, the maximum number of `emqx-core-7c6fbb448d` replicas is 93, which is more than the sum of the two StatefulSets.
Could you please run …
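For context, here is a minimal sketch of the kind of status computation being discussed. This is not the operator's actual code; the function name, the label-selector argument, and the use of a controller-runtime client are assumptions.

```go
// Hypothetical sketch, NOT the actual update_emqx_status.go code: one
// way a status updater can derive a "current replicas" figure is to sum
// Status.ReadyReplicas over every StatefulSet matching the custom
// resource's labels. If both the old and the new StatefulSet match,
// the figure is their sum, which is the point being debated above.
package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func currentReplicas(ctx context.Context, c client.Client, ns string, labels map[string]string) (int32, error) {
	var list appsv1.StatefulSetList
	if err := c.List(ctx, &list,
		client.InNamespace(ns),
		client.MatchingLabels(labels),
	); err != nil {
		return 0, err
	}

	var total int32
	for _, sts := range list.Items {
		total += sts.Status.ReadyReplicas
	}
	return total, nil
}
```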
Hi @Rory-Z, thanks for checking. On my side, I am not able to find the … Regarding the 93 replicas: yes, it is more than the sum, but if you look at the log file, it got there gradually. I forgot to mention that those calls to the API server were filtered by the service account used by the operator, so it was indeed the operator that scaled up the cluster. Is there anything else I could check?
I have no idea. I checked the code again, and there is no other code that changes the replicas of the StatefulSet except `emqx-operator/controllers/apps/v2beta1/update_emqx_status.go` (lines 82 to 102 at 59bb002).
I could try to test it next week, but not in the cluster where we first faced the issue. Would that work?
Thanks a lot. Please try https://github.com/emqx/emqx-operator/releases/tag/2.2.29-beta.1
Describe the bug
During a deployment with the update strategy set to recreate in a 28-node cluster, the operator began increasing the number of replicas on the old StatefulSet instead of deleting it.
The update had a dual purpose: scaling up and changing a few settings.
We first scaled up (from 9:00:58 to 9:01:43), then changed the configuration around 9:05, which triggered the creation of a new StatefulSet (the old one being `emqx-core-7c6fbb448d` and the new one `emqx-core-778c694444`).
However, once the new StatefulSet was ready and its nodes had joined the cluster, the operator started increasing the replicas of the old StatefulSet (API-server audit log events attached in logs-insights-results.csv). `kubectl get statefulset emqx-core-7c6fbb448d` showed 0 ready replicas while the requested replicas kept increasing.
As a workaround, we scaled the operator to 0 and then scaled the old StatefulSet to 0.
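For anyone hitting the same loop, the workaround translates to commands along these lines (a sketch; the operator Deployment name and namespaces are assumptions based on a default install and will differ per cluster):

```shell
# 1. Scale the operator to 0 so it stops reconciling the StatefulSet
#    (Deployment name/namespace assumed from a default install):
kubectl -n emqx-operator-system scale deployment emqx-operator-controller-manager --replicas=0

# 2. Scale the runaway old StatefulSet to 0:
kubectl -n <emqx-namespace> scale statefulset emqx-core-7c6fbb448d --replicas=0
```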
To Reproduce
We did not manage to reproduce it in non-production environments.
Expected behavior
Instead of increasing the number of replicas, the operator should delete the old StatefulSet.
Anything else we need to know?
Looking at the code, we think the way the operator calculates the number of replicas during scale-down is wrong. These are the places where it computes the replica count:
`emqx-operator/controllers/apps/v2beta1/sync_pods.go` (line 95 at 59bb002)
`emqx-operator/controllers/apps/v2beta1/update_emqx_status.go` (lines 82 to 102 at 59bb002)
There, the number of running replicas is the sum of the two StatefulSets, which produces a scale-up instead of a scale-down.
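To make the suspicion concrete, here is a hypothetical sketch (not the operator's actual logic; the function and parameter names are made up) of how deriving the scale decision from a sum across both StatefulSets can flip a scale-down into a scale-up:

```go
// Hypothetical sketch of the suspected failure mode, NOT the actual
// sync_pods.go logic. If "current" is the sum of ready replicas across
// BOTH StatefulSets, then once old pods leave the EMQX cluster and stop
// reporting ready, current drops below desired and the reconciler grows
// the old StatefulSet instead of shrinking it. Each reconcile repeats
// the step, which would match the gradual climb to 93 replicas we saw
// in the audit logs.
func nextOldStsReplicas(desired, oldReady, newReady, oldSpec int32) int32 {
	current := oldReady + newReady // summed across both StatefulSets

	switch {
	case current > desired:
		return oldSpec - 1 // intended: shed one old pod per reconcile
	case current < desired:
		return oldSpec + 1 // suspected bug: old StatefulSet scales up
	default:
		return oldSpec
	}
}
```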
Environment details: