Add automatic disk scaling based on disk usage #27
Merged
Users no longer need to set a disk size. The operator provisions data volumes at 1Gi (or spec.storage's request, which becomes the initial/minimum size), measures disk usage of every pod once a minute by running df against the data mount, and when the fullest volume in the cluster exceeds 50% used, grows the target size for all volumes by 50%, rounded up to a whole Gi. The current target is tracked in status.storageSize; spec is never mutated and volumes only ever grow.

- spec.storage is now optional; defaults to the cluster default storage class with ReadWriteOnce access
- new optional spec.storageLimit caps growth; when reached (or when the StorageClass does not allow volume expansion) the operator emits a warning event and sets the StorageLimited status condition
- another expansion is only requested once every PVC has reached the current target capacity, so growth cannot compound while an expansion is in flight (EBS allows one modification per volume per ~6h)
- stable clusters requeue every minute to keep monitoring usage
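The growth rule in this description can be sketched as follows. This is a minimal Python illustration, not the operator's code: `next_target_gi`, its parameters, and the decision to clamp the result at the limit are hypothetical. Only the 50% threshold, the 50% step, the round-up to a whole Gi, and the wait for every PVC are taken from the description.

```python
import math
from typing import Optional, Sequence

THRESHOLD = 0.50     # grow once the fullest volume exceeds 50% used
GROWTH_FACTOR = 1.5  # each step grows the target by 50%


def next_target_gi(current_gi: int,
                   max_used_fraction: float,
                   pvc_capacities_gi: Sequence[int],
                   limit_gi: Optional[int] = None) -> int:
    """Return the new target size in whole Gi, or current_gi if no growth is due."""
    # Wait until every PVC has reached the current target, so a second
    # expansion is never requested while the first is still in flight.
    if any(cap < current_gi for cap in pvc_capacities_gi):
        return current_gi
    # Only the fullest volume in the cluster matters.
    if max_used_fraction <= THRESHOLD:
        return current_gi
    grown = math.ceil(current_gi * GROWTH_FACTOR)  # round up to a whole Gi
    if limit_gi is not None:
        grown = min(grown, limit_gi)  # assumption: the limit clamps the step
    return max(grown, current_gi)  # volumes only ever grow


print(next_target_gi(1, 0.60, [1, 1, 1]))  # 1Gi at 60% used
```

With these assumptions a 1Gi cluster at 60% usage moves to a 2Gi target, and a cluster whose PVCs have not all reached the current target stays where it is regardless of usage.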
jzho987
reviewed
Jun 10, 2026
How should down scaling work? Sounds like we just don't care about handling scaling down? Which is fine.
Member
Author
@jzho987 I think we should never scale down; that keeps it simple and safe.
The e2e "enable auth" test failed on a host with >50% disk usage: with the local-path provisioner, df inside the pods reports the node filesystem, so every reconcile crossed the threshold, attempted to grow, hit the non-expandable StorageClass, and rewrote the StorageLimited condition whose message embedded the fluctuating usage percentage. Each status write triggered another reconcile, adding exec and API churn during the auth rolling update.

- check StorageClass expandability before measuring: non-expandable clusters get one static StorageLimited condition and no df execs
- drop the usage percentage from condition messages so SetStatusCondition only fires on real transitions (the percentage stays in the one-shot expansion event)
- clear StorageLimitReached with a Normal event once usage recedes below the threshold

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
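To illustrate why the fluctuating percentage caused churn, here is a small Python sketch of a transition-only condition update. It is loosely modeled on the idea behind SetStatusCondition (report a change only when the stored condition differs) and is not the operator's code; `Condition`, `set_condition`, and the reason string `NotExpandable` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Condition:
    type: str
    status: str
    reason: str
    message: str


def set_condition(conditions: Dict[str, Condition], new: Condition) -> bool:
    """Store `new` and report whether a status write would be needed."""
    if conditions.get(new.type) == new:
        return False  # nothing changed, so no write and no follow-up reconcile
    conditions[new.type] = new
    return True


def limited(message: str) -> Condition:
    # Hypothetical reason string, used only for this illustration.
    return Condition("StorageLimited", "True", "NotExpandable", message)


noisy: Dict[str, Condition] = {}
noisy_writes = sum(
    set_condition(noisy, limited(f"usage {pct}%, StorageClass not expandable"))
    for pct in (61, 62, 61)
)

static: Dict[str, Condition] = {}
static_writes = sum(
    set_condition(static, limited("StorageClass not expandable"))
    for _ in range(3)
)

print(noisy_writes, static_writes)
```

In this sketch the message that embeds the percentage produces a write on every pass, while the static message produces a single write on the first pass only.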
- MaxConcurrentReconciles 2 -> 8 so one slow cluster cannot stall reconciliation of the others now that stable clusters requeue periodically for disk monitoring
- raise client-go rate limits to 50 QPS / 100 burst; the reconciler fans out exec, valkey and status calls across every pod of every cluster and the controller-runtime default of 20/30 throttles it
- lengthen the disk usage poll interval from 1 to 5 minutes; the 50% threshold with 50% growth steps leaves enough headroom that a five-minute detection latency is safe

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
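A rough back-of-the-envelope view of the headroom argument, as a Python sketch. The function names are hypothetical, and the calculation deliberately ignores how long the expansion itself takes, so it is an illustration of the arithmetic rather than a safety proof.

```python
def fill_rate_mib_per_s(size_gi: float, threshold: float, interval_s: float) -> float:
    """Sustained write rate needed to fill the space above the threshold
    within one poll interval."""
    free_mib = size_gi * 1024 * (1.0 - threshold)
    return free_mib / interval_s


def usage_after_growth(used_fraction: float, growth_factor: float = 1.5) -> float:
    """Used fraction once the volume has grown by growth_factor
    (ignoring the round-up to a whole Gi, which only adds headroom)."""
    return used_fraction / growth_factor


# Smallest volume (1Gi), 50% threshold, 5-minute poll interval.
print(round(fill_rate_mib_per_s(1, 0.5, 300), 2))
print(round(usage_after_growth(0.5), 3))
```

For the smallest 1Gi volume, the remaining 512 MiB would have to fill in 300 seconds, a sustained rate of about 1.7 MiB/s, before a poll could miss it; larger volumes need proportionally more. A volume expanded right at the threshold drops back to roughly a third full.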
Enabling a password on a running cluster deadlocked the rolling update,
deterministically: a restarted replica sends AUTH (from primaryauth) to
a primary that has not restarted yet and so has no requirepass; valkey
treats that as a fatal replication error ("Unable to AUTH to PRIMARY"),
the replica's master link stays down, and the health-gated rolling
update never proceeds. This is why the "enable auth" e2e test has been
failing on main since the replication catch-up health gate was added.
The operator now live-applies requirepass/primaryauth via CONFIG SET to
every running pod before the rolling restart, so replication stays
authenticated in every mixed-config state and the restart only makes
the config persistent. Password removal is handled by the same path.
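A sketch of the live-apply step, in Python for illustration only. `live_auth_commands` and `plan_live_apply` are hypothetical helpers, and the order of the two settings and the use of an empty string to remove the password are assumptions of this sketch; the commit message only states that requirepass and primaryauth are applied via CONFIG SET to every running pod before the restart.

```python
from typing import Dict, List, Optional, Sequence


def live_auth_commands(password: Optional[str]) -> List[List[str]]:
    """Commands to send to one running pod before the rolling restart."""
    value = password or ""  # assumption: an empty value removes the password
    return [
        # Assumed order: credentials for the outgoing replication link first,
        # then the requirement on incoming connections.
        ["CONFIG", "SET", "primaryauth", value],
        ["CONFIG", "SET", "requirepass", value],
    ]


def plan_live_apply(pods: Sequence[str],
                    password: Optional[str]) -> Dict[str, List[List[str]]]:
    """Every pod gets the same commands, whatever its current role, so
    replicas and primaries agree in every mixed-config state."""
    return {pod: live_auth_commands(password) for pod in pods}


for pod, cmds in plan_live_apply(["pod-0", "pod-1"], "s3cret").items():
    for cmd in cmds:
        print(pod, " ".join(cmd))
```

Because every pod already holds the new settings in memory, the subsequent restart only has to pick up the same values from its persisted config.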
Also fix two races in the upgrade e2e test, reproduced against a live
cluster with a 2s GET watch (the key never disappeared; the lone empty
read happened the moment the exec-target pod was being replaced):
- cluster state and version are now verified at the same instant;
previously state could pass before the last pod was replaced and
version while it was terminating, declaring the rollout done early
- the data check is retried; genuinely lost data never recovers, so
retrying cannot mask real loss
- give the auth rollout (six health-gated pod restarts) 10 minutes
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
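The retried data check from the list above can be sketched as a generic poll-until-true helper. This is a hypothetical Python illustration (`eventually` and the fake read functions are not from the test suite): a transient empty read recovers on a later attempt, while genuinely lost data never does, so the retry still fails.

```python
import time
from typing import Callable


def eventually(check: Callable[[], bool],
               timeout_s: float,
               interval_s: float,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() until it returns True or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


def make_flaky_read(values):
    """Fake GET that yields each value once, then repeats the last one."""
    it = iter(values)
    last = [None]

    def read():
        last[0] = next(it, last[0])
        return last[0]

    return read


# One empty read while the exec-target pod is replaced, then the key is back.
transient = make_flaky_read([None, "value"])
recovered = eventually(lambda: transient() == "value", timeout_s=1.0, interval_s=0.01)

# Data that is really gone stays gone for the whole timeout.
lost = make_flaky_read([None])
still_lost = eventually(lambda: lost() == "value", timeout_s=0.05, interval_s=0.01)

print(recovered, still_lost)
```

The same helper shape also fits the first fix: passing a single check that reads cluster state and version from one snapshot avoids declaring the rollout done on two observations taken at different moments.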