# Create Position around Hub Cluster (prev. management Cluster) #8210
@@ -0,0 +1,46 @@
# Management Cluster - SIG Multicluster Position Statement
> Not all multi-cluster management designs have a monolith. I do not think this SIG should take a position that requires or recommends a monolith. There are multiple roles involved in multi-cluster management, and I think it may be more helpful to identify those roles. A system with a monolith can then be described as having one cluster that plays all of those roles. For example, in https://github.com/kubestellar/kubestellar/ we identify distinct roles and allow flexibility in what plays which role. We start by identifying a concept that is less than a "cluster": we define a "space" to be the fragment of cluster behavior that is concerned only with generic API machinery. A space can store and serve kube API objects and subject them to the general-purpose controllers (the ones that apply to all kinds of API objects, not the controllers involved specifically with containerized workloads: Pod, Service, ...). KubeStellar defines the following roles.
>
> One configuration that KubeStellar supports is one real cluster playing the roles of WDS, ITS, and KubeFlex hosting cluster. In OCM, where the workload description is wrapped and the workload execution cluster holds the unwrapped objects, could one cluster play both the WDS and WEC roles?

> Thank you for those details. I agree with your statement about not being a monolith.

> I wonder what is KubeStellar's recommended way to host the WDS/ITS/KubeFlex?

> Q: If the Inventory Space and the Workload Description Space are truly separate, how can we define an API that allows scheduling a workload on the clusters in the inventory?

> @MikeSpreitzer I think those are questions for you. Let me know if I can help.
Author: Corentin Debains (**[@corentone](https://github.com/corentone)**), Google
Last Edit: 2024/12/09
Status: DRAFT
## Goal
To establish a standard definition for a central cluster that is leveraged by multicluster
controllers to manage multicluster applications or features across an inventory of clusters.
## Context
Multicluster controllers have always needed a place to run. This may happen in external
proprietary control planes, but for more generic platforms it has been natural for the
Kubernetes community to leverage a Kubernetes cluster and the API machinery already
available. There have been a variety of examples, among which we can cite ArgoCD, MultiKueue,
and the federation efforts (Karmada, KubeAdmiral), all of them either not naming the
"location" where they run or not aligning on a name (Admin Cluster, Hub Cluster, Manager Cluster, ...).
The [ClusterInventory](https://github.com/kubernetes/enhancements/blob/master/keps/sig-multicluster/4322-cluster-inventory/README.md)
(ClusterProfile CRD) is also the starting point for a lot of multicluster controllers and,
being a CRD, it requires API machinery to host it.
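To make the inventory's role concrete, here is a minimal sketch, in Go, of a controller running inside the management cluster and listing the ClusterProfile objects it hosts. The `multicluster.x-k8s.io/v1alpha1` group/version is the one proposed in KEP-4322; the `fleet-system` namespace and the error handling are illustrative assumptions, not part of this statement.

```go
// Minimal sketch: list the cluster inventory from inside the management
// cluster. Assumes the ClusterProfile CRD from KEP-4322 is installed and
// served as multicluster.x-k8s.io/v1alpha1; "fleet-system" is a
// hypothetical namespace chosen for this illustration.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	// The controller runs in the management cluster, so in-cluster
	// credentials are enough to reach the inventory.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{
		Group:    "multicluster.x-k8s.io",
		Version:  "v1alpha1",
		Resource: "clusterprofiles",
	}

	// Each ClusterProfile describes one workload cluster in the inventory.
	profiles, err := client.Resource(gvr).Namespace("fleet-system").
		List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, p := range profiles.Items {
		fmt.Println("workload cluster:", p.GetName())
	}
}
```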
## Definition
A (multicluster) management cluster is a Kubernetes cluster that acts as a
control-plane for other Kubernetes clusters (named Workload Clusters to differentiate
them). It MUST have visibility over the available clusters and MAY have administrative
> I wonder what "visibility" means here? There can be workload clusters that do not have a public IP.

> Agreed, visibility is vague. I think it is somewhat the ability to query more data about the cluster; I'm thinking it would be the ability to "READ" via kubectl (though it may not be everything). But it should be able to potentially see the Cluster object in GKE or other platforms (wherever the provisioning/lifecycle happens). Maybe this is too vague. I originally had this in to mandate some kind of visibility on the cluster so that the controller could do something (if the controller has access only to the ClusterProfile, I don't know what it can really do). But we may be able to remove it and it wouldn't affect the definition much.

> I don't think that MUST is correct then.

> @serbrech good point; the MUST is likely too strong then. Would a "SHOULD" be okay? Or should it be a "MAY"? Note that even if we reverse the security model (e.g. the Work API model, where the workload cluster reaches out to the central cluster), the hub still holds power over the workload cluster. Here is what I could change:
>
> Suggested change
>
> (I could also drop the ClusterProfile part as obvious.)

> Thank you for revisiting. I'm aligned with MAY.

> In my systems, the workload clusters pull from the management cluster. Controllers run on the workload clusters, so "MAY have administrative privileges" works well in my world. "Visible" is interesting for my world: my management clusters are authoritative for the data, and the workload clusters reconcile against it. But there MAY be two management clusters from which a workload cluster pulls (i.e. a dev and a prod source of truth), each management cluster authoritative for particular data. Lastly, could visibility be decomposed into data storage about a workload cluster and a mechanism for transport/reconciliation between the hub and the workload cluster?

> I've changed "visibility" to "access to API, metrics or workloads", so it could be that it accesses none of those or only a subset. @cbaenziger I'm not sure what you mean about data storage for visibility; the apiserver/etcd? We should talk more about this example. There could be multiple management clusters for a given cluster, and the definition allows for it.

> I think I agree with the definition as written now. To clarify my operation:
>
> Data: my hub clusters store data in their etcd (kine-based) about the workload clusters. However, they are not responsible for propagating that configuration data to the workload clusters.
>
> Transport: the workload clusters run a controller which polls the (interposing service backed by the) hub cluster and synchronizes specific resources to the workload cluster. The hub cluster is not responsible for propagating the state it stores.
>
> As to multiple hub clusters: since workload clusters are responsible for synchronizing their resources from the hub clusters, one hub cluster could be responsible for storing a resource that drives namespace definitions, while another could be responsible for storing RBAC policy definitions. Or, as mentioned, one could hold dev resources and one prod resources. I fully agree with your operator perspective that no workload cluster ought to sync from both a dev and a prod system; merely that each hub is authoritative for its tranche of data (see the namespace vs. RBAC split for a less broken operational example).

> @cbaenziger thank you for the details. Indeed, your model is similar to the Work API (where an agent running in the workload cluster pulls data from the hub cluster). One thing I'd note: even if you "pull" the data from the hub cluster, the hub cluster still holds permissions (although indirectly) on the remote clusters, since it could technically go wild and "tell" the workload-cluster pullers that they need to run compromised code. It wouldn't be able to DDoS the apiserver, but if the puller pulls RBAC and is not restricted to a few namespaces, it could escalate privileges easily.
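The pull model discussed in this thread can be sketched briefly. The following Go program is an illustration under stated assumptions (a hub kubeconfig mounted at `/etc/agent/hub-kubeconfig`, a per-cluster hub namespace `fleet-member-a`, a local target namespace `synced`; none of these names come from the document): an agent in the workload cluster polls the hub for ConfigMaps and applies them locally, with its hub credentials RBAC-scoped to one namespace to limit the escalation risk noted above.

```go
// Illustrative sketch of the pull model: an agent inside a workload
// cluster polls the hub and copies ConfigMaps from "its" hub namespace
// into the local cluster. All names here are hypothetical.
package main

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Hub credentials: RBAC on the hub should scope these to a single
	// namespace so a compromised hub entry cannot escalate privileges.
	hubCfg, err := clientcmd.BuildConfigFromFlags("", "/etc/agent/hub-kubeconfig")
	if err != nil {
		panic(err)
	}
	hub := kubernetes.NewForConfigOrDie(hubCfg)

	// Local credentials: the agent runs inside the workload cluster.
	localCfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	local := kubernetes.NewForConfigOrDie(localCfg)

	for {
		cms, err := hub.CoreV1().ConfigMaps("fleet-member-a").
			List(context.TODO(), metav1.ListOptions{})
		if err == nil {
			for _, cm := range cms.Items {
				// Strip hub-specific metadata before writing locally.
				cm.Namespace = "synced"
				cm.ResourceVersion = ""
				cm.UID = ""
				_, err := local.CoreV1().ConfigMaps("synced").
					Create(context.TODO(), &cm, metav1.CreateOptions{})
				if apierrors.IsAlreadyExists(err) {
					_, _ = local.CoreV1().ConfigMaps("synced").
						Update(context.TODO(), &cm, metav1.UpdateOptions{})
				}
			}
		}
		time.Sleep(30 * time.Second) // poll; the hub never pushes
	}
}
```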
privileges over them. It SHOULD NOT be part of the workload clusters, to provide better
> I am not convinced of this "SHOULD". I am not convinced that this sort of statement belongs in a definition.

> I wonder whether you are not convinced that the "hub" cluster and a "spoke" cluster should not be the same, or just that a "should" does not belong in a definition? I am curious about the reason if it is the former.

> @MikeSpreitzer could you clarify your statement, please? If you mean that the management cluster's relationship to workload clusters should be different from what the definition says, could you clarify your position (they MUST be isolated, the definition shouldn't talk about their relationship, or the SHOULD is too strong and it should be "a management cluster MAY also be a workload cluster")?
security isolation, especially when it has any administrative privileges over them.
There MAY be multiple management clusters overseeing the same set of Workload Clusters
> This statement supposes multiplicity only in the form of potentially competing equals; it omits the possibility of clusters fulfilling distinct roles.

> I tried to clarify in the next section that there could be multiple roles: is the wording not strong enough? I don't mean to close that door; I thought the current wording was enough and did not impose a direction, just requiring that the admin oversee potential overlap between different management clusters. If there is no overlap, they are fine to co-exist as separate clusters.

> @MikeSpreitzer did that answer your comment (i.e. can I resolve)? I'm not sure I fully understood it and tried my best to answer it. Please let me know if I missed it.
and it is left to the administrator to guarantee that they don't compete in their
management tasks. There SHOULD be a single clusterset managed by a management cluster.
Management clusters can be used for both control-plane and data-plane features.
> I think it would be helpful to say what is meant here by "control-plane" and "data-plane". In the call on Tuesday I heard something like: the data plane is about the workload, and the control plane is about controlling the workload's propagation from the hub to the execution clusters. (I specifically say "execution" clusters to contrast with where the authoritative description of the workload lives in the hub, which might be a cluster or something like it. In KubeStellar we keep the hub-side authoritative description of the workload un-wrapped in what we call a "space", which is something that has (at least) a cluster's API machinery.)

> I tried to clarify by differentiating "business applications" from "infrastructure". Side notes that I may add to the definition/doc if that helps and the current wording doesn't cover it: there are two types of workloads, business/product/application vs. infrastructure/platform. I consider workload clusters as running business applications (e.g. pods serving traffic or achieving functionality). Workload clusters may also run infrastructure applications that assist them in serving.
### Rationale on some specific points of the definition
* Multiple management clusters: While it often makes sense to have a single "brain" overseeing
a fleet of clusters, there is a need for flexibility in the number of management clusters: to
allow redundancy to improve reliability, to allow sharding of responsibility (for regionalized
controllers), to allow separation of functionality (a security-enforcer management cluster vs. a
config-delivery management cluster), to allow migrations (from an old management cluster to a
new one), and likely more.
* Management cluster also being part of the workload-running fleet: We do recommend that the
management cluster(s) be isolated from the running workload fleet for security and management
> This makes sense in general, but I am not convinced that there are no use cases for combining roles in one cluster.

> The first paragraph actually encourages different clusters for different roles. Let me try to think about introducing the notion of a "role", or something like it, as a subdivision of the broad management cluster.

> I wonder what the definition of "workload" is? I usually associate it with applications but not controllers, so it is okay to me to run controllers that require leader election in the "central" cluster.

> It is definitely okay to run a controller in the management cluster. A controller doesn't have to be a "workload"; it can be considered part of the "control-plane". I think it is the persona that matters: if it is a platform-admin-owned controller performing management tasks, I wouldn't consider it a workload. A workload, to me, is an application serving an actual business-logic purpose. All in all, running those management controllers is the reason why I want to define management clusters and not just a management API. (We had discussed internally offering simply an API with machinery... but then very quickly you want to bring a controller to act on this API, and you have to decide where to run it.)

> @ryanzhang-oss should I define "workload" more? I think a sentence in this doc may be enough. "Workload" works great when opposed to "management"; for "hub" maybe we need to say "spoke", or is "workload" still good?

> AFAIK, "workload" in k8s basically means things like Deployment/DaemonSet/StatefulSet, so it can be anything. I am not sure there is a way to say a Deployment contains a controller vs. a real application.
concerns. But there may be specific cases or applications that require mixing the two; for
example, controllers that take a "leader-election" approach and want a smaller footprint (a
sketch follows this list).
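As a hedged illustration of that last case, here is a minimal Go sketch using client-go's leaderelection package: several replicas of a management controller (possibly colocated with workloads) coordinate through a single Lease so that only one acts at a time. The lock name, namespace, and the cluster hosting the Lease are illustrative choices, not prescriptions from this document.

```go
// Minimal sketch: a management controller elects a leader via a Lease so
// that replicas colocated with workloads stay idle unless elected.
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	id, _ := os.Hostname() // this replica's identity in the election

	lock := &resourcelock.LeaseLock{
		// Hypothetical lock location; whichever cluster hosts this Lease
		// is effectively acting as the management cluster for this task.
		LeaseMeta:  metav1.ObjectMeta{Name: "mc-controller-lock", Namespace: "kube-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the elected replica performs management tasks.
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				// Lost the lease: stop acting as the management brain.
			},
		},
	})
}
```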