
GEP: Support namespace scoped implementations #567

Open
howardjohn opened this issue Mar 1, 2021 · 28 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@howardjohn
Contributor

What would you like to be added:

Currently, someone deploying an implementation must have cluster-scoped privileges in order to access GatewayClass. Often in multi-tenant clusters, teams are given access only to a set of namespaces, without permissions to access cluster-scoped resources.
Users in this scenario should still be able to deploy an implementation of the API.

For example, they would deploy an nginx proxy controller, a LoadBalancer Service, a Gateway, and some routes.
This is really close to working, but they still need to create some GatewayClass.
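Concretely, the only cluster-scoped piece in that stack is the GatewayClass. A minimal sketch (all names here are hypothetical, and field names follow the current v1 API):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass            # cluster scoped - the one resource the team cannot create
metadata:
  name: team-a-nginx
spec:
  controllerName: example.com/nginx-gateway-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway                 # namespaced, as are the routes behind it
metadata:
  name: team-a-gateway
  namespace: team-a
spec:
  gatewayClassName: team-a-nginx
  listeners:
  - name: http
    protocol: HTTP
    port: 80
```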

Potential options

  1. Cluster admin creates a generic GatewayClass with no params, say "in-cluster-proxy". The namespace admin creates a Gateway referencing it. They still need to configure parameters of the gateway (i.e. things that would typically live in the parametersRef of the GatewayClass), which they cannot do in GatewayClass. As a result, they need to externalize this configuration somewhere else (annotations on the Deployment/Service/Gateway, an nginx-specific ConfigMap, etc.). This isn't great, since how things are configured is now entirely implementation specific - but it's also not too bad, as parametersRef is all implementation specific anyway. Still requires some cluster-admin coordination.

  2. Same as (1), but maybe they don't even bother with a generic GatewayClass and just put some bogus value there (it is a required field)

  3. Do not support deployments with only namespace permissions
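For reference, option (1) in manifest form - a sketch with hypothetical names, where the per-namespace configuration that would normally live behind parametersRef is externalized to a ConfigMap:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: in-cluster-proxy      # created once by the cluster admin; no parametersRef
spec:
  controllerName: example.com/in-cluster-proxy
---
apiVersion: v1
kind: ConfigMap               # implementation-specific stand-in for parametersRef
metadata:
  name: proxy-config          # hypothetical; found by controller convention, not by reference
  namespace: team-a
data:
  worker-processes: "4"
```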

@howardjohn howardjohn added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 1, 2021
@howardjohn
Contributor Author

cc @costinm

howardjohn added a commit to howardjohn/istio that referenced this issue Mar 2, 2021
Fixes istio#29078

This enables users to select which gateway pods service a Gateway. This
is done via parameterRef to a configmap in GatewayClass.

I have some concerns about requiring GatewayClass to do this, opened
kubernetes-sigs/gateway-api#567 to track. We
can still change how we handle this, but for now we need some POC to
unblock testers
@robscott
Member

We discussed this at the last community meeting. It sounded like there was significant interest in deployment models without the GatewayClass resource. Maybe something that could look like this:

  1. Each Gateway implementation can choose to support a set of static, domain-prefixed classes (such as gke.io/xlb). These would not require GatewayClass resources to exist.
  2. Users could specify those classes in the same way they specified any other class on a Gateway resource, there just wouldn't be a GatewayClass resource.
  3. GatewayClass resources matching these static names would be ignored by the implementation, except possibly to add some kind of status indicating the conflict with the static class.

/cc @danehans @hbagdi @bowei

@SantoDE
Contributor

SantoDE commented Mar 10, 2021

While I agree with you @robscott (on the need for namespace-scoped installations), I feel this issue is similar to the one with the normal IngressClass resource, no? That is also a cluster-scoped resource. So maybe it's possible to find a solution that works for both sides?

re: kubernetes/kubernetes#99824

I feel like, we should try to solve that all together? 🤷‍♂️

@howardjohn
Contributor Author

Each Gateway implementation can choose to support a set of static, domain-prefixed classes (such as gke.io/xlb)

We had mentioned before that the GatewayClass is optional, but that if it is added it will be used (to pick up parameters from the GWC). If we use this syntax I don't think that's possible, since you cannot have / in the name? I guess your third point conflicts with that, so maybe that was just my impression and not shared by others. I think it's useful to be able to override it.

@robscott
Member

Thanks @SantoDE for linking this to the upstream issue! I agree that whatever we do, we should try to gain consensus in the broader sig-network community as well, to ensure any change can help with Ingress as well as Gateway.

You're completely right @howardjohn, the example I gave of gke.io/xlb wouldn't work. My goal was to define some pattern that would make conflicts unlikely or impossible. For example, it would be bad if two controllers both claimed the xlb class when there was no formal GatewayClass definition. Although we could use some kind of subdomain format, that seems rather unnatural. Maybe the format should simply be {implementation-name}-{variation} - to follow my example above, gke-xlb. That seems rather unlikely to conflict with anything, but other cases may be trickier, i.e. there is more than one nginx ingress implementation.

I think it's useful to be able to override it.

I'm not completely tied to the idea that GatewayClasses of these names would be invalid/ignored, but I still prefer it. If implementations were to provide standard classes with internally consistent meanings, it could get confusing if that class name meant something slightly different in another cluster. Maybe more significantly, if a GatewayClass resource could be used to override the meaning of one of these static class names, every controller would have to be able to read from the cluster-scoped GatewayClass API, which would be a significant hurdle for namespace-scoped deployments.
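Under the static-class scheme sketched above, a Gateway would reference the class purely by convention, with no GatewayClass object behind it (gke-xlb is the hypothetical name from the comment above):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-gateway
  namespace: team-a
spec:
  gatewayClassName: gke-xlb   # static, implementation-defined; no GatewayClass resource exists
  listeners:
  - name: http
    protocol: HTTP
    port: 80
```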

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 10, 2021
@jpeach
Contributor

jpeach commented Jul 1, 2021

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 1, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 29, 2021
@hbagdi
Contributor

hbagdi commented Sep 30, 2021

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 30, 2021
@robscott
Member

robscott commented Aug 4, 2022

I think our best solution here would be to provide implementations with a pattern they can follow if they want to support single-namespace deployments. For example, implementations could provide a flag indicating that they should only watch resources within the namespace they are deployed in, along with a flag indicating which class name they should implement. This class name should not overlap with any cluster-wide class names, to avoid multiple implementations trying to implement the same resource.
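As a sketch of that pattern - with entirely hypothetical flag names, since nothing here is standardized - such a controller deployment could look like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway-controller
  namespace: team-a
spec:
  replicas: 1
  selector:
    matchLabels: {app: gateway-controller}
  template:
    metadata:
      labels: {app: gateway-controller}
    spec:
      containers:
      - name: controller
        image: example.com/gateway-controller:latest   # hypothetical image
        args:
        - --watch-namespace=team-a        # hypothetical flag: only reconcile this namespace
        - --gateway-class=team-a-gateway  # hypothetical flag: class name claimed without a GatewayClass object
```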

@shaneutt shaneutt added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 5, 2022
@akutz

akutz commented Sep 23, 2022

I think our best solution here would be to provide implementations with a pattern they can follow if they want to support single-namespace deployments. For example, implementations could provide a flag indicating that they should only watch resources within the namespace they are deployed in, along with a flag indicating which class name they should implement. This class name should not overlap with any cluster-wide class names, to avoid multiple implementations trying to implement the same resource.

Maybe we should revisit #136? It still strikes me as the most elegant solution. It is not about managing infra, but about pointing to infra within the multi-tenancy world in which we live on Kubernetes. Without a proper ClusterGatewayClass and GatewayClass, I argue everything else is a work-around to the most correct solution.

Regarding your proposal @robscott, I'm not sure how that is different from what GWC used to have back in the early days?

// AllowedGatewayNamespaces is a selector of namespaces that Gateways of
// this class can be created in. Implementations must not support Gateways
// when they are created in namespaces not specified by this field.
//
// Gateways that appear in namespaces not specified by this field must
// continue to be supported if they have already been provisioned. This must
// be indicated by the Gateway's presence in the ProvisionedGateways list in
// the status for this GatewayClass. If the status on a Gateway indicates
// that it has been provisioned but the Gateway does not appear in the
// ProvisionedGateways list on GatewayClass it must not be supported.
//
// When this field is unspecified (default) or an empty selector, Gateways
// in any namespace will be able to use this GatewayClass.
//
// Support: Core
//
// +optional
AllowedGatewayNamespaces metav1.LabelSelector `json:"allowedGatewayNamespaces,omitempty"`

It kind of feels like this has come full circle...
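For readers who weren't around for it, using that since-removed field would have looked something like this (a sketch against the early v1alpha1 API, where the controller field was still a plain string; all names hypothetical):

```yaml
apiVersion: networking.x-k8s.io/v1alpha1   # the API group/version of that era
kind: GatewayClass
metadata:
  name: team-scoped
spec:
  controller: example.com/gateway-controller
  allowedGatewayNamespaces:   # the removed field quoted above: a namespace label selector
    matchLabels:
      team: team-a
```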

@youngnick
Contributor

There are a few problems I can see with introducing a ClusterGatewayClass and associated GatewayClass namespace-scoped resource:

  • *Class names are, in the Kubernetes API, generally reserved for cluster-scoped things. IngressClass and StorageClass are both cluster scoped. So I'd probably be more in favor of keeping GatewayClass and introducing NamespacedGatewayClass or something.
  • Once we do have a namespaced reference to a gateway bucket (I'll go with NamespacedGatewayClass for the purposes of this argument), we have to determine how it works.
    • Do we add a namespace field to Gateway's spec.gatewayClassName? This would mean turning the field into a struct with a name and namespace reference, using the convention that "" means "cluster-scoped". That convention is used elsewhere in the API, so that part is fine, but this is a big spec change.
    • Once we have a namespace reference, we also need to consider safe cross-namespace references, and use the ReferenceGrant-style functionality either by using ReferenceGrant directly, or adding an AllowedNamespaces field to NamespacedGatewayClass as @akutz suggests above. Again, this is a pretty significant departure.
    • If we allow this to go outside a single namespace, we're not saving much in terms of controller access, since the controller will still need cluster-level access to Gateways, NamespacedGatewayClass, etc.
    • Alternatively, we could leave the spec of Gateway unchanged and just say "first check the Gateway's own namespace for a NamespacedGatewayClass of that name; if one exists, use it, and then all references must stay within that namespace." I guess that's doable, but it would mean you can't do a set of namespaces. And that's a lot of magic that would not be directly visible from the objects themselves.

I guess I'm saying that I don't agree that ClusterGatewayClass is the more correct option, and that changing to it at this point would be an API-breaking change requiring v1beta2. I don't think providing the functionality in the top comment of the issue is worth the months of work it would take to get to v1beta2. We would need to do a full pass through v1alpha3 or something, then move to v1beta2.

To put this another way, I don't think there's a way to make compatible changes to the API to get us to a NamespacedGatewayClass at this point, and I don't know if this is a big enough issue for the whole community to spend the effort to roll the apiVersion. (I acknowledge that's not very satisfying for people who need a resolution).

I think that the workarounds proposed about not requiring an actual GatewayClass object to be present are probably the best we're going to be able to do without a new apiVersion.

@shaneutt
Member

@howardjohn does this represent a known customer need for Istio (or any other implementation you work on), or is it more theoretical? If it's more of a theoretical need, then given the difficulties I feel slightly inclined to close it in favor of waiting for a time when end-user needs can move it forward. In either case, do we feel comfortable saying we wouldn't need this for GA, since we should be able to use the ObjectName reference in the current GatewaySpec pretty open-endedly? Let me know your thoughts 🤔

@shaneutt
Member

Good, makes sense to me - and seeing active support for this gives me extra confidence in prioritizing it sooner rather than later, so let's move this out of triage and consider it something we'd like to solve for v1.0.0.

@shaneutt shaneutt added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Mar 15, 2023
@shaneutt shaneutt added this to the v1.0.0 milestone Mar 15, 2023
@shaneutt shaneutt moved this from Triage to Backlog in Gateway API: The Road to GA Mar 15, 2023
@shaneutt
Member

Definitely still wanted by many implementations and a common use case, but it doesn't need to hold up GA: we should be able to add this functionality in a post-GA world without too much hassle.

@shaneutt shaneutt removed this from the v1.0.0 milestone Jul 10, 2023
@shaneutt shaneutt removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jul 10, 2023
@shaneutt
Member

It seems like this still has obvious benefits to the ecosystem, despite the lack of anyone to champion it and drive it forward. Let's at least remove the frozen lifecycle so it can remind us that it's hanging out back here, and we can continue to re-evaluate.

@shaneutt shaneutt removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Mar 12, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 10, 2024
@costinm

costinm commented Jun 11, 2024

One option for this (and perhaps a few mesh cases) is to just make GatewayClass optional. It is effectively optional for GAMMA already - there is no requirement to have a GatewayClass if the parentRef of an HTTPRoute is a Service, and that's clearly not the end of the world.

A Gateway object may still need to reference a class - we may want to relax this, but even if it remains required, I don't think we reject creation of a Gateway if the GatewayClass is missing (it may be installed later - eventual consistency - or not be available in that cluster). An implementation that is namespace scoped would simply match on the name - even if there is no actual object. Not worse than mesh, where we don't even have anything to match on.

Worst case, such an implementation would not pass conformance - which will probably be the case anyway, since most cross-namespace features will also not work. But in practical terms it would solve the user problem.

It would be nice to have a pattern where the parentRef is allowed to be namespace/name, with the name pointing to some Service, for example, like we did for GAMMA - the Service would be the one associated with the per-namespace gateway.

This can go a bit further: from a 'namespace' gateway to a 'workload' gateway (also sometimes called a 'sidecar'). A workload gateway would run as a container next to some other workload - the parentRef and configs would use some equivalent of a workload selector, perhaps a Service that selects the workloads.

Nothing discussed here requires real changes to the spec - just a bit of tolerance for implementations that may not fully support things that can't be supported in specific cases.
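The GAMMA-style pattern referred to above, where a route's parentRef is a Service rather than a Gateway, looks roughly like this (the Service name is hypothetical):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-route
  namespace: team-a
spec:
  parentRefs:
  - group: ""                 # core API group: the parent is a Service, not a Gateway
    kind: Service
    name: team-a-proxy        # hypothetical Service fronting the per-namespace gateway
  rules:
  - backendRefs:
    - name: backend
      port: 8080
```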

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 11, 2024
@snorwin
Contributor

snorwin commented Jul 15, 2024

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 15, 2024
@shaneutt
Member

Hi @snorwin 👋

I see you've bumped the lifecycle on this, is this something you're interested in working on personally?

@snorwin
Contributor

snorwin commented Jul 15, 2024

@shaneutt, thanks for reaching out.

I'm currently facing similar challenges with implementing the Gateway API for a security gateway managed by an operator that supports different installation modes (e.g., AllNamespaces, Multi-Namespace, OwnNamespace). The primary reason for these different modes is to enable multi-tenancy. While I'm aware that multi-tenancy can be achieved using multiple gateway classes, it seems that cluster-scoped resources are often considered red flags in shared or managed k8s clusters.

I appreciate @costinm's suggestion of "headless" gateways and believe this issue should move in that direction. For this reason, I have removed the rotten lifecycle label. I'm also happy to contribute to this topic.

@shaneutt
Member

Cool, thanks for the insights! We can definitely provide support if you'd like to try to help move this one forward. Maybe a good next step would be to add something to the agenda for one of our upcoming community meetings so we can discuss the topic generally - please feel free to add an item and let us know what you're thinking!

@youngnick
Contributor

youngnick commented Jul 22, 2024

@snorwin, if you could drop the use cases you're interested in here, then we can talk some more about this.

To reiterate what I said in the community meeting today - my main argument here is this:

Controllers that cross multiple namespaces (which most Gateway API controllers do) necessarily run at cluster scope, because there is no API construct with a scope larger than a single namespace but smaller than the cluster.

So, every workaround we use to avoid creating a cluster-scoped API construct builds in confusion - because you now have a larger-than-namespace-scoped thing (the set of namespaces the controller needs to watch) that has no representation in the API (because there is no place to store a set of namespaces outside a single namespace).

As it stands, if we use a workaround like a specific name for a GatewayClass or similar, all that means is that the state of Gateway API objects is no longer entirely described by API objects. There is some extra information you need that you cannot retrieve from the Kubernetes API itself (for example, that you're running in GKE and have a standard set of GKE classes available to you via magic strings).

The requirement that a system of Gateway API resources can be entirely understood by looking only at Kubernetes API objects is a bit implicit, I guess, but it has always been there for me - it's a reaction to the Ingress domain-name-claiming problem, where a namespaced object (an Ingress) claims a cluster-scoped resource (a hostname). That has really bad effects, like a bad actor being able to steal another service's traffic. I really want to avoid that sort of thing happening for Gateway API.

None of this is to say that I don't think we should continue talking about this, I just think that we need to be really careful that we're not creating a silent problem that will mess us up later.

@snorwin
Contributor

snorwin commented Jul 23, 2024

@youngnick thank you for your thoughts and input. I will try to illustrate the use case I have in mind in more detail:

Ana, an application developer, wants to use the Gateway API to manage the connectivity of her microservice application. She works in the financial and insurance services industry, which still operates most of its infrastructure in on-premise data centers (private cloud). In this environment, Kubernetes is often provided as large shared clusters (also known as Namespace as a Service) managed by an external provider, where Ian and Chihiro work.
The Gateway API is already installed on the cluster, but Ana wants to deploy her own Gateway infrastructure self-contained within her application namespace, due to the lack of flexibility and the operational overhead of coordinating every change with Ian and Chihiro.

Self-contained, as mentioned above, means that all the Gateway API resources (e.g., Gateway, HTTPRoute), as well as the controller running with namespaced RBAC (i.e., Roles and RoleBindings), and the resources generated, are all within a single namespace.

As highlighted in the community meeting, my preferred solution would be to make .spec.gatewayClassName optional. Why? An absent class name explicitly indicates that the Gateway has no GatewayClass and is managed within a namespaced scope. In contrast, using gatewayClassName: none offers no guarantee that a GatewayClass named none won't be created in the future, potentially allowing another controller to take over the Gateway. To prevent clashes between different Gateway controllers within a namespace, leader election (for example) could be used.
A Gateway without a GatewayClass would neither solve nor worsen the domain delegation issue because, as far as I know, most controllers don't guarantee the uniqueness of hostnames across all Gateways within a GatewayClass.

So far, I have only described the self-contained namespace use case. While true multi-namespace tenancy doesn't exist, the illusion of it can be created using namespaced or cluster-scoped approaches, such as in combination with admission policies. I agree that it is not feasible to define a clear solution for multi-namespace approaches using the Gateway API, and it is also unnecessary to provide guidance for an anti-pattern.
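Under this proposal, a self-contained Gateway would simply omit the class - a sketch that is deliberately not valid against the current spec, where gatewayClassName is required:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: self-contained
  namespace: team-a
spec:
  # no gatewayClassName: the Gateway is claimed by whichever in-namespace
  # controller wins leader election, per the proposal above
  listeners:
  - name: http
    protocol: HTTP
    port: 80
```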

@shaneutt
Member

I'm uncomfortable with removing specification in order to suit this need 🤔

That said, what we really need right now is for someone to be assigned to this issue and to start a GEP: organizing brainstorming sessions as needed, and writing up all the options discussed so far, along with any new ones we can come up with, so we can weigh the pros and cons of the various approaches. If anyone wants to be assigned to this and is ready to push it forward, please say so and we'll be happy to support you in your efforts.

@shaneutt shaneutt changed the title Support namespace scoped implementations GEP: Support namespace scoped implementations Aug 16, 2024
@shaneutt shaneutt added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Aug 16, 2024