GEP-1762: In Cluster Gateway Deployments #1757

howardjohn · 2023-02-23T22:42:48Z

Review note: This GEP was split into 3 - the current GEP (focusing on in-cluster deployments), #1868 (infrastructure field we depend on), and https://github.com/kubernetes-sigs/gateway-api/pull/1863/files (gateway merging)

What type of PR is this?

/kind gep

What this PR does / why we need it:

This PR attempts to inject some opinions about how in-cluster deployments should work. Currently, many implementations behave differently, leading to confusion and drifting user experiences.

As-is, this PR is fairly opinionated. I expect that some of the MUSTs will become SHOULDs as we iterate on this PR. In its current state, this is largely meant as a discussion point. I expect there to be a lot of iteration as we learn different implementations' requirements and perspectives.

Note: most of the names were just picked arbitrarily. I'd like to first focus on the concepts, then we can debate the best names (almost certainly not what the initial GEP has!)

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

k8s-ci-robot · 2023-02-23T22:42:50Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

robscott

Thanks for the work on this @howardjohn!

howardjohn · 2023-03-02T00:36:09Z

site-src/geps/gep-1762.md

+  This MUST be derived from the referenced `Spec.Address`.
+* MUST not deploy any resources into the cluster; it is expected that a user will do these actions.
+
+### Merging Gateways


Note: this probably needs to be broken out, since it applies to all implementation types. However, given that the infrastructure field is shared with the rest of the GEP, I intend to keep it in here for the initial discussions to keep the conversation focused.

Yeah, it seems like this would be right at home in a GEP about infrastructure by itself, as another use case to justify including the new struct.

sunjayBhatia · 2023-03-02T18:38:46Z

site-src/geps/gep-1762.md

+  name: merged-gateway
+spec:
+  infrastructure:
+    attachTo:


This primary Gateway idea is interesting; it makes some of the precedence more explicit when merging, which is great (relying on the general conflict resolution guidelines, where the resource that comes "first" wins, will still apply in some cases, it seems).

I'm a bit concerned with this as a requirement - allowing implementations freedom to merge as needed is important.

A multi-tenant, external gateway implementation may combine all the gateways of type http with a valid domain name. Even a single-in-cluster gateway could do the same. Note that implementations are not bound by namespace or even cluster or project/owner boundaries.

There is a separate problem of 'too many resources to fit in a single yaml' - but creating 2 Gateways will result in 2 IPs and deployments. A simple naming pattern ( same prefix ) or common label may work too; infrastructure.attachTo seems a bit heavy.

Or even better - if you add 'attachTo. Service' in the section above, having 2 Gateways attach to the same Service will be a clear signal that they need to be merged ( or an error condition if they must explicitly attach to Gateway ).

Yes, the idea behind attachTo is to allow multiple Gateways to attach to something that provides the address information, which could be another Gateway or a Service (presumably that would be manually configured). I have some concerns, but I think I just need to think through the use case, and on balance think it's pretty reasonable.

sunjayBhatia · 2023-03-02T18:46:30Z

site-src/geps/gep-1762.md

+With any in-cluster deployment, customization requirements will arise. 
+
+Some common requirements would be:
+* `Service.spec.type`, to control whether a service is a `ClusterIP` or `LoadBalancer`.


in Contour we've made this configurable via GatewayClass params but as you say in other places it is a bit unwieldy, esp. as you add in the desire to configure a particular Service port and NodePort which should be done at the Gateway level

I'd expect that, if we add infrastructure, any existing GatewayClass paramsRef config that overlaps would be treated as an implementation-wide default that could be overridden by the more specific Gateway setting.

costinm · 2023-03-03T17:01:31Z

site-src/geps/gep-1762.md

+
+With this configuration, an implementation:
+* MUST mark the Gateway as Ready and provide an address in `Status.Addresses` where the Gateway can be reached on each configured port.
+* MUST label all generated resources (Service, Deployment, etc) with `gateway.networking.k8s.io/metadata.name: my-gateway` (where `my-gateway` is the name of the Gateway resource).


Isn't there already a pattern for 'managed-by' ? I don't think it's unique to gateways to create associated resources.

There is ownerRef but, IMO, it's useful to have a label, since a lot of things need a label selector (HPA, for example). Plus, the Deployment itself needs a label selector to match on Pods, as does the Service.
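To illustrate the label-selector point, here is a minimal sketch of what a generated Service and Deployment might look like when keyed on the label from the diff under review. Only the label key and the `my-gateway` value come from the quoted text; the resource names, image, container name, and port numbers are hypothetical placeholders, not something the GEP prescribes.

```yaml
# Sketch only: names, image, and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-gateway
  labels:
    gateway.networking.k8s.io/metadata.name: my-gateway
spec:
  type: LoadBalancer
  # The Service finds the gateway Pods through the shared label.
  selector:
    gateway.networking.k8s.io/metadata.name: my-gateway
  ports:
  - name: http
    port: 80
    targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-gateway
  labels:
    gateway.networking.k8s.io/metadata.name: my-gateway
spec:
  # A Deployment must select its Pods by label; an ownerRef alone cannot do this.
  selector:
    matchLabels:
      gateway.networking.k8s.io/metadata.name: my-gateway
  template:
    metadata:
      labels:
        gateway.networking.k8s.io/metadata.name: my-gateway
    spec:
      containers:
      - name: proxy                      # hypothetical container name
        image: example.com/proxy:latest  # hypothetical image
        ports:
        - containerPort: 8080
```

With a label like this in place, `kubectl get deploy,svc -l gateway.networking.k8s.io/metadata.name=my-gateway` would list everything generated for that Gateway.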

But we'll still have ownerRef too ?

I don't mind having a common label - it would also work for the other case in this doc ( merging gateways, since the 'other' gateways are also associated with the same gateway ).

This implies that the generated resources reside in the same namespace as the Gateway. Should the value be ns/name to support potential use cases for the Gateway and generated resources residing in different namespaces?

We can't cross namespace boundary easily ( i.e. a setting in ns1 shouldn't be able to influence something in ns2 ).

We certainly need to support gateways running outside of the namespace (or cluster) - but with care.

Yes, I think that we should not allow for namespace-crossing with the deployment without a very strong use case. As we've seen from the issues that led to ReferenceGrant, allowing cross-namespace references can have large unintended consequences. I think we should leave this as "same namespace as Gateway" unless there's a very strong use case otherwise.

I'm not opposed to restricting to the same ns as a Gateway. My thinking is to start out permissive, see if any use cases arise, and then become more restrictive.

One lesson I've learned from doing this for a while now is that it's easier to start restrictive and carve out exceptions if they're necessary - the opposite is surprisingly difficult! That's why I'd prefer to rule out namespace-crossing without a strong use case - we can carve out exceptions later.

+1 to having an ownerRef as well; it makes it easy for the generated resources to be garbage-collected when the Gateway is deleted.

costinm · 2023-03-03T17:06:22Z

site-src/geps/gep-1762.md

+```
+
+With this configuration, an implementation:
+* MUST mark the Gateway as Ready and provide an address in `Status.Addresses` where the Gateway can be reached.


Can the user manually creating the Gateway just populate spec.Address or status.Address ? Would work for external LBs too.

But I'm not sure we should be prescriptive about 'self deployed' or 'external' - better to focus on the 'auto-deployed'. There are many other valid options - could be a DNS name for example.

Yes, that is the idea - to allow users to set it and align with external LBs. In both cases you just set spec.Address.

(.Status.Address is set to the assigned address, .Spec.Address is user intent)
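As a rough illustration of the intent-versus-assignment split described above, the following sketch shows a Gateway where the user requests an address in `spec.addresses` and the assigned address is reported back in `status.addresses`. The class name and the documentation-range IP are hypothetical, and the `v1beta1` API version is an assumption based on the date of this discussion.

```yaml
# Sketch only: class name and IP address are placeholders.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: my-gateway
spec:
  gatewayClassName: example-class  # hypothetical GatewayClass
  addresses:                       # user intent: "I want this address"
  - type: IPAddress
    value: 192.0.2.10
  listeners:
  - name: http
    protocol: HTTP
    port: 80
status:
  addresses:                       # the address that was actually assigned
  - type: IPAddress
    value: 192.0.2.10
```

Note that `status` is written through the status subresource, so it is shown here only to contrast the two fields; a plain `kubectl apply` of this manifest would not set it.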

Yes, but if the user somehow manages the assignment - they would set Status.Address too ( gateway controller may not even have a way to find it - for example an external separate 'front' DNS and LB )

If the user is prepared to manage the complexity of using the status subresource to update the .status.address fields, then sure. But I think that will be uncommon.

"User" is likely to mean some tool or CI/CD - or other controllers different from the gateway controller.
For example based on namespace and permissions something may provision a DNS name, get ACME certs, setup some infra - and populate the status.address.

Unfortunately K8S is very backwards in using IP addresses directly - using domain names with shared IPs for http has been the practice for many years now, I don't know any other modern system still having users deal with IP addresses.

The address can be a domain name. IMO most implementations want to manage status and not defer certain fields to another controller/tool. For the use case above, one option is for the tool to create a custom resource that represents the external addresses/names. This custom resource would be a local object reference of gateway.spec.addresses[]. When the status of this custom resource is Ready=true, the Gateway controller sets status.addresses.

In my comment, I was talking about the case where a user manually creates a Service object for some reason (which would be pretty weird imo).

I think in the case of Gateway, the reason to use IP addresses is to allow for other things (like external-dns) to handle the name management for you. Because Gateway deals in vhost names, not having an IP address would be pretty weird here too.

Both Gateway API implementations that I've worked on that use LoadBalancer Services for the actual data plane traffic path creation have just picked up the LoadBalancer Service details, lightly translated them, and then set the status manually. Which means that on AWS, if you use an ELB or ALB, you'll get a hostname (as @danehans reminds us) instead of an IP.

costinm · 2023-03-03T17:20:39Z

site-src/geps/gep-1762.md

+
+#### Arbitrary Customization
+
+Currently, to provide arbitrary customizations, Gateway API provides a few mechanisms:


Not sure if 'arbitrary customization' should be in scope of the Gateway API.

Since we support 'external' gateways - each implementation may have its own APIs and configs - in many cases outside of K8S. For in-cluster - if we allow users to create their own deployment and Service - they already have all they need to do arbitrary customization.

I don't see a middle case where more arbitrary customizations would be needed - nor the use case for gateway API to prescribe how implementations handle their custom configs.

Example use case: I want to set CPU requests to "2 CPUs". How can I do that?

If I create a deployment before I deploy the Gateway, I don't know what to put in there - there are 100s of fields like image, etc that I want the controller to apply for me.

If I patch the deployment afterwards, this has a few issues:

* The controller cannot do sophisticated merging logic. It cannot say "I want to default to 1 CPU unless the user has set it"; it can either force ownership or never set it, afaik.
* We have some period of time with the wrong settings. For CPU it's not so bad (besides needing to redeploy for no reason), but what if the user's customization was "do not run as root", for example?

Both approaches are also tricky since they rely on ordering between actions, which is not very gitops/declarative friendly.
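For concreteness, the "patch the deployment afterwards" approach might look like the fragment below: a partial Deployment body applied on top of whatever the controller generated. This is only a sketch; the container name `proxy` is hypothetical, since the real name depends on the implementation.

```yaml
# Sketch only: a patch fragment, not a complete Deployment.
spec:
  template:
    spec:
      containers:
      - name: proxy      # hypothetical; must match the controller-generated container
        resources:
          requests:
            cpu: "2"     # the "2 CPUs" customization from the example
```

Because this can only be applied once the controller has created the Deployment, it illustrates both issues listed above: the ordering dependency and the window during which the unpatched settings are live.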

I understand - my point was that it may be a broader problem ( anything that creates resources has it ), and may be handled by each implementation.

For example, in Istio we had a Helm chart to install a gateway without injection or controller involvement. Others may still do this, and it takes care of all options, including the image.

Having some template is also common - Deployment, Knative, etc have spec with a lot of options. Just not sure if we want to mix this broad problem into this specification. Would be a good thing to solve, but in a separate context ( and as for other proposals - after some survey on how different vendors deploy in cluster gateways ).

howardjohn · 2023-09-18T14:58:35Z

@robscott GatewayClass dropped

robscott

Thanks @howardjohn!

robscott · 2023-09-18T21:00:35Z

geps/gep-1762.md

+## Goals
+
+* Provide prescriptive guidance for how in-cluster implementations should behave.
+* Provide requirements (tested by conformance) for how in-cluster implementations should behave.


Suggested change

-* Provide requirements (tested by conformance) for how in-cluster implementations should behave.
+* Provide requirements for how in-cluster implementations should behave.

shaneutt

/approve
/lgtm

k8s-ci-robot · 2023-09-19T19:26:38Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: howardjohn, keithmattix, shaneutt

Needs approval from an approver in each of these files:

~~geps/OWNERS~~ [shaneutt]

arkodg · 2023-09-19T19:58:23Z

geps/gep-1762.md

+This section just clarifies and existing part of the spec, how to handle `.spec.addresses` for in-cluster implementations.
+Like all other Gateway types, this should impact the address the `Gateway` is reachable at.
+
+For implementations using a `Service`, this means the `clusterIP` or `loadBalancerIP` (depending on the `Service` type).


ExternalIP may also be valid here for NodePort case

Agreed. Don't want to block this merging but would be great to cover this in a follow up.

robscott · 2023-09-27T18:58:27Z

Thanks @howardjohn!

/hold cancel
/lgtm

This commit is to add the required label gateway-name e.g. `gateway.networking.k8s.io/gateway-name`, and propagate all labels and annotations from spec.infrastructure in all generated resources. The main goal is to conform with below GEP. Relates: kubernetes-sigs/gateway-api#1757 Signed-off-by: Tam Mach <[email protected]>

[ upstream commit 5e6e4af ] This commit is to add the required label gateway-name e.g. `gateway.networking.k8s.io/gateway-name`, and propagate all labels and annotations from spec.infrastructure in all generated resources. The main goal is to conform with below GEP. Relates: kubernetes-sigs/gateway-api#1757 Signed-off-by: Tam Mach <[email protected]>
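The behavior these commit messages describe could look roughly like the sketch below: a Gateway carrying labels and annotations under `spec.infrastructure`, and a generated Service that receives both those and the `gateway-name` label. The `labels` and `annotations` field names under `spec.infrastructure`, the API version, and all names and values here are assumptions for illustration, not taken from the commit itself.

```yaml
# Sketch only: field layout and all values are illustrative assumptions.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: my-gateway
spec:
  gatewayClassName: example-class  # hypothetical GatewayClass
  infrastructure:
    labels:
      team: networking             # hypothetical user label
    annotations:
      example.com/note: "demo"     # hypothetical user annotation
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
# What a generated resource might then carry:
apiVersion: v1
kind: Service
metadata:
  name: my-gateway                 # hypothetical generated name
  labels:
    gateway.networking.k8s.io/gateway-name: my-gateway
    team: networking               # propagated from spec.infrastructure
  annotations:
    example.com/note: "demo"       # propagated from spec.infrastructure
spec:
  selector:
    gateway.networking.k8s.io/gateway-name: my-gateway
  ports:
  - name: http
    port: 80
```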

k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/gep PRs related to Gateway Enhancement Proposal(GEP) labels Feb 23, 2023

k8s-ci-robot requested a review from keithmattix February 23, 2023 22:42

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 23, 2023

k8s-ci-robot requested a review from shaneutt February 23, 2023 22:42

k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 23, 2023

howardjohn changed the title ~~wip GEP~~ GEP-1757: In Cluster Gateway Deployments Feb 24, 2023

robscott reviewed Feb 24, 2023

howardjohn mentioned this pull request Feb 27, 2023

GEP: In Cluster Gateway Deployments #1762

Closed

howardjohn changed the title ~~GEP-1757: In Cluster Gateway Deployments~~ GEP-1762: In Cluster Gateway Deployments Feb 27, 2023

howardjohn mentioned this pull request Feb 27, 2023

GEP-1651: Gateway Routability #1653

Merged

youngnick mentioned this pull request Feb 28, 2023

GEP: Standard Mechanism to Merge Multiple Gateways #1713

Open

howardjohn marked this pull request as ready for review March 2, 2023 00:34

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 2, 2023

k8s-ci-robot requested review from robscott and youngnick March 2, 2023 00:34

howardjohn commented Mar 2, 2023

sunjayBhatia reviewed Mar 2, 2023

youngnick reviewed Mar 3, 2023
