-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GEP-1762: In Cluster Gateway Deployments #1757
Conversation
Skipping CI for Draft Pull Request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work on this @howardjohn!
site-src/geps/gep-1762.md
Outdated
This MUST be derived from the referenced `Spec.Address`. | ||
* MUST not deploy any resources into the cluster; it is expected that a user will do these actions. | ||
|
||
### Merging Gateways |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this probably needs to be broken out, it applies to all implementation types. However, given the infrastructure
field is shared with the rest of the GEP. I intent to keep it in here for the initial discussions to keep conversation focused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it seems like this would be right at home in a GEP about infrastructure
by itself, as another use case to justify including the new struct.
site-src/geps/gep-1762.md
Outdated
name: merged-gateway | ||
spec: | ||
infrastructure: | ||
attachTo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this primary Gateway idea is interesting, makes some of the precedence more explicit when merging which is great (relying on the the general conflict resolution guidelines where the resource that comes "first" will still apply in some cases it seems)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned with this as a requirement - allowing implementations freedom to merge as needed is important.
A multi-tenant, external gateway implementation may combine all the gateways of type http with a valid domain name. Even a single-in-cluster gateway could do the same. Note that implementations are not bound by namespace or even cluster or project/owner boundaries.
There is a separate problem of 'too many resource to fit in a single yaml' - but creating 2 Gateways will result in 2 IPs and deployments. A simple naming pattern ( same prefix ) or common label may work too, infrastructure.attachTo seems a bit heavy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even better - if you add 'attachTo. Service' in the section above, having 2 Gateways attach to the same Service will be a clear signal that they need to be merged ( or an error condition if they must explicitly attach to Gateway ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the idea behind attachTo
is to allow multiple Gateways to attach to something that provides the address information, which could be another Gateway or a Service (presumably that would be manually configured). I have some concerns, but I think I just need to think through the use case, and on balance think it's pretty reasonable.
site-src/geps/gep-1762.md
Outdated
With any in-cluster deployment, customization requirements will arise. | ||
|
||
Some common requirements would be: | ||
* `Service.spec.type`, to control whether a service is a `ClusterIP` or `LoadBalancer`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in Contour we've made this configurable via GatewayClass params but as you say in other places it is a bit unwieldy, esp. as you add in the desire to configure a particular Service port and NodePort which should be done at the Gateway level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect that in the case we add infrastructure
, that any existing GatewayClass paramsRef
config that overlaps would be treated as an implementation-wide default, that could be overridden by the more-specific Gateway setting.
site-src/geps/gep-1762.md
Outdated
|
||
With this configuration, an implementation: | ||
* MUST mark the Gateway as Ready and provide an address in `Status.Addresses` where the Gateway can be reached on each configured port. | ||
* MUST label all generated resources (Service, Deployment, etc) with `gateway.networking.k8s.io/metadata.name: my-gateway` (where `my-gateway` is the name of the Gateway resource). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there already a pattern for 'managed-by' ? I don't think it's unique to gateways to create associated resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is ownerRef but, IMO, its useful to have a label since a lot of things need label selector (HPA, for example). Plus deployment itself needs a label selector to match on Pods, as well as Service
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we'll still have ownerRef too ?
I don't mind having a common label - it would also work for the other case in this doc ( merging gateways, since the 'other' gateways are also associated with the same gateway ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies that the generated resources reside in the same namespace as the Gateway. Should the value be ns/name to support potential use cases for the Gateway and generated resources residing in different namespaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't cross namespace boundary easily ( i.e. a setting in ns1 shouldn't be able to influence something in ns2 ).
We certainly need to support gateways running outside of the namespace (or cluster) - but with care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that we should not allow for namespace-crossing with the deployment without a very strong use case. As we've seen from the issues that led to ReferenceGrant, allowing cross-namespace references can have large unintended consequences. I think we should leave this as "same namespace as Gateway" unless there's a very strong use case otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not opposed to restricting to the same ns as a Gateway. My thinking is to start out permissive, see if any use cases arise, and then become more restrictive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One lesson I've learned doing this for a while now, is it's easier to start restrictive and carve out exceptions if they're necessary - the opposite is suprisingly difficult! That's why I'd prefer to rule out namespace-crossing without a strong use case - we can carve out exceptions later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to have ownerref as well, it is easy to be gced when gateway is deleted
site-src/geps/gep-1762.md
Outdated
``` | ||
|
||
With this configuration, an implementation: | ||
* MUST mark the Gateway as Ready and provide an address in `Status.Addresses` where the Gateway can be reached. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the user manually creating the Gateway just populate spec.Address or status.Address ? Would work for external LBs too.
But I'm not sure we should be prescriptive about 'self deployed' or 'external' - better to focus on the 'auto-deployed'. There are many other valid options - could be a DNS name for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is the idea - to allow users to set it and align with external LBs. In both case you just set spec.Address.
(.Status.Address is set to the assigned address, .Spec.Address is user intent)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but if the user somehow manages the assignment - they would set Status.Address too ( gateway controller may not even have a way to find it - for example an external separate 'front' DNS and LB )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user is prepared to manage the complexity of using the status
subresource to update the .status.address
fields, then sure. But I think that will be uncommon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"User" is likely to mean some tool or CI/CD - or other controllers different from the gateway controller.
For example based on namespace and permissions something may provision a DNS name, get ACME certs, setup some infra - and populate the status.address.
Unfortunately K8S is very backwards in using IP addresses directly - using domain names with shared IPs for http has been the practice for many years now, I don't know any other modern system still having users deal with IP addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The address can be a domain name. IMO most implementations want to manage status and not defer certain fields to another controller/tool. For the use case above, one option is for the tool to create a custom resource that represents the external addresses/names. This custom resource would be a local object reference of gateway.spec.addresses[]
. When the status of this custom resource is Ready=true
, the Gateway controller sets status.addresses
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my comment, I was talking about the case where a user manually creates a Service object for some reason (which would be pretty weird imo).
I think in the case of Gateway, the reason to use IP addresses is to allow for other things (like external-dns) to handle the name management for you. Because Gateway deals in vhost names, not having an IP address would be pretty weird here too.
Both Gateway API implementations that I've worked on, that use Loadbalancer Services for the actual data plane traffic path creation, have just picked up the Loadbalancer Service details, lightly translated them, and then set the status manually. Which means that on AWS, if you use an ELB or ALB, you'll get a hostname (as @danehans reminds us) instead of an IP.
site-src/geps/gep-1762.md
Outdated
|
||
#### Arbitrary Customization | ||
|
||
Currently, to provide arbitrary customizations, Gateway API provides a few mechanisms: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if 'arbitrary customization' should be in scope of the Gateway API.
Since we support 'external' gateways - each implementation may have its own APIs and configs - in many cases outside of K8S. For in-cluster - if we allow users to create their own deployment and Service - they already have all they need to do arbitrary customization.
I don't see a middle case where more arbitrary customizations would be needed - nor the use case for gateway API to prescribe how implementations handle their custom configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example use case: I want to set CPU requests to "2 CPUs". How can I do that?
If I create a deployment before I deploy the Gateway, I don't know what to put in there - there are 100s of fields like image, etc that I want the controller to apply for me.
If I patch the deployment afterwards, this has a few issues:
- Controller cannot do sophisticated merging logic. It cannot say "I want to default 1 CPU unless user has set it"; it can either force ownership or never set it, afaik
- We have some period of time with the wrong settings. For CPu its not so bad (besides needing to redeploy for no reason), but what if the user's customization was "do not run as root" for example
Both approaches are also tricky since they rely on ordering between actions, which is not very gitops/declarative friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand - my point was that it may be a broader problem ( anything that creates resources has it ), and may be handled by each implementation.
For example in Istio we had a helm chart to install gateway without injection or controler involvment. Others may still do this, and it takes care of all options including image.
Having some template is also common - Deployment, Knative, etc have spec with a lot of options. Just not sure if we want to mix this broad problem into this specification. Would be a good thing to solve, but in a separate context ( and as for other proposals - after some survey on how different vendors deploy in cluster gateways ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick updates @howardjohn! Just a few more nits.
geps/gep-1762.md
Outdated
## Goals | ||
|
||
* Provide prescriptive guidance for how in-cluster implementations should behave. | ||
* Provide requirements (tested by conformance) for how in-cluster implementations should behave. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is unfortunately going to be a feature that doesn't/can't have conformance tests.
protocol: HTTP | ||
``` | ||
|
||
This would generate a `Service` with `clusterIP` or `loadBalancerIP`, depending on the Service type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't really tell if this is trying to tell implementations what they should do, or just provide an example of how some implementations handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent here got muddled through the revisions. Its meant to be "If they provide an IP, you better use it" and whether you use it as the clusterIP vs the loadBalancerIP is based on the type of service, which was determined by routability previously
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is already covered reasonably well in the spec:
gateway-api/apis/v1beta1/gateway_types.go
Lines 134 to 163 in cd223ea
// Addresses requested for this Gateway. This is optional and behavior can | |
// depend on the implementation. If a value is set in the spec and the | |
// requested address is invalid or unavailable, the implementation MUST | |
// indicate this in the associated entry in GatewayStatus.Addresses. | |
// | |
// The Addresses field represents a request for the address(es) on the | |
// "outside of the Gateway", that traffic bound for this Gateway will use. | |
// This could be the IP address or hostname of an external load balancer or | |
// other networking infrastructure, or some other address that traffic will | |
// be sent to. | |
// | |
// The .listener.hostname field is used to route traffic that has already | |
// arrived at the Gateway to the correct in-cluster destination. | |
// | |
// If no Addresses are specified, the implementation MAY schedule the | |
// Gateway in an implementation-specific manner, assigning an appropriate | |
// set of Addresses. | |
// | |
// The implementation MUST bind all Listeners to every GatewayAddress that | |
// it assigns to the Gateway and add a corresponding entry in | |
// GatewayStatus.Addresses. | |
// | |
// Support: Extended | |
// | |
// +optional | |
// <gateway:validateIPAddress> | |
// +kubebuilder:validation:MaxItems=16 | |
// +kubebuilder:validation:XValidation:message="IPAddress values must be unique",rule="self.all(a1, a1.type == 'IPAddress' ? self.exists_one(a2, a2.type == a1.type && a2.value == a1.value) : true )" | |
// +kubebuilder:validation:XValidation:message="Hostname values must be unique",rule="self.all(a1, a1.type == 'Hostname' ? self.exists_one(a2, a2.type == a1.type && a2.value == a1.value) : true )" | |
Addresses []GatewayAddress `json:"addresses,omitempty"` |
It may be worth describing how in-cluster implementations can accomplish this in this GEP, but I don't think we need to add any new requirements here. Just trying to avoid introducing too many unique sources of requirements in case they need to change in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it more clear, I think. LMK what you think
geps/gep-1762.md
Outdated
|
||
This would generate a `Service` with `clusterIP` or `loadBalancerIP`, depending on the Service type. | ||
|
||
This follows the same behavior as out-of-cluster Gateway implementations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure I see the parallel here? They may be providing similar functionality, but I don't think the way it's accomplished is very similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
geps/gep-1762.md
Outdated
// If Labels is set on the GatewayClass as well, the labels are merged with the Gateway's labels taking precedence. | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@youngnick's traveling for a bit so may take some time to respond, but I did get some time to talk with him about this yesterday and he was on board with only targeting Gateway to start.
geps/gep-1762.md
Outdated
// LocalParametersReference identifies an API object containing controller-specific | ||
// configuration resource within the cluster. | ||
type LocalParametersReference struct { | ||
// Group is the group of the referent. | ||
Group Group `json:"group"` | ||
|
||
// Kind is kind of the referent. | ||
Kind Kind `json:"kind"` | ||
|
||
// Name is the name of the referent. | ||
// | ||
// +kubebuilder:validation:MinLength=1 | ||
// +kubebuilder:validation:MaxLength=253 | ||
Name string `json:"name"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd personally rather keep this GEP as focused as possible since we're really only a couple of weeks away from trying to be feature complete for v1.0. I don't really have anything against ParamsRef here, but similar to the comment above, it introduces some form of merging logic between GWC and GW, and I want to make sure we take the time to get it right so would rather leave it out of scope for now.
@robscott GatewayClass dropped |
64295f0
to
1c01615
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @howardjohn!
geps/gep-1762.md
Outdated
## Goals | ||
|
||
* Provide prescriptive guidance for how in-cluster implementations should behave. | ||
* Provide requirements (tested by conformance) for how in-cluster implementations should behave. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Provide requirements (tested by conformance) for how in-cluster implementations should behave. | |
* Provide requirements for how in-cluster implementations should behave. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: howardjohn, keithmattix, shaneutt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This section just clarifies and existing part of the spec, how to handle `.spec.addresses` for in-cluster implementations. | ||
Like all other Gateway types, this should impact the address the `Gateway` is reachable at. | ||
|
||
For implementations using a `Service`, this means the `clusterIP` or `loadBalancerIP` (depending on the `Service` type). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExternalIP
may also be valid here for NodePort
case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Don't want to block this merging but would be great to cover this in a follow up.
Thanks @howardjohn! /hold cancel |
This commit is to add the required label gateway-name e.g. `gateway.networking.k8s.io/gateway-name`, and propagate all labels and annotations from spec.infrastructure in all generated resources. The main goal is to conform with below GEP. Relates: kubernetes-sigs/gateway-api#1757 Signed-off-by: Tam Mach <[email protected]>
This commit is to add the required label gateway-name e.g. `gateway.networking.k8s.io/gateway-name`, and propagate all labels and annotations from spec.infrastructure in all generated resources. The main goal is to conform with below GEP. Relates: kubernetes-sigs/gateway-api#1757 Signed-off-by: Tam Mach <[email protected]>
This commit is to add the required label gateway-name e.g. `gateway.networking.k8s.io/gateway-name`, and propagate all labels and annotations from spec.infrastructure in all generated resources. The main goal is to conform with below GEP. Relates: kubernetes-sigs/gateway-api#1757 Signed-off-by: Tam Mach <[email protected]>
This commit is to add the required label gateway-name e.g. `gateway.networking.k8s.io/gateway-name`, and propagate all labels and annotations from spec.infrastructure in all generated resources. The main goal is to conform with below GEP. Relates: kubernetes-sigs/gateway-api#1757 Signed-off-by: Tam Mach <[email protected]>
This commit is to add the required label gateway-name e.g. `gateway.networking.k8s.io/gateway-name`, and propagate all labels and annotations from spec.infrastructure in all generated resources. The main goal is to conform with below GEP. Relates: kubernetes-sigs/gateway-api#1757 Signed-off-by: Tam Mach <[email protected]>
[ upstream commit 5e6e4af ] This commit is to add the required label gateway-name e.g. `gateway.networking.k8s.io/gateway-name`, and propagate all labels and annotations from spec.infrastructure in all generated resources. The main goal is to conform with below GEP. Relates: kubernetes-sigs/gateway-api#1757 Signed-off-by: Tam Mach <[email protected]>
[ upstream commit 5e6e4af ] This commit is to add the required label gateway-name e.g. `gateway.networking.k8s.io/gateway-name`, and propagate all labels and annotations from spec.infrastructure in all generated resources. The main goal is to conform with below GEP. Relates: kubernetes-sigs/gateway-api#1757 Signed-off-by: Tam Mach <[email protected]>
Review note: This GEP was split into 3 - the current GEP (focusing on in-cluster deployments), #1868 (infrastructure field we depend on), and https://github.com/kubernetes-sigs/gateway-api/pull/1863/files (gateway merging)
What type of PR is this?
/kind gep
What this PR does / why we need it:
This PR attempts to inject some opinions about how in-cluster deployments should work. Currently, there is a lot of different implementations behaving differently, leading to confusion and drifting user experiences.
As-is, this PR is fairly opinionated. I expect that some of the MUSTs become SHOULDs as we iterate on this PR. In its current state, this is largely meant as a discussion point. I expect there to be a lot of iteration as we learn different implementations requirements and perspectives.
Note: most of the names were just picked arbitrarily. I'd like to first focus on the concepts, then we can debate the best names (almost certainly not what the initial GEP has!)
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: