Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BackendTLSPolicy's Service attachment is problematic #3554

Open
howardjohn opened this issue Jan 16, 2025 · 16 comments
Open

BackendTLSPolicy's Service attachment is problematic #3554

howardjohn opened this issue Jan 16, 2025 · 16 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@howardjohn
Copy link
Contributor

These comments were made during the initial design, but I wanted to persist them into an issue.

BackendTLSPolicy currently attaches to a Service.

In the simple case, this works fine. For example, I may have:

Gateway:
  port: 443
  protocol: HTTPS
  tls: TERMINATE
HTTPRoute:
  backendRef: foo, port 443
BackendTLSPolicy:
  targetRef: Service foo

and I will terminate TLS and encrypt it on the upstream. This works great.


Where things fall apart is when we have multiple paths.

For example, imagine I have the above config, but then also have a service mesh. An application in the service mesh calls curl https://foo.

The foo service has the TLS config saying "Add TLS to this request". The service mesh adds TLS and now it has 2 layers of TLS which is clearly broken.

If the API semantic is supposed to be "make sure this request is TLS" not "Add TLS to the this request" (which I don't think it is, nor do I think it should be, just covering the bases here) that is infeasible to implement since it would require any implementation to know whether a request is already TLS encrypted or not.

This same issue can occur without mesh as well. For example, if in addition to my configs above I have:

Gateway:
  port: 443
  protocol: HTTPS
  tls: PASSTHROUGH
TLSRoute:
  backendRef: foo, port 443

then I have the same problem of TLS being added again. Again, we could say the impl should detect this is already TLS and it should not be added. However, this falls apart in other cases:

Gateway:
  port: 443
  protocol: TCP
TCPRoute:
  backendRef: foo, port 443

As it this is opaque TCP, we are not aware if it is TLS or not.


Overall, these issues prevent BackendTLSPolicy from being a viable replacement for DestinationRule for Istio

@howardjohn howardjohn added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 16, 2025
@youngnick
Copy link
Contributor

Thanks for writing this down, @howardjohn, it does make sense.

Do you have any suggestions here? How is DestinationRule different that it doesn't have these problems? Is it just that it is effectively at the equivalent of the HTTPRoute level?

@howardjohn
Copy link
Contributor Author

DestinationRule actually sort of has the same problem but has an escape hatch.

You can provide the TLS config in the top level to make it apply to the service generally, which has the possibility of the same issue, but you can also scope it to a subset and then route directly to that:

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: originate-tls
spec:
  host: foo
  subsets:
  - name: originate-tls
    tls:
      mode: SIMPLE
      sni: foo.com
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: originate-tls
spec:
  hosts:
  - foo.com
  http:
  - route:
    - destination:
        host: foo
        subset: originate-tls

Note: "subset" is usually used to filter a service into a smaller set; here its used in a less common way where the set of endpoints is all of them, and we just customizing the configuration.

One way you could actually do the same today is to make two services (foo, and foo-originate-tls). Then you can attached the BackendTLSPolicy only to foo-originate-tls and route to foo-originate-tls from the Gateway's we want to originate TLS; others can call foo. A bit tedious for the user, though.

Another way would be to allow the BackendTLSPolicy to be scoped to a specific route in some way, rather than the Service directly. Like maybe:

backendRefs:
  filters: # "filter" name is awkward here...
  - extensionRef:
      kind: BackendTlsPolicy
      name: foo-policy
  name: foo

but then we have the same policy be used as an extensionRef and targetRef which is probably not good.

You could targetRef the HTTPRoute (and further scope with sectionName?), but its still not quite the same since you cannot say "attach it to this backend in this rule in this route"

@mikemorris
Copy link
Contributor

mikemorris commented Feb 1, 2025

You could targetRef the HTTPRoute (and further scope with sectionName?), but its still not quite the same since you cannot say "attach it to this backend in this rule in this route"

This is pretty close to what I was trying to achieve with a from clause in #3573 (comment) - adding a secondary targeting mechanism to a policy, so it could still target a Service, but allow scoping an effective policy down to only apply to requests from a particular GatewayClass (or Mesh?), Gateway, Listener, HTTPRoute or even a named HTTPRoute rule.

@candita
Copy link
Contributor

candita commented Feb 3, 2025

@howardjohn I'm not sure I understand the problem here.

The foo service has the TLS config saying "Add TLS to this request". The service mesh adds TLS and now it has 2 layers of TLS which is clearly broken.

Does the service mesh also add TLS to the request that targets a route with edge termination? If not, can't you use that same treatment to the backend target?

If not, is it possible to reuse SectionName within the spec.targetRef[0].LocalPolicyTargetReferenceWithSectionName to target a specific port that non-service-mesh requests could use?

@howardjohn
Copy link
Contributor Author

yes the mesh would have to use it since it's bound to the service - so anything calling the service should use it. There is nothing saying "X gateway should use it".

Using an alternate port is a very rough UX for users to expose an additional port on their application solely to work around gateway API limits

@candita
Copy link
Contributor

candita commented Feb 3, 2025

Isn't it already clear that we don't want to use BackendTLSPolicy for services in a mesh, since they are generally automatically configured for mTLS? Could we go a step further and say we don't serve egress from a service mesh via BackendTLSPolicy? Or once the traffic comes off the mesh, don't use the mTLS (or other tls config that was pertinent within the mesh)?

@howardjohn
Copy link
Contributor Author

No, the two have zero correlation. Whether any application uses HTTPS is a property of the application. It's rare that users modify their applications to service mesh (or they can never, as they are third party). It's extremely common to have application TLS over mesh TLS

@youngnick
Copy link
Contributor

If the API semantic is supposed to be "make sure this request is TLS" not "Add TLS to the this request" (which I don't think it is, nor do I think it should be, just covering the bases here) that is infeasible to implement since it would require any implementation to know whether a request is already TLS encrypted or not.

Coming back to this, I think the problem is here.

The API semantic I thought we were implying here is "The app that's running inside the containers inside the Pods this service is referencing expects TLS connections to arrive at the process running inside the Pod". So whether or not the mesh does transparent encryption is not important, as the whole intent is allow the application owner to say "the thing that's running in my pods expects TLS on this port".

It's not a request to ensure that the traffic is protected with TLS as a way for an application owner to inform the Gateway implementation that non-TLS requests won't succeed.

@howardjohn
Copy link
Contributor Author

The API semantic I thought we were implying here is "The app that's running inside the containers inside the Pods this service is referencing expects TLS connections to arrive at the process running inside the Pod". So whether or not the mesh does transparent encryption is not important, as the whole intent is allow the application owner to say "the thing that's running in my pods expects TLS on this port".

In theory I think its a good a good semantic and aligns well with the personas. But that doesn't really solve the issues raised here -- that this cannot actually be properly implemented.

For example, you might try to derive "Gateway is TLS PASSTHROUGH, so its already TLS, so I can ignore BackendTLSPolicy". This isn't quite right -- how do we know the TLS is the right TLS? Maybe its not sending the correct SNI, for instance (probably nothing we can do here, though). Maybe the app actually expects nested TLS. Maybe there is a proxy along the path adding TLS. Maybe there is a proxy along the path terminating TLS.

@youngnick
Copy link
Contributor

I agree that those problems are not solved by this, but I also don't really see how they are solvable at all tbh. I understand you have a solution with DestinationRule in Istio, so if you can come up with something that will work better here, I'd love to hear it.

@howardjohn
Copy link
Contributor Author

howardjohn commented Feb 19, 2025

Conceptually the solution is simple - allow the client or chose to chose the TLS policy not (just) the server. @mikemorris is dealing with the same issues with retry budgets.

How we want to expose that in the API is debatable - lots of plausible options. Istio does it by allowing you to specify it in the route backend ref (more or less)

@youngnick
Copy link
Contributor

So, the idea here is that if you specify some TLS policy in the backendRef then it will take precedence? Practically, I don't see how that will work in cases where, for example, the app owner says "my app requires TLS on this port" and then the backendRef says something different. Seems like it will always need to be a negotiation or a last-common-denominator style calculation, which may mean that TLS connections can't proceed - like if there's some policy that says "only TLS 1.3", and the BackendTLSPolicy says "only use TLS 1.1".

@howardjohn
Copy link
Contributor Author

You probably also need to be able to declare the lack of a TLS policy. Like if I'm in a tlsroute doing passthrough that is probably what I want .

In Istio how it works is the "BackendPolicy" type (destination rule) is really a combination of targeting a Service but also applying to specific clients. This has (like GAMMA routes) consumer-specific revisions.

On top of these, each policy can have "subsets" (this is a terrible name, for the purpose here pretend they were named "variants") that can be routed to. So I can have a DR with variant "TLS" and "plaintext" (adding TLS vs not adding it),and then explicitly select a variant from a route.

You don't necessarily need both of those to achieve the goal though

@youngnick
Copy link
Contributor

It feels like using a TLSRoute with Passthrough should imply that routing to a Service that also has a BackendTLSPolicy can probably ignore that BackendTLSPolicy by default, since the BackendTLSPolicy is saying "connections here must be TLS", and so is the TLS Passthrough...

I understand and can see the value of the variant config for more complex use cases, but I think that it's better to leave that to workarounds like separate Services or implementationSpecific things in upstream.

@howardjohn
Copy link
Contributor Author

FWIW this problem is going to come up with every single "backend" policy. And it seems 2025 is the year of backend policies - TLS, load balancing, retry budgets, and I am sure more to come.

@mikemorris
Copy link
Contributor

mikemorris commented Feb 19, 2025

It feels like using a TLSRoute with Passthrough should imply that routing to a Service that also has a BackendTLSPolicy can probably ignore that BackendTLSPolicy by default, since the BackendTLSPolicy is saying "connections here must be TLS", and so is the TLS Passthrough...

I think the phrasing here is getting mixed up again a bit, and you were more precise earlier @youngnick with:

It's not a request to ensure that the traffic is protected with TLS as a way for an application owner to inform the Gateway implementation that non-TLS requests won't succeed.

I do agree that a BackendTLSPolicy is likely not applicable for Passthrough connections though, because in those cases the client (which may be external to the cluster and unaware of BackendTLSPolicy configuration) is responsible for configuring TLS settings as needed to reach the application rather than a Gateway.

@shaneutt shaneutt added the triage/needs-information Indicates an issue needs more information in order to work on it. label Feb 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/needs-information Indicates an issue needs more information in order to work on it.
Projects
None yet
Development

No branches or pull requests

5 participants