GKE gateway #142

Open

aibangjuxin opened this issue May 15, 2024 · 6 comments
@aibangjuxin
Owner

Referring to the two articles below, I deployed an HTTPS Gateway and HTTPRoute, and the Service and Deployment behind them both report a normal status.
However, my test requests return 503, and when I log in to the console the backend service shows as unhealthy.
What are the possible causes, and how should I troubleshoot them?
https://cloud.google.com/kubernetes-engine/docs/concepts/gateway-api

https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-gateways

You've deployed a Gateway and an HTTPRoute, the backend Service and Deployment are running normally, but test requests return a 503 error, and the console shows the backend service as unhealthy. 😞

Here are some possible reasons for this issue:

  1. Incorrect configuration: Double-check your Gateway and HTTPRoute configurations to ensure they are correctly pointing to the backend Service. Verify that the spec.backendRefs field in the HTTPRoute references the backend Service by the correct name and port (see the paired example after this list).
  2. Backend Service not exposed: Ensure that the backend Service is exposed and listening on the correct port. Check the Service's spec.ports field to ensure it's exposing the correct port.
  3. Backend Pod not running: Although you mentioned the Deployment is running normally, it's possible that the Pods are not running or are not healthy. Check the Pod's status and logs to ensure they're running correctly.
  4. Network Policies: If you have Network Policies configured, they might be blocking traffic between the Gateway and the backend Service. Check if any Network Policies are applied to the backend Service or the Gateway.
  5. Firewall rules: Ensure that there are no firewall rules blocking traffic between the Gateway and the backend Service.
  6. TLS configuration: If you're using TLS, ensure that the TLS certificates are correctly configured and the backend Service is using the correct certificate.
  7. Health checks: Verify that the health checks for the backend Service are correctly configured and passing. If the health checks are failing, the Gateway might not be able to route traffic to the backend Service.
  8. Resource constraints: If the backend Service is experiencing high resource utilization, it might not be able to handle incoming requests, resulting in a 503 error. Check the resource utilization of the backend Service and ensure it has sufficient resources.
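To make reasons 1 and 2 concrete, here is a minimal sketch of how the HTTPRoute's backendRefs must line up with the backend Service; the resource names, labels, and ports below are placeholders, not values taken from this issue:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: my-route                     # hypothetical route name
spec:
  parentRefs:
    - name: my-gateway               # hypothetical Gateway name
  rules:
    - backendRefs:
        - name: my-backend-service   # must equal the Service's metadata.name
          port: 8080                 # must equal a port the Service exposes
---
apiVersion: v1
kind: Service
metadata:
  name: my-backend-service
spec:
  selector:
    app: my-backend-app              # must match the Pod labels in the Deployment
  ports:
    - port: 8080                     # the port referenced by backendRefs above
      targetPort: 8080               # the containerPort the Pods actually listen on

If the name or port in backendRefs does not match the Service, the route status will typically report the backend as unresolved and requests will fail with a 503.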

To troubleshoot this issue, you can try the following steps:

  1. Check the Gateway and HTTPRoute status: Use kubectl describe gateway <gateway-name> and kubectl describe httproute <httproute-name> to inspect their status conditions and events for messages indicating the cause of the 503 error.
  2. Verify the backend Service status: Use kubectl describe svc <backend-service-name> to check the status of the backend Service and ensure it's running correctly.
  3. Check the backend Pod logs: Use kubectl logs to check the logs of the backend Pod to see if there are any error messages indicating the cause of the issue.
  4. Use a debugging tool: Tools like kubectl debug or curl can help you debug the issue by testing the request flow from the Gateway to the backend Service.
  5. Check the Network Policies and firewall rules: Verify that there are no Network Policies or firewall rules blocking traffic between the Gateway and the backend Service.

By following these steps, you should be able to identify and resolve the issue causing the 503 error. 😊

By default, the health check that the GKE load balancer performs against your backends is an HTTP GET to the path / and expects a 200 response. If the check is hitting port 80 and path / while your application listens on a different port (or returns 404 on /), the health check fails, the backend service is marked unhealthy, and the Gateway answers with 503 — which matches the failing checks on port 80 that you are seeing.

To fix this, point the health check at a port and path your application actually answers on. Note that a Kubernetes Service has no health-check field of its own; the Service only maps ports, and the load balancer's health check is configured on the GKE side — see the HealthCheckPolicy sketch after the Service example below — together with a readinessProbe on the Pods.

For example, if your backend application listens on port 8080, the Service should simply expose that port:

apiVersion: v1
kind: Service
metadata:
  name: my-backend-service
spec:
  selector:
    app: my-backend-app
  ports:
  - name: http
    port: 8080
    targetPort: 8080

The Service itself only declares which port it exposes (8080) and which container port it forwards to; it does not carry a health-check definition. The load balancer's health check has to be pointed at the correct port and path separately, and the Pods should expose a readiness endpoint (for example /healthz) for it to probe.

By pointing the health check at the port and path your application really serves, the failing checks on port 80 should go away.
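On GKE, the supported way to customize this check for a Gateway backend is a HealthCheckPolicy attached to the backend Service. The following is a minimal sketch based on the networking.gke.io HealthCheckPolicy CRD described in the GKE Gateway documentation; the policy name and the /healthz path are assumptions for this example, and the exact set of fields should be confirmed against the docs for your GKE version:

apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: my-backend-healthcheck     # hypothetical policy name
spec:
  default:
    config:
      type: HTTP
      httpHealthCheck:
        port: 8080                 # port your application actually serves on
        requestPath: /healthz      # path that returns 200 when the app is healthy
  targetRef:
    group: ""
    kind: Service
    name: my-backend-service       # the backend Service referenced by your HTTPRoute

Once the backends start passing this check, the load balancer marks the backend service healthy and the 503s from the Gateway should stop.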

Additionally, give the Pods a readinessProbe in the Deployment manifest so that only containers that are actually ready to serve are added to the Service's Endpoints:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-backend-deployment
spec:
  selector:
    matchLabels:
      app: my-backend-app
  template:
    metadata:
      labels:
        app: my-backend-app
    spec:
      containers:
      - name: my-backend-container
        image: my-backend-image
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080

With the readinessProbe on the container and the load balancer health check pointed at the right port and path, the Gateway should be able to route traffic to the backend Service again.

To clarify how these pieces relate: healthCheck is not a field on a Service at all. Kubernetes Pods use readinessProbe and livenessProbe (defined in the Pod template, as in the Deployment above) to signal whether they are healthy, and only Pods that pass their readinessProbe are listed in the Service's Endpoints.

The health check that matters for the Gateway is a separate one, performed by the GKE load balancer against the backend Service's endpoints. If that check is probing port 80 while your application listens on another port, the backend is reported as unhealthy even though the Pods are Ready. Make sure the Service's port and targetPort point at the port your application uses, and configure the load balancer health check to match.

For example:

apiVersion: v1
kind: Service
metadata:
  name: my-backend-service
spec:
  selector:
    app: my-backend-app
  ports:
  - name: http
    port: 8080
    targetPort: 8080

In this example, the Service listens on port 8080 and forwards to port 8080 on the Pods; the load balancer health check should be pointed at the same port.

If health checks keep failing, check the backend health status reported for the load balancer and the Pod logs to determine why.

Please let me know if you have any further questions or concerns!

Based on the information you provided and the reference documentation, the possible causes and troubleshooting steps are as follows:

Possible causes:

  1. The backend (Service and Deployment) is running but cannot respond to requests properly, for example because of resource shortages or misconfiguration.

  2. The Gateway or HTTPRoute configuration is wrong, so traffic is not routed to the backend correctly.

  3. Firewall rules, networking, or cluster network policies are blocking the traffic.

  4. The backend's readiness or liveness probes are misconfigured, so the service is considered unavailable.

Troubleshooting steps:

  1. Check the backend service's logs for errors or exceptions.

  2. Check the Gateway and HTTPRoute configuration and confirm they point at the backend Service; look for related errors in the logs.

  3. Check cluster network policies, firewall rules, and so on, to make sure inbound traffic can reach the Gateway.

  4. Check that the backend's readiness and liveness probes are configured correctly.

  5. Try accessing the backend Service from another Pod inside the cluster to rule out network problems.

  6. Look at the Gateway's metrics and logs to see whether requests reach the Gateway and whether it forwards them to the backend.

  7. If the problem cannot be resolved, collect the relevant logs and configuration and ask the Google Cloud support team for help.

In short, work through the Gateway, HTTPRoute, backend service, and network layer by layer to pinpoint where the problem is, then adjust and fix accordingly.

OK — given the extra detail that the response contains a Google header, the request is indeed reaching the Gateway, and a "503 no upstream" error usually means the Gateway cannot forward the request to an upstream backend service.

In that case, focus the investigation on the following areas:

  1. Backend service availability

    • Check the backend Pods' status and whether there are enough Ready replicas
    • Check the service logs for errors
    • Try accessing the backend directly (via the Service or a Pod IP) to confirm the service itself is working
  2. Gateway and HTTPRoute configuration

    • Check that the backend service referenced in the HTTPRoute is correct
    • Check that the backend Service's Endpoints are populated
    • Look at the Status fields of the Gateway and HTTPRoute to confirm the configuration has been accepted
  3. Network connectivity between the Gateway and the backend

    • Make sure the Gateway's namespace can communicate with the backend's namespace (network policies permit it)
    • Check whether the Gateway and the backend use the same VPC network (if you have more than one)
  4. Gateway metrics and logs

    • Check the Gateway metrics to see whether requests are reaching it
    • Check the Gateway logs for error messages

Working through these areas should reveal the specific cause of the 503 and resolve the connectivity problem between the Gateway and the backend service. If you cannot solve it yourself, share the relevant logs and configuration with the GCP support team for analysis.

@aibangjuxin
Owner Author

I think I have another idea: deploy on port 80 first and finish all the testing there, so that I can make sure all of my configuration is correct.

@aibangjuxin
Owner Author

Sure. Here is an example HTTPRoute configuration for HTTPS, with a backend service named svc-test:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: http-route-example
spec:
  parentRefs:
    - name: your-gateway
  hostnames:
    - "example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: "/"
      backendRefs:
        - name: svc-test
          port: 443

In this example, the HTTPRoute resource is named http-route-example and references a parent Gateway called your-gateway. The hostnames field defines the host names the route matches, and the rules section defines the routing rules. In matches we specify a path prefix of "/", which means every request to example.com is routed. The backendRefs section points at the backend service svc-test on port 443 (the standard HTTPS port).

Make sure your Gateway has a listener configured to handle HTTPS requests and that your svc-test service is ready to receive HTTPS traffic. You may also need to configure an appropriate TLS certificate.

I hope this example helps you configure your HTTPRoute. If you have any questions or need further help, just let me know!¹

Source: Conversation with Bing, 5/15/2024
(1) HTTPRoute - AWS Gateway API Controller. https://www.gateway-api-controller.eks.aws.dev/api-types/http-route/.
(2) Configuring Gateway API resources across namespaces. https://docs.konghq.com/kubernetes-ingress-controller/latest/guides/services/cross-namespace/.
(3) How can I combine the WCF services config for both http and https in .... https://stackoverflow.com/questions/4346926/how-can-i-combine-the-wcf-services-config-for-both-http-and-https-in-one-web-con.
(4) Chapter 15. Configuring Routes - Red Hat Customer Portal. https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html/networking/configuring-routes.
(5) Route configuration - Configuring Routes | Networking | OpenShift .... https://docs.openshift.com/container-platform/4.9/networking/routes/route-configuration.html.
(6) undefined. https://docs.konghq.com/assets/kubernetes-ingress-controller/examples/echo-service.yaml.

Sure — to go with the HTTPRoute example above, here is a matching Gateway configuration for handling HTTPS requests:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example-gateway-class
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: example-secret
      allowedRoutes:
        namespaces:
          from: Same

In this configuration:

  • apiVersion identifies the Gateway API version being used.
  • kind specifies the resource type, Gateway.
  • name under metadata is your Gateway's name; the parentRefs entry in your HTTPRoute must reference this name.
  • gatewayClassName under spec is the name of the GatewayClass; on GKE you would use one of the classes listed by kubectl get gatewayclass.
  • The listeners section defines the Gateway's listeners; here a listener named https uses the HTTPS protocol on port 443.
  • The tls section defines the TLS configuration: mode: Terminate means the Gateway terminates TLS, and certificateRefs references a Secret containing the TLS certificate.
  • allowedRoutes controls which routes may attach to this listener; HTTPRoutes bind to the Gateway through their own parentRefs rather than through a selector on the Gateway.

Make sure you have created a Secret named example-secret that contains your TLS certificate and private key, and that your HTTPRoute's parentRefs entry points at this Gateway.
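For reference, a TLS Secret for the listener above has the standard kubernetes.io/tls shape; the values below are placeholders for your own base64-encoded certificate and key (you can also create the Secret with kubectl create secret tls instead of writing the manifest by hand):

apiVersion: v1
kind: Secret
metadata:
  name: example-secret
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>   # placeholder
  tls.key: <base64-encoded private key>   # placeholder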

This is just a basic example; you may need to adjust it to your specific needs. If you need more help, just let me know!³

Source: Conversation with Bing, 5/15/2024
(1) TLS and SSL :: Spring Cloud Gateway. https://docs.spring.io/spring-cloud-gateway/reference/spring-cloud-gateway/tls-and-ssl.html.
(2) Getting Started | Building a Gateway - Spring. https://spring.io/guides/gs/gateway/.
(3) HTTPS using Self-Signed Certificate in Spring Boot | Baeldung. https://www.baeldung.com/spring-boot-https-self-signed-certificate.
(4) Azure Application Gateway HTTP settings configuration. https://learn.microsoft.com/en-us/azure/application-gateway/configuration-http-settings.
(5) kubernetes - how to configure ingress to direct traffic to an https .... https://stackoverflow.com/questions/54459015/how-to-configure-ingress-to-direct-traffic-to-an-https-backend-using-https.
(6) undefined. https://github.com/spring-guides/gs-gateway.git.
(7) undefined. https://start.spring.io.
(8) undefined. https://httpbin.org/get.
(9) undefined. http://httpbin.org:80.

Sorry for the confusion — you are right that the earlier Service configuration did not include type: LoadBalancer. If you want the Service itself to be exposed through an external load balancer, add type: LoadBalancer to it. Here is the updated Service configuration with the load balancer type set:

apiVersion: v1
kind: Service
metadata:
  name: svc-test
spec:
  type: LoadBalancer
  selector:
    app: your-app-name # make sure this label matches the Pod labels in your Deployment
  ports:
    - protocol: TCP
      port: 443        # the port the Service exposes externally
      targetPort: 8443 # the port inside the Pod

In this configuration, type: LoadBalancer tells Kubernetes to provision an external load balancer for this Service. When traffic reaches the load balancer it is forwarded to the Service's port 443, and the Service then forwards it to port 8443 on the Pods.

Make sure your cloud provider supports load balancers and that you have set up the required resources and permissions according to the provider's guide. For more about the LoadBalancer Service type, see the official documentation¹.

If you have any questions or need further help, just let me know!

Source: Conversation with Bing, 5/15/2024
(1) Service | Kubernetes. https://kubernetes.io/docs/concepts/services-networking/service/.
(2) Create an External Load Balancer | Kubernetes. https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/.
(3) ClusterIP, NodePort, and LoadBalancer: Kubernetes Service Types - Baeldung. https://bing.com/search?q=Kubernetes+Service+loadBalancer+type+configuration.
(4) kubernetes - kubectl - How to edit service spec type to LoadBalancer .... https://stackoverflow.com/questions/51559159/kubectl-how-to-edit-service-spec-type-to-loadbalancer-via-command-line.
(5) ClusterIP, NodePort, and LoadBalancer: Kubernetes Service Types - Baeldung. https://www.baeldung.com/ops/kubernetes-service-types.
(6) en.wikipedia.org. https://en.wikipedia.org/wiki/Kubernetes.

In Kubernetes, the Gateway API defines how traffic is routed to your Services, and it does not require the backend Service to be of any particular type. Based on the two pages above, the type you pick normally depends on how the workload is exposed: type: LoadBalancer is used when you expose a Service directly from outside the cluster, while ClusterIP is used for Services that are only reached from inside the cluster.

When you deploy a Gateway on Google Kubernetes Engine (GKE), the Gateway itself provisions the external load balancer and gives you an externally reachable IP address; the Services referenced by an HTTPRoute are used as backends of that load balancer, so an ordinary ClusterIP Service is normally sufficient¹².

In other words, you only need type: LoadBalancer on the backend Service if you also want to expose it directly, outside of the Gateway. When a request arrives at the Gateway's load balancer it is forwarded to the Service's port, and the Service forwards it to the Pods' target port.

Choose the Service type based on how you actually intend to expose the workload. If you need more help or have other questions, just let me know! A minimal ClusterIP backend is sketched below.
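For illustration, reusing the svc-test name and ports mentioned earlier in this thread (these are assumptions about your setup, not verified values), a backend Service for a GKE Gateway can be a plain ClusterIP Service:

apiVersion: v1
kind: Service
metadata:
  name: svc-test
spec:
  type: ClusterIP          # no LoadBalancer needed; the Gateway provides the external entry point
  selector:
    app: your-app-name     # must match the Pod labels in your Deployment
  ports:
    - protocol: TCP
      port: 443            # the port referenced by the HTTPRoute's backendRefs
      targetPort: 8443     # the port the container listens on

The HTTPRoute's backendRefs then points at svc-test on port 443, and the Gateway-managed load balancer sends traffic to the Pods behind it.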

Source: Conversation with Bing, 5/15/2024
(1) Gateway API | Kubernetes. https://kubernetes.io/docs/concepts/services-networking/gateway/.
(2) Deploying Gateways | Google Kubernetes Engine (GKE ... - Google Cloud. https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-gateways.
(3) Kubernetes Service Types: LoadBalancer, NodePort, and ClusterIP. https://blog.devops.dev/kubernetes-service-types-loadbalancer-nodeport-and-clusterip-9fb327e443aa.
(4) Create an External Load Balancer | Kubernetes. https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/.
(5) Ingress vs. Load Balancer in Kubernetes | Baeldung on Ops. https://www.baeldung.com/ops/kubernetes-ingress-vs-load-balancer.
This raises a question for my environment: is the exposure type I configured wrong? It doesn't seem to be, though, because in the end I serve external traffic through the Gateway.

@aibangjuxin
Owner Author

Sure. Here is an access-flow diagram in Mermaid syntax showing how a client reaches the backend service svc-test through the Gateway and HTTPRoute:

graph LR
  client[Client] -- "HTTPS request, port 443" --> gateway[Gateway]
  gateway -- "route lookup" --> httproute[HTTPRoute]
  httproute -- "forwards request" --> svc["Service: svc-test"]
  svc -- "proxies request" --> pod["Pod: port 8443"]

In this flow:

  • The client sends an HTTPS request to the Gateway.
  • The Gateway receives the request and routes it to the corresponding service according to the HTTPRoute configuration.
  • The HTTPRoute forwards the request to the Service named svc-test.
  • The Service proxies the request to the backend Pod, which listens on port 8443.

Make sure your Gateway and HTTPRoute configuration matches your Service and Deployment settings so that traffic can flow correctly along the whole path.
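For completeness, here is a sketch of the Deployment side of this flow; the Deployment name, labels, image, and /healthz probe path are placeholders, and the probe assumes the container serves TLS on port 8443:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: svc-test-deployment        # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: your-app-name           # must match the Service's selector
  template:
    metadata:
      labels:
        app: your-app-name
    spec:
      containers:
      - name: app
        image: your-app-image      # placeholder image
        ports:
        - containerPort: 8443      # the targetPort the Service forwards to
        readinessProbe:
          httpGet:
            path: /healthz         # assumed health endpoint
            port: 8443
            scheme: HTTPS          # the app serves TLS on 8443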

If you need further help or have any questions, just let me know!

@aibangjuxin
Owner Author

If you get a 503 after deploying a Gateway and an HTTPRoute, and the backend service shows an unhealthy status in the console, it usually means requests are not successfully reaching the intended backend. Here are some possible causes, together with the corresponding troubleshooting steps:

  1. Service and Endpoints

    • Confirm the Service is configured correctly and has valid Endpoints. If the Endpoints list is empty, no Pods are associated with the Service.
  2. Pod status

    • Check that the Pods are in the Running state and Ready. If a Pod is not Running or not Ready, requests cannot reach it.
  3. Deployment configuration

    • Confirm the Deployment is configured correctly, including (but not limited to) the container image, ports, and health checks (liveness and readiness probes).
  4. Network policies

    • Check whether any network policy blocks communication between the Gateway and the Service.
  5. DNS resolution

    • Confirm the Gateway's and the Service's DNS names resolve correctly.
  6. Resource limits

    • Check that the Pods have enough resources (CPU, memory) and that sensible resource requests and limits are set.
  7. Log inspection

    • Look at the Gateway's and backend Pods' logs for error messages or warnings.
  8. Service port

    • Confirm the port the Service listens on matches the port specified in the HTTPRoute or Gateway configuration.
  9. Gateway configuration

    • Check the Gateway configuration, including listener ports, routing rules, and so on.
  10. HTTPRoute configuration

    • Confirm the HTTPRoute configuration is correct, including match rules, backendRefs, and so on.
  11. Certificates

    • For HTTPS traffic, check that the certificate is configured correctly and has not expired.
  12. Failed health checks

    • A Pod that does not pass its health checks will not be considered ready, so the Gateway will not send it traffic.
  13. Service discovery

    • Confirm that Kubernetes service discovery is working properly.
  14. Load balancer issues

    • If you use an external load balancer, check its configuration and status.
  15. Firewall or security group settings

    • Confirm that no firewall or security group rule is blocking the traffic.

Troubleshooting steps:

  • Use kubectl describe to view the detailed configuration and current status of the Gateway and the Service.
  • Use kubectl get pods to check the Pods' status.
  • Use kubectl logs <pod-name> to view a Pod's logs.
  • Use kubectl exec -it <pod-name> -- /bin/sh to get a shell inside a Pod for further debugging.
  • Use kubectl describe httproute <httproute-name> to view the HTTPRoute's details.

Based on the links you provided, you are already familiar with the basics of the Gateway API and how to deploy Gateways on Google Kubernetes Engine (GKE). If the problem persists, dig deeper with the troubleshooting steps above, or consult the GKE documentation and community support for more help.

@aibangjuxin
Owner Author

GKE Gateway does not support wildcards, regular expressions, or dynamic URLs.

Run kubectl get gatewayclass and check the Gateway controller logs to verify.
