Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP for statusz page for Kubernetes Components #4830

Merged
merged 1 commit into from
Oct 11, 2024

Conversation

richabanker
Copy link
Contributor

  • One-line PR description: Add a KEP for statusz page in Kubernetes components.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 6, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 6, 2024
@richabanker
Copy link
Contributor Author

/assign @logicalhan

@jpbetz
Copy link
Contributor

jpbetz commented Sep 11, 2024

/approve
For PRR. Alpha criteria looks straightforward here.

@richabanker richabanker force-pushed the component-statusz branch 2 times, most recently from b5c46f5 to ab4b0cf Compare September 13, 2024 21:27
@richabanker
Copy link
Contributor Author

/assign @rexagod

@richabanker richabanker force-pushed the component-statusz branch 2 times, most recently from 0eb76a3 to 37aa937 Compare September 26, 2024 22:45
@jpbetz
Copy link
Contributor

jpbetz commented Oct 1, 2024

/approve
For PRR

@richabanker richabanker force-pushed the component-statusz branch 2 times, most recently from e35b408 to 745deac Compare October 8, 2024 23:10
@richabanker richabanker force-pushed the component-statusz branch 2 times, most recently from 4503a4b to 1c5aac3 Compare October 9, 2024 15:53
Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2024

### Authz and authn

Access to the endpoint will be limited to members of the `system:monitoring group` ([ref](https://github.com/kubernetes/kubernetes/blob/release-1.31/staging/src/k8s.io/apiserver/pkg/authentication/user/user.go#L73)), which is how paths like /healthz, /livez, /readyz are restricted today.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for kubelet can you please add details about what the RBAC permissions will be for accessing this endpoint?

system:monitoring does not have the ability to call the kubelet's endpoints AFAIK. What subresource will allow a user access to kubelet's statsz endpoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was assuming that system:monitoring group would suffice for kubelet as well, i.e. we could add the new endpoints to this set of rules for system:monitoring role.
cc @liggitt for his advice here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this endpoint expect to be programmatically queried? I was assuming no, which makes it feel different than the endpoints that are granted to system:monitoring.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richabanker - I think it would help to clarify what you will be adding. Can you add a markdown code section of what changes you will make to the system:monitoring ClusterRole.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work. Here is how I tested this.

Create a serviceaccount

kubectl create serviceaccount hello --namespace default

Create a binding to system:monitoring ClusterRole

kubectl create clusterrolebinding --clusterrole=system:monitoring --serviceaccount default:hello helloissystemmonitor 

Call the kubelet metrics endpoint via apiserver proxy.

kubectl get --raw /api/v1/nodes/$(kubectl get nodes --no-headers | awk '{print $1}')/proxy/metrics --token $(kubectl create token hello)

Error from server (Forbidden): nodes "gke-test-default-pool-93fcb923-nt6x" is forbidden: User "system:serviceaccount:default:hello" cannot get resource "nodes/proxy" in API group "" at the cluster scope

Now lets try directly accessing the kubelet API of the node

Create a file called pod.yaml with these contents

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-sleeper
spec:
  serviceAccount: hello
  containers:
  - name: ubuntu
    image: ubuntu:latest # Use the latest Ubuntu image
    command: ["/bin/bash", "-c", "apt-get update && apt-get install -y curl jq && sleep infinity"] # Start a bash shell
    env:
    - name: NODE_IP
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP

Create a pod

kubectl apply -f pod.yaml 

Exec into the pod

kubectl exec ubuntu-sleeper -it -- /bin/bash 

Run the following command inside the pod

root@ubuntu-sleeper:/# curl -k https://$NODE_IP:10250/metrics -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/
token)"
Forbidden (user=system:serviceaccount:default:hello, verb=get, resource=nodes, subresource=metrics)

Now lets check the permissions that hello has

kubectl auth can-i --list --as system:serviceaccount:default:hello              ok  test kube 
Warning: the list may be incomplete: webhook authorizer does not support user rule resolution
Resources                                       Non-Resource URLs                      Resource Names   Verbs
selfsubjectreviews.authentication.k8s.io        []                                     []               [create]
selfsubjectaccessreviews.authorization.k8s.io   []                                     []               [create]
selfsubjectrulesreviews.authorization.k8s.io    []                                     []               [create]
                                                [/.well-known/openid-configuration/]   []               [get]
                                                [/.well-known/openid-configuration]    []               [get]
                                                [/api/*]                               []               [get]
                                                [/api]                                 []               [get]
                                                [/apis/*]                              []               [get]
                                                [/apis]                                []               [get]
                                                [/healthz/*]                           []               [get]
                                                [/healthz]                             []               [get]
                                                [/healthz]                             []               [get]
                                                [/healthz]                             []               [get]
                                                [/livez/*]                             []               [get]
                                                [/livez]                               []               [get]
                                                [/livez]                               []               [get]
                                                [/livez]                               []               [get]
                                                [/metrics/slis]                        []               [get]
                                                [/metrics]                             []               [get]
                                                [/openapi/*]                           []               [get]
                                                [/openapi]                             []               [get]
                                                [/openid/v1/jwks/]                     []               [get]
                                                [/openid/v1/jwks]                      []               [get]
                                                [/readyz/*]                            []               [get]
                                                [/readyz]                              []               [get]
                                                [/readyz]                              []               [get]
                                                [/readyz]                              []               [get]
                                                [/version/]                            []               [get]
                                                [/version/]                            []               [get]
                                                [/version]                             []               [get]
                                                [/version]                             []               [get]

notice that it has /metrics permissions but that won't give it permissions for kubelet. For kubelet you will have to extend the rbac to something like nodes/statusz or something like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a markdown for changes that we can do to system:monitoring role to allow access to kubelet's /statusz. Does this look ok now @vinayakankugoyal ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I am not sure about this.

It seems like system:monitoring did not previously have permissions to call kubelet API do we really want to grant this permission?

Copy link
Contributor Author

@richabanker richabanker Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these changes to system:monitoring as placeholder for now. We can have a final decision on this once @tallclair (sig-node) is available for comments. Also filed kubernetes/kubernetes#127990 to track the discussion.

cc @SergeyKanzhelev

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2024

### Authz and authn

Access to the endpoint will be limited to members of the `system:monitoring group` ([ref](https://github.com/kubernetes/kubernetes/blob/release-1.31/staging/src/k8s.io/apiserver/pkg/authentication/user/user.go#L73)), which is how paths like /healthz, /livez, /readyz are restricted today.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I am not sure about this.

It seems like system:monitoring did not previously have permissions to call kubelet API do we really want to grant this permission?


#### As a developer

- I want to quickly identify the exact version of a binary running in production so I can correlate it with known issues or recent code changes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can already do this by calling, why not just use that?

kubectl get --raw /version
{
  "major": "1",
  "minor": "31",
  "gitVersion": "v1.31.1-gke.1472000",
  "gitCommit": "e21141bf6951009a138dda22624a11674fb9be02",
  "gitTreeState": "clean",
  "buildDate": "2024-09-23T20:27:37Z",
  "goVersion": "go1.22.6 X:boringcrypto",
  "compiler": "gc",
  "platform": "linux/amd64"
}%          

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information is also exposed via a metric

kubectl get --raw /metrics|grep "^kubernetes_build_info.*"                                                                               INT|INT err  1m 10s  test kube 
kubernetes_build_info{build_date="2024-09-23T20:27:37Z",compiler="gc",git_commit="e21141bf6951009a138dda22624a11674fb9be02",git_tree_state="clean",git_version="v1.31.1-gke.500",go_version="go1.22.6 X:boringcrypto",major="1",minor="31",platform="linux/amd64"} 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that the proposed statusz endpoint might seem a bit redundant given the existing /version endpoint. However, statusz offers a dedicated space for comprehensive component information, going beyond just version details. It has the potential to serve as a central debugging hub, providing not only component status but also links to other relevant z-pages for deeper observability.

Co-authored-by: Pranshu Srivastava <[email protected]>

Co-authored-by: Pranshu Srivastava <[email protected]>

Co-authored-by: Pranshu Srivastava <[email protected]>

Access to the endpoint will be limited to members of the `system:monitoring group` ([ref](https://github.com/kubernetes/kubernetes/blob/release-1.31/staging/src/k8s.io/apiserver/pkg/authentication/user/user.go#L73)), which is how paths like /healthz, /livez, /readyz are restricted today.

// NOTE: Placeholder suggestion for handling kubelet auth. Subject to change based on https://github.com/kubernetes/kubernetes/issues/127990
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with deciding on it during implementation

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2024
@vinayakankugoyal
Copy link
Contributor

/lgtm

@logicalhan
Copy link
Member

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jpbetz, logicalhan, richabanker, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 11, 2024
@k8s-ci-robot k8s-ci-robot merged commit 28245bb into kubernetes:master Oct 11, 2024
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.