feat(ws): add podTemplatePod to Workspace status, for backend logs #210

Merged

thesuperzapper
Member

related: #208
related: #172

Currently, the backend has no way to know which Pod to read logs from as part of #208, so this PR adds podTemplatePod to the status of Workspaces.

This new status field will look something like (when the Pod exists):

status:
  podTemplatePod:
    name: ws-jupyterlab-workspace-56mlf-0
    containers:
      - name: main
    initContainers: []

This new status field will look something like (when the Pod does not exist):

status:
  podTemplatePod:
    name: ""
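As a rough illustration of how a consumer such as the backend might use this field, here is a minimal Go sketch. The type names (WorkspacePodTemplatePod, PodTemplateContainer) and the logTarget helper are assumptions inferred from the YAML above, not the definitions added by this PR.

```go
package main

import "fmt"

// PodTemplateContainer and WorkspacePodTemplatePod are hypothetical types
// inferred from the YAML shown above; the real API types may differ.
type PodTemplateContainer struct {
	Name string
}

type WorkspacePodTemplatePod struct {
	Name           string
	Containers     []PodTemplateContainer
	InitContainers []PodTemplateContainer
}

// logTarget (a hypothetical helper) returns the Pod name to read logs from
// for the given container. ok is false when no Pod exists yet (empty name)
// or when the container is not listed in the status.
func logTarget(p WorkspacePodTemplatePod, container string) (pod string, ok bool) {
	if p.Name == "" {
		return "", false
	}
	for _, c := range p.Containers {
		if c.Name == container {
			return p.Name, true
		}
	}
	return "", false
}

func main() {
	existing := WorkspacePodTemplatePod{
		Name:       "ws-jupyterlab-workspace-56mlf-0",
		Containers: []PodTemplateContainer{{Name: "main"}},
	}
	missing := WorkspacePodTemplatePod{Name: ""}

	for _, p := range []WorkspacePodTemplatePod{existing, missing} {
		if pod, ok := logTarget(p, "main"); ok {
			fmt.Printf("read logs from pod %q, container %q\n", pod, "main")
		} else {
			fmt.Println("no Pod to read logs from yet")
		}
	}
}
```

The empty name acts as the "Pod does not exist" signal, so a caller can check it before attempting to read logs.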

This PR also:

  • improves the generateWorkspaceStatus method by factoring out a generateWorkspaceState function that generates the state/stateMessage
  • ensures we only make an UPDATE call when the status actually changes on WS and WSK (to reduce API load)
  • factors out a common workspacePodTemplateContainerName variable, rather than using "main" everywhere

@google-oss-prow google-oss-prow bot requested a review from kimwnasptd February 14, 2025 01:59
@thesuperzapper thesuperzapper requested review from ederign and removed request for kimwnasptd February 14, 2025 02:02
@thesuperzapper
Member Author

@ederign this should be ready to review/lgtm

Member

@ederign ederign left a comment

I'm still learning how the controller works, so I just left a few comments to clarify. @etirelli @harshad16 @andyatmiami maybe one of you can take a look at it?

@@ -4,5 +4,5 @@ resources:
 - manager.yaml
 images:
 - name: controller
-  newName: ghcr.io/kubeflow/notebooks/workspace-controller
+  newName: controller
Member

@thesuperzapper, just to double-check that this was intentional!

Member Author

@thesuperzapper thesuperzapper Feb 14, 2025

It was just an automated thing, but it was a mistake, yes!

It happens if you run make docker-build without IMG set.

Member Author

Fixed in 689c828

log.Error(err, "unable to update Workspace status")
return ctrl.Result{}, err
}

Member

I'm new to controllers, but why don't we need workspace.Status.PendingRestart = false anymore?

Member Author

@ederign a few lines up I realized we were not always correctly setting it.

Now I build the status by setting it to false at the beginning and then only setting it to true when I detect an option has a redirect.
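A minimal sketch of that pattern, assuming a simplified option type: the flag starts as false and only flips to true when an option with a redirect is found. The optionValue type and the pendingRestart function are hypothetical and only illustrate the approach described above.

```go
package main

import "fmt"

// optionValue is a hypothetical, simplified view of a selected option.
// RedirectTo is empty when the option has no redirect.
type optionValue struct {
	ID         string
	RedirectTo string
}

// pendingRestart starts from false and is only set to true when at least one
// selected option has a redirect, so the result never depends on a stale value.
func pendingRestart(selected []optionValue) bool {
	pending := false
	for _, opt := range selected {
		if opt.RedirectTo != "" {
			pending = true
		}
	}
	return pending
}

func main() {
	noRedirects := []optionValue{{ID: "image-a"}, {ID: "small-cpu"}}
	withRedirect := []optionValue{{ID: "image-a", RedirectTo: "image-b"}, {ID: "small-cpu"}}

	fmt.Println("no redirects:", pendingRestart(noRedirects))
	fmt.Println("with redirect:", pendingRestart(withRedirect))
}
```

Because the value is recomputed from scratch on every pass, there is no separate code path that has to remember to reset it to false.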

Member

Thank you for the clarification!

Signed-off-by: Mathew Wicks <[email protected]>
@thesuperzapper
Member Author

@ederign this should be good to LGTM now, I removed the accidental image diff in 689c828

@ederign
Member

ederign commented Feb 14, 2025

/lgtm

@harshad16 harshad16 left a comment

Still getting up to speed with the code, but LGTM in general.

Just one question:

- return status, nil
+ state = kubefloworgv1beta1.WorkspaceStateRunning
+ stateMessage = stateMsgRunning
+ return state, stateMessage, nil
  }

// get container status
// https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-states
var containerStatus corev1.ContainerStatus

Is the init container status maintained somewhere else?

Member Author

@harshad16 you are correct that the initContainer status is stored under status.InitContainerStatuses, but in this case, we are looking for the status of the "main" container, so we can get info about it.

But you are correct that we might have a problem if the Pod is failing because an init-container (or in fact some other container) is in an error state, as we would fall through to the Unknown State.

Do you want to pick this up as another task after we merge?
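To make the gap concrete, here is one possible direction for that follow-up, sketched with simplified stand-in types rather than corev1. Everything here is an assumption for illustration: the containerStatus and podStatus types, the describeState function, and the returned state strings are hypothetical and are not what this PR implements.

```go
package main

import "fmt"

// containerStatus is a simplified stand-in for a container status entry.
// ExitCode is non-nil only once the container has terminated.
type containerStatus struct {
	Name     string
	Running  bool
	ExitCode *int
}

// podStatus is a simplified stand-in that keeps init containers and regular
// containers in separate lists, as the Pod status does.
type podStatus struct {
	InitContainerStatuses []containerStatus
	ContainerStatuses     []containerStatus
}

// describeState checks init containers before the main container, so a failed
// init container is reported instead of falling through to "Unknown".
func describeState(pod podStatus, mainName string) string {
	for _, ic := range pod.InitContainerStatuses {
		if ic.ExitCode != nil && *ic.ExitCode != 0 {
			return "Error: init container " + ic.Name + " failed"
		}
	}
	for _, cs := range pod.ContainerStatuses {
		if cs.Name != mainName {
			continue
		}
		if cs.Running {
			return "Running"
		}
		if cs.ExitCode != nil {
			return "Terminated"
		}
	}
	return "Unknown"
}

// exitCode is a small helper to build a pointer to an exit code.
func exitCode(c int) *int { return &c }

func main() {
	healthy := podStatus{
		ContainerStatuses: []containerStatus{{Name: "main", Running: true}},
	}
	initFailed := podStatus{
		InitContainerStatuses: []containerStatus{{Name: "setup", ExitCode: exitCode(1)}},
		ContainerStatuses:     []containerStatus{{Name: "main"}},
	}

	fmt.Println(describeState(healthy, "main"))
	fmt.Println(describeState(initFailed, "main"))
}
```

Without the first loop, the second Pod would have no running or terminated main container and would end up in the Unknown state, which is the case discussed above.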

@thesuperzapper
Member Author

I am merging this for now, but as @harshad16 said in #210 (comment), there is still some work to do on the state extraction.

/approve

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ederign, thesuperzapper

@thesuperzapper
Member Author

Hmm, it seems we now always need the ok-to-test label, because otherwise the job is marked as "skipped", which seems to count as a failure.

/ok-to-test

@google-oss-prow google-oss-prow bot merged commit 6f14790 into kubeflow:notebooks-v2 Feb 14, 2025
12 checks passed
@thesuperzapper thesuperzapper deleted the add-pod-info-ws-status branch February 14, 2025 22:06