-
Notifications
You must be signed in to change notification settings - Fork 66
feat: Migrate JWA test-related workflows from kubeflow/kubeflow to notebooks-v1 branch #587 #599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Migrate JWA test-related workflows from kubeflow/kubeflow to notebooks-v1 branch #587 #599
Conversation
0acf6b9 to
1c8c9ac
Compare
0ac0946 to
8b9c3cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution and getting the release process started for notebooks-v1!
I think we still need some work on this PR - primarily around communication on why some of these changes are necessary and deviate from how kubeflow/kubeflow was set up.
I am not implying (necessarily) that they are wrong - only that there is no explanation on this PR to be able to understand why the proposed changes were required.
| docker-build-multi-arch: ## Build multi-arch docker images with docker buildx | ||
| cd ../ && docker buildx build --load --platform ${ARCH} --tag ${IMG}:${TAG} -f ${DOCKERFILE} . | ||
|
|
||
| .PHONY: docker-build-multi-arch-verify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain why we needed to add this Makefile target? Why was this not needed in kubeflow/kubeflow but is now needed in kubeflow/notebooks?
This is the kind of thing I would have expected to be called out in the commit message and/or PR description - but I don't see any reference to it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tagging each architecture separately would be a breaking change for customers - and I don't think "sneaking that in" in is something I could "sign off on" without discussion from community.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so as I mention, there where issues with ARM64 and I did of course open my proposal to discussion.
This target was added only for this scenario (and not instead of) and it helps to validate what is happening with the problematic image.
Now as I mention, most of the time (if not all) I can't fail the build step since not sure the image build is failing, it just get stuck.
I'm not sue why it's happens here and not in kubeflow/kubeflow but from what I have read it might be related to differences in runners. Maybe the experts from the community can put some light on this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, apologies - I had missed your discussion comment on the PR!
| - name: Verify multi-arch builds | ||
| run: | | ||
| echo "Verifying multi-arch build results:" | ||
| echo "Listing all jupyter-web-app images:" | ||
| docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}\t{{.CreatedAt}}" | grep jupyter-web-app || echo "No jupyter-web-app images found" | ||
| echo "" | ||
| echo "Checking individual architectures:" | ||
| TAG=$(git describe --tags --always --dirty) | ||
| echo "Base TAG: $TAG" | ||
| # Check each architecture | ||
| if docker images | grep -q "jupyter-web-app.*${TAG}-linux-amd64"; then | ||
| echo "✅ AMD64: SUCCESS" | ||
| else | ||
| echo "❌ AMD64: FAILED" | ||
| fi | ||
| if docker images | grep -q "jupyter-web-app.*${TAG}-linux-ppc64le"; then | ||
| echo "✅ PPC64LE: SUCCESS" | ||
| else | ||
| echo "❌ PPC64LE: FAILED" | ||
| fi | ||
| if docker images | grep -q "jupyter-web-app.*${TAG}-linux-arm64-v8"; then | ||
| echo "✅ ARM64: SUCCESS" | ||
| else | ||
| echo "ℹ️ ARM64: Not built (expected if timeout occurred)" | ||
| fi No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my comment on https://github.com/kubeflow/notebooks/pull/599/files#diff-07cbcedb761f69b198691b554bea634acaf78ebc7005415b45a3fe51033cc561...
I am not sure why this Verify command is necessary... are you observing cases where the build step does NOT load an image - and yet the GitHub Action workflow continues on? That would be the only situation in which I would think we need this explicit "Verify" step - and yet I would expec the build step to fail (and likewise the workflow to error).
I need more details around necessity of this logic before I could sign off on these changes.
I'm unclear why this Verify target is necessary (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. I'm observing cases where image not loaded (not sure if they will pop since the step is hang). As mentioned before I can't decide on failure in the build phase. In this PR my initial suggestion is to wrap it with timeout and then run some validations, of course in that phase we can fail the build if needed.
| - name: Build ARM64 Image (Optional) | ||
| timeout-minutes: 60 | ||
| continue-on-error: true | ||
| run: | | ||
| cd components/crud-web-apps/jupyter | ||
| ARCH=linux/arm64/v8 make docker-build-multi-arch-verify | ||
| echo "Completed linux/arm64/v8 build!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not an optional step in the original kubeflow/kubeflow JWA Multi-Arch Build Test workflow.. Would need you to add additional context here to judge if this is in fact a warranted/necessary change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as mentioned before, from my investigation it might be a runner issue but I'm not sure.
| - components/crud-web-apps/common/** | ||
| - releasing/version/VERSION | ||
| branches: | ||
| - master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely correct, my bad. Will fix it in the next patch after we will resolve some of those discussions above.
e0c3ab3 to
9ee858b
Compare
|
/ok-to-test |
|
|
||
| - uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: "3.7" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May we use 3.12? 3.7 is EOL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as outlined in my #kubeflow-notebooks Slack message - I'd prefer for this initial PR to migrate the workflows - that we try to adhere as close as possible to the kubeflow/kubeflow implementation.
Your feedback/insights totally spot on - I'd just prefer a follow up PR once we have notebooks-v1 branch of kubeflow/notebooks at feature parity .
| - name: Setup Python environment | ||
| run: | | ||
| cd components/crud-web-apps/jupyter/backend | ||
| make install-deps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make install-dependencies i would prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see: #599 (comment)
| set -euo pipefail | ||
|
|
||
| ISTIO_VERSION="1.17.8" | ||
| ISTIO_URL="https://istio.io/downloadIstio" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets use https://github.com/kubeflow/manifests/blob/6aad970e15f07af7841c46c0b03f65e7b06bf02f/example/kustomization.yaml#L39 if you want to stay close to the platform and not maintain istio yourself. This is also used in the KFP repository here https://github.com/kubeflow/pipelines/blob/51ab5e678feee36b59a000f3a03abe0ccd174360/.github/resources/scripts/deploy-kfp.sh#L117
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see: #599 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also more easily free space if needed if you take a look at the KFP GHA instead of working with the swapfile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see: #599 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see: #599 (comment)
| "service-account-signing-key-file": "/etc/kubernetes/pki/sa.key" | ||
| nodes: | ||
| - role: control-plane | ||
| image: kindest/node:v1.25.3@sha256:f52781bc0d7a19fb6c405c2af83abfeb311f130707a0e219175677e366cc45d1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please test with Kubernetes 1.33 as in You can copy https://github.com/kubeflow/manifests/blob/master/tests/install_KinD_create_KinD_cluster_install_kustomize.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see: #599 (comment)
9ee858b to
534a8ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work and experimentation here.
Just a few comments to review/clean up and I think we'll be ready to get this in. Please reach out if you have any questions!
…tebooks-v1 branch kubeflow#587 Signed-off-by: Yehudit Kerido <[email protected]>
534a8ac to
825bc66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Thanks for all this work @yehudit1987 - was certainly more "fun/challenging" than I originally anticipated - but appreciate you working through it all.
I'm comfortable with the changes and have verified the workflows behave as expected.
Tested on my fork:
|
really great work @yehudit1987! And thank you @andyatmiami for the very thorough review, as always! /lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kimwnasptd The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |

feat: migrate Jupyter Web App tests to notebooks repo
Enables JWA tests to run in kubeflow/notebooks repository.
Solves issue #587