-
Notifications
You must be signed in to change notification settings - Fork 661
[Feature] Support recreate pods for RayCluster using RayClusterSpec.upgradeStrategy #4185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
710166a to
d261b0b
Compare
|
Hi @andrewsykim, I followed you previous comments to adding a spec.upgradeStrategy API to RayCluster. But for now. I'm concerned this approach may introduce some issues:
Maybe we can just add a feature gate instead of using spec.upgradeStrategy.type field in RayCluster to enable the recreate behavior. WDYT? |
Feature gates are used to gate features that are in early development and not ready for wider adoption, it shouldn't be used to change the behavior of RayCluster because it will eventually be on by default (and forced on). |
|
I think both of those concerns are valid, but I don't think this is a problem with separation of concerns as RayCluster is a building block for both RayService and RayJob. For those cases you mentioned, we should have validation to ensure RayCluster upgrade strategy cannot be set when used w/ RayJob and RayService |
05b8108 to
7109cf1
Compare
3d448e6 to
8bcce91
Compare
Signed-off-by: win5923 <[email protected]>
8bcce91 to
bf87764
Compare
c9d35b2 to
8d4c813
Compare
f3c1e04 to
2c5166d
Compare
Signed-off-by: win5923 <[email protected]>
2c5166d to
9359c5f
Compare
|
Hi @andrewsykim, PTAL, thanks. |
Signed-off-by: win5923 <[email protected]>
99d114b to
acd7739
Compare
| templateHash := "" | ||
| if hash, err := GeneratePodTemplateHash(instance.Spec.HeadGroupSpec.Template); err == nil { | ||
| templateHash = hash | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider logging the error and returning it up the call stack so that reconciliation can be retried.
| expectError: true, | ||
| errorMessage: "upgradeStrategy cannot be set when RayCluster is created by RayService", | ||
| }, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a unit test that also verifies that invalid strategy values.
Why are these changes needed?
Currently, when users update a RayCluster spec (e.g., update the image), users must re-create the cluster or set
spec.suspendto true and after all Pods are deleted and then set it back to false.Ref: #2534 (comment)
Design doc: https://docs.google.com/document/d/1xQLm0-WQWD-FkufxBJYklOJGvVn4RLk0_vPjLD5ax7o/edit?usp=sharing
Changes
spec.upgradeStrategyfield to RayCluster CRDRecreate: During upgrade, Recreate strategy will delete all existing pods before creating new ones.None: No new pod will be created while the strategy is set to NoneImplementation
HeadGroupSpec.Templateto head pod andworkerGroup.Templateto worker pod annotations during creation withray.io/pod-template-hashI only compare the
HeadGroupSpec.TemplateandworkerGroup.Templatebecause these define the pod-related configurations. TheRayCluster.Speccontains many dynamic and component-specific settings (e.g., autoscaler configs, rayStartParams).Example:
Related issue number
Checks