-
Notifications
You must be signed in to change notification settings - Fork 641
[Scheduler] Replace AddMetadataToPod with AddMetadataToChildResource across all schedulers #4123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR replaces the RayCluster-specific AddMetadataToPod method with the more generic AddMetadataToChildResource across all batch schedulers to support both RayCluster and RayJob resources, improving extensibility for future CRDs.
Key Changes:
- Removed
AddMetadataToPodmethod from theBatchSchedulerinterface and all implementations - Updated all scheduler implementations to use
AddMetadataToChildResourcewith genericmetav1.Objecttypes - Modified call sites in
RayClusterReconcilerto use the new method signature
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
ray-operator/controllers/ray/raycluster_controller.go |
Updated method calls from AddMetadataToPod to AddMetadataToChildResource with reordered parameters |
ray-operator/controllers/ray/batchscheduler/yunikorn/yunikorn_scheduler.go |
Removed deprecated AddMetadataToPod method |
ray-operator/controllers/ray/batchscheduler/volcano/volcano_scheduler.go |
Removed deprecated AddMetadataToPod method |
ray-operator/controllers/ray/batchscheduler/scheduler-plugins/scheduler_plugins_test.go |
Updated test function names and method calls to use AddMetadataToChildResource |
ray-operator/controllers/ray/batchscheduler/scheduler-plugins/scheduler_plugins.go |
Replaced AddMetadataToPod with AddMetadataToChildResource, added addSchedulerNameToObject helper |
ray-operator/controllers/ray/batchscheduler/kai-scheduler/kai_scheduler_test.go |
Updated test function names and method calls to use AddMetadataToChildResource |
ray-operator/controllers/ray/batchscheduler/kai-scheduler/kai_scheduler.go |
Replaced AddMetadataToPod with AddMetadataToChildResource, added addSchedulerNameToObject helper, removed RayCluster-specific import |
ray-operator/controllers/ray/batchscheduler/interface/interface.go |
Removed AddMetadataToPod from interface and default implementation, removed unused imports |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| func addSchedulerNameToObject(obj metav1.Object, schedulerName string) { | ||
| switch obj := obj.(type) { | ||
| case *corev1.Pod: | ||
| obj.Spec.SchedulerName = schedulerName | ||
| case *corev1.PodTemplateSpec: | ||
| obj.Spec.SchedulerName = schedulerName | ||
| } | ||
| } |
Copilot
AI
Oct 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addSchedulerNameToObject function is duplicated across multiple scheduler implementations (scheduler-plugins and kai-scheduler). Consider extracting this utility function to a shared package to reduce code duplication and improve maintainability.
| func addSchedulerNameToObject(obj metav1.Object, schedulerName string) { | ||
| switch obj := obj.(type) { | ||
| case *corev1.Pod: | ||
| obj.Spec.SchedulerName = schedulerName | ||
| case *corev1.PodTemplateSpec: | ||
| obj.Spec.SchedulerName = schedulerName | ||
| } | ||
| } |
Copilot
AI
Oct 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is an exact duplicate of the one in scheduler-plugins. Consider extracting this utility function to a shared package to reduce code duplication and improve maintainability.
|
@troychiu @owenowenisme @Future-Outlier |
| obj.Spec.SchedulerName = schedulerName | ||
| case *corev1.PodTemplateSpec: | ||
| obj.Spec.SchedulerName = schedulerName | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do a panic for safty if the object does not match all the cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rueian, thanks for the review.
I think we don't need panic here, cause current silent no-op behavior for unsupported types is intentional.
At line 899, a *rayv1.RayCluster is passed in. This isn’t a Pod or PodTemplateSpec, and that’s by design.
AddMetadataToChildResource has two roles:
- propagate metadata (labels, annotations)
- set the scheduler name.
When it’s called with a RayCluster, only the metadata propagation part applies. RayCluster itself doesn’t need a schedulerName; only the actual Pods (Head/Worker Pods for RayCluster or Job for RayJob) do.
kuberay/ray-operator/controllers/ray/rayjob_controller.go
Lines 875 to 922 in 2d52001
| func (r *RayJobReconciler) getOrCreateRayClusterInstance(ctx context.Context, rayJobInstance *rayv1.RayJob) (*rayv1.RayCluster, error) { | |
| logger := ctrl.LoggerFrom(ctx) | |
| rayClusterNamespacedName := common.RayJobRayClusterNamespacedName(rayJobInstance) | |
| logger.Info("try to find existing RayCluster instance", "name", rayClusterNamespacedName.Name) | |
| rayClusterInstance := &rayv1.RayCluster{} | |
| if err := r.Get(ctx, rayClusterNamespacedName, rayClusterInstance); err != nil { | |
| if errors.IsNotFound(err) { | |
| logger.Info("RayCluster not found", "RayCluster", rayClusterNamespacedName) | |
| if len(rayJobInstance.Spec.ClusterSelector) != 0 { | |
| err := fmt.Errorf("clusterSelector mode is enabled, but RayCluster %s/%s is not found: %w", rayClusterNamespacedName.Namespace, rayClusterNamespacedName.Name, err) | |
| r.Recorder.Eventf(rayJobInstance, corev1.EventTypeWarning, string(utils.RayClusterNotFound), "RayCluster %s/%s set in the clusterSelector is not found. It must be created manually", rayClusterNamespacedName.Namespace, rayClusterNamespacedName.Name) | |
| return nil, err | |
| } | |
| logger.Info("RayCluster not found, creating RayCluster!", "RayCluster", rayClusterNamespacedName) | |
| rayClusterInstance, err = r.constructRayClusterForRayJob(rayJobInstance, rayClusterNamespacedName.Name) | |
| if err != nil { | |
| return nil, err | |
| } | |
| if r.options.BatchSchedulerManager != nil && rayJobInstance.Spec.SubmissionMode == rayv1.K8sJobMode { | |
| if scheduler, err := r.options.BatchSchedulerManager.GetScheduler(); err == nil { | |
| // Group name is only used for individual pods to specify their task group ("headgroup", "worker-group-1", etc.). | |
| // RayCluster contains multiple groups, so we pass an empty string. | |
| scheduler.AddMetadataToChildResource(ctx, rayJobInstance, rayClusterInstance, "") | |
| } else { | |
| return nil, err | |
| } | |
| } | |
| if err := r.Create(ctx, rayClusterInstance); err != nil { | |
| r.Recorder.Eventf(rayJobInstance, corev1.EventTypeWarning, string(utils.FailedToCreateRayCluster), "Failed to create RayCluster %s/%s: %v", rayClusterInstance.Namespace, rayClusterInstance.Name, err) | |
| return nil, err | |
| } | |
| r.Recorder.Eventf(rayJobInstance, corev1.EventTypeNormal, string(utils.CreatedRayCluster), "Created RayCluster %s/%s", rayClusterInstance.Namespace, rayClusterInstance.Name) | |
| } else { | |
| return nil, err | |
| } | |
| } | |
| logger.Info("Found the associated RayCluster for RayJob", "RayCluster", rayClusterNamespacedName) | |
| // Verify that RayJob is not in cluster selector mode first to avoid nil pointer dereference error during spec comparison. | |
| // This is checked by ensuring len(rayJobInstance.Spec.ClusterSelector) equals 0. | |
| if len(rayJobInstance.Spec.ClusterSelector) == 0 && !utils.CompareJsonStruct(rayClusterInstance.Spec, *rayJobInstance.Spec.RayClusterSpec) { | |
| logger.Info("Disregard changes in RayClusterSpec of RayJob") | |
| } | |
| return rayClusterInstance, nil | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move AddSchedulerNameToObject into shared utils to reduce duplication. 16484bf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it set the scheduler name to the PodTemplateSpecs in the rayv1.RayCluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both are fine. Do you have any concerns?
16484bf to
ae3c527
Compare
…across all schedulers Signed-off-by: win5923 <[email protected]>
ae3c527 to
148b6f7
Compare
Signed-off-by: win5923 <[email protected]>
148b6f7 to
8601856
Compare
Why are these changes needed?
Replace the RayCluster-only method AddMetadataToPod with AddMetadataToChildResource to support RayJob and future CRDs. This simplifies scheduler integration and future extensibility.
For
kai-schedulerandscheduler-plugins, onlyRayClusteris supported.For
VolcanoandYuniKorn, bothRayClusterandRayJobare supported.E2E:
Volcano
RayCluster


RayJob
submissionMode: "K8sJobMode"


submissionMode: "SidecarMode"


Yunikorn
Related issue number
Closes #4097
Checks