-
Notifications
You must be signed in to change notification settings - Fork 615
[feat] enforce DNS1035 validations on RayCluster, RayService, and RayJob names #3239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…Job names Signed-off-by: Rueian <[email protected]>
@@ -24,6 +25,9 @@ func ValidateRayClusterMetadata(metadata metav1.ObjectMeta) error { | |||
if len(metadata.Name) > MaxRayClusterNameLength { | |||
return fmt.Errorf("RayCluster name should be no more than %d characters", MaxRayClusterNameLength) | |||
} | |||
if errs := validation.IsDNS1035Label(metadata.Name); len(errs) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we certain this won't break any existing systems using KubeRay that fail this validation? What are possible names previously used that would now fail this validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR makes any names that start with a number or contain dots or upper letters fail at the beginning.
Also note that #3083 makes RayCluster names longer than 53 characters fail, and RayJob/RayServices names longer than 47 characters fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Names starting with dots or upper-case letters is fine because the previous validation didn't allow it:
$ kubectl ray create cluster Sample-cluster
Error: Failed to create Ray cluster with: RayCluster.ray.io "Sample-cluster" is invalid: metadata.name: Invalid value: "Sample-cluster": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
But disallowing RayClusters that start with numbers seems like a breaking change:
$ kubectl ray create cluster 1-sample-cluster
Created Ray Cluster: 1-sample-cluster
Is there a really good reason why we should break support this naming scheme? Could we adapt validation to still allow it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind opening an issue to make sure we document this breaking change in v1.4.0 release note?
Sure. #3271 |
Names starting with dots or upper-case letters is fine because the previous validation didn't allow it:
But disallowing RayClusters that start with numbers seems like a breaking change:
Is there a really good reason why we should break support this naming scheme? Could we adapt validation to still allow it? |
The reason is that users tend to assume the generated service name is just In v1.3, the prefix of generated service names will be trimmed if the CR name is too long or be prepended with And we want that assumption to always hold for better user experiences, so #3083 and this PR add validations for length limitations and DNS1035 rules. |
…Job names (ray-project#3239) Signed-off-by: Rueian <[email protected]>
Why are these changes needed?
One of the goals of #3083 is to provide more predictable Kubernetes service names generated by the KubeRay controller for RayCluster, RayService, and RayJob. Therefore, we shortened the allowed length for CR names and validated them at the beginning so that we can now only add fixed suffixes to generate their Kubernetes service names without mutating and trimming the prefixes of their names. Mutating and trimming their prefixes make them less predictable.
A case mentioned in #2169 shows why we want #3083 for not mutating and trimming the prefixes of service names: A user tends to fully copy the CR name with the generated suffix, such as
-head-svc
, and uses it elsewhere:But it also shows another case we missed: Kubernetes requires a service name to be a DNS1035 label or we will get this error when creating a service:
This PR adds DNS1035 validations on RayCluster, RayService, and RayJob names.
Related issue number
Checks