SortRel is a little under documented #213
All of the unspecified suffixes mean that the value wasn't specified when constructing the protobuf. Their only semantic is to communicate that nothing was specified. They don't imply anything about default behavior, for example, which imo should be specified by the producer.
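To make the point about unspecified values concrete, here is a minimal Python sketch of a consumer that treats SORT_DIRECTION_UNSPECIFIED purely as "nothing was specified" and rejects it rather than inventing a default. The enum is assumed to mirror Substrait's SortDirection names (the numeric values are illustrative), and `resolve_direction` and `PlanValidationError` are hypothetical names, not part of any Substrait library.

```python
from enum import IntEnum


class SortDirection(IntEnum):
    # Assumed to mirror the SortDirection names in Substrait's proto;
    # the numeric values are illustrative only.
    SORT_DIRECTION_UNSPECIFIED = 0
    SORT_DIRECTION_ASC_NULLS_FIRST = 1
    SORT_DIRECTION_ASC_NULLS_LAST = 2
    SORT_DIRECTION_DESC_NULLS_FIRST = 3
    SORT_DIRECTION_DESC_NULLS_LAST = 4
    SORT_DIRECTION_CLUSTERED = 5


class PlanValidationError(ValueError):
    """Hypothetical error a consumer raises for a plan it cannot interpret."""


def resolve_direction(raw: int) -> SortDirection:
    """Hypothetical consumer check: UNSPECIFIED only means "nothing was set",
    so this consumer refuses to guess a default."""
    try:
        direction = SortDirection(raw)
    except ValueError:
        # A value this consumer does not recognize (e.g. from a newer producer).
        raise PlanValidationError(f"unrecognized sort direction {raw}") from None
    if direction is SortDirection.SORT_DIRECTION_UNSPECIFIED:
        raise PlanValidationError("sort direction was not specified by the producer")
    return direction
```

Whether a consumer should error out like this or fall back to a spec-defined default is exactly the open question raised in the replies.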
Is there specific stuff that you think should be covered here that is not? The custom function reference lets you choose an arbitrary binary function for ordering. This allows you to use alternative collations.
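As a rough illustration of what an alternative collation means here, the following Python sketch defines a hypothetical case-insensitive comparison function that returns -1, 0, or 1 and sorts with it via `functools.cmp_to_key`. It only shows the idea of swapping in a different binary ordering function; it is not how any Substrait consumer actually resolves such a function.

```python
from functools import cmp_to_key


def case_insensitive_compare(a: str, b: str) -> int:
    """Hypothetical comparison function for a case-insensitive collation.

    Returns -1, 0, or 1, following the usual comparator convention.
    """
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


names = ["banana", "Cherry", "apple"]

# Default string ordering compares code points, so "Cherry" sorts first.
print(sorted(names))  # → ['Cherry', 'apple', 'banana']
# With the custom comparator, case is ignored.
print(sorted(names, key=cmp_to_key(case_insensitive_compare)))  # → ['apple', 'banana', 'Cherry']
```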
That helps; I was looking at the documentation for SortRel. I'll try to answer my own questions, then.
However, is this a scalar function? Wouldn't a sort comparator (that returns int) be called
What is the correct behavior of the consumer if it encounters this? An error? Also, if there is any kind of value that makes any kind of sense as a default, we should generally use that instead of unspecified, especially as we add new things to the proto. Otherwise, forwards compatibility suffers: consumers will just throw an error whenever they encounter an older file.
Yes actually; the type (including nullability) or set of types that the ordering function may return; there's no such thing as Also,
implies to me that every type needs to have one of these, and thus has a defined ordering (I also have no idea what a quality function is or looks like, as it doesn't appear to be defined). I don't even know how a map would be ordered, let alone arbitrary custom types. I can accept that Substrait requires an equality function be implicitly defined for every type since it's needed in various places (aggregations for example), but also requiring an implicit ordering function seems a bit much to me. If not every type, which types have one and which don't? Also, how do the orderings work for the less obvious types? For example for
IIRC, protobuf APIs convert unrecognized enum options to the
I agree with you. I wasn't arguing for consumer defaults so much as spec defaults; I was simply saying that we should try to define a spec default value when it makes sense. For example, I think it probably would have been fine to let an ascending sort be the default sort or an inner join be the default join. In retrospect, I think I was making it more of an issue than it needs to be. Using
It appears that it will either just pass on the unrecognized value or use its own language-specific unknown option. Either way, I do not believe it will use the default value in this case:
Let's make this question number 6.
I think it might be good for every type to have a Substrait-defined default ordering, even if consumers can't support all of them. This is a topic that could warrant a fair amount of explanation and discussion, and it might be better off as its own issue.
The thinking here is that the value is a reference to a function defined in extensions. We should specify that more formally: it is a two-argument function that returns an int32 with the values -1, 0, and 1 to provide a stable sort. So for each sort type + argument type, you reference a different function for comparison purposes. The function has a defined requirement of two arguments and a defined output, so you don't need to define any arguments (beyond what you do for any sort field). I agree that the documentation needs to be enhanced. Happy to see PRs suggesting improvements.
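The comparison-function contract described in the comment above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `FUNCTION_REGISTRY` is a hypothetical stand-in for functions declared in a plan's extensions, the anchor numbers and the simplified `SortField` dataclass are invented for the example (they are not the proto message), and rows are plain tuples. Each referenced comparator takes two arguments and returns -1, 0, or 1; multiple sort fields are applied in order, with later fields only breaking ties.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import Any, Callable, Dict, List

Comparator = Callable[[Any, Any], int]


def compare_ints(a: int, b: int) -> int:
    # Two-argument function returning -1, 0, or 1 (ascending integer order).
    return (a > b) - (a < b)


def compare_ints_desc(a: int, b: int) -> int:
    # A separate function for descending order on the same argument type.
    return compare_ints(b, a)


# Hypothetical stand-in for extension functions, keyed by the anchor that a
# comparison_function_reference would point at.
FUNCTION_REGISTRY: Dict[int, Comparator] = {
    1: compare_ints,
    2: compare_ints_desc,
}


@dataclass
class SortField:
    column: int                         # index of the sort key within each row
    comparison_function_reference: int  # anchor into FUNCTION_REGISTRY


def sort_rows(rows: List[tuple], fields: List[SortField]) -> List[tuple]:
    comparators = [
        (f.column, FUNCTION_REGISTRY[f.comparison_function_reference]) for f in fields
    ]

    def compare_rows(r1: tuple, r2: tuple) -> int:
        # Earlier sort fields take precedence; later ones only break ties.
        for column, cmp in comparators:
            result = cmp(r1[column], r2[column])
            if result != 0:
                return result
        return 0

    # sorted() is stable, so rows equal on every field keep their input order.
    return sorted(rows, key=cmp_to_key(compare_rows))


rows = [(1, 30), (2, 10), (1, 20), (2, 40)]
# Ascending on column 0, then descending on column 1.
print(sort_rows(rows, [SortField(0, 1), SortField(1, 2)]))
# → [(1, 30), (1, 20), (2, 40), (2, 10)]
```

Ascending and descending integer ordering are separate registry entries here, reflecting the idea that each sort type and argument type gets its own referenced function.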
What does SORT_DIRECTION_CLUSTERED mean? And comparison_function_reference: what kind of function is this? Why isn't this just an extension relation?