-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: extend extension documentation #310
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really like this update. Thanks @jvanstraten !
Will do a second pass once out of draft form.
5e98051
to
6b38d99
Compare
- coin the term "core extensions" and describe why they (and extensions in general) exist - document guidelines from substrait-io#251 and substrait-io#307 - document common options in core extensions (substrait-io#254) - better integrate the function documentation generator output into the website
6b38d99
to
c5c057d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good, a couple of comments. Thanks @jvanstraten
- Example: [#289](https://github.com/substrait-io/substrait/issues/289) | ||
- Example: [#295](https://github.com/substrait-io/substrait/issues/295) | ||
|
||
- Aim for syntactic and semantic consistency with widely used SQL dialects, especially PostgreSQL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with the "especially postgres" comment here. We should be focused on what's common, not what's postgres. If Sql server and oracle do something one way and postgres does it differently, I think we should do what Sql Server and Oracle do. There are lots of places where Postgres does super weird shit that isn't "standard" at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it in 22fd8c3
- Be consistent when it comes to argument types. It is preferable to define a function that accepts and returns one type class over a function that promotes from one type class or another or accepts a mixture of type classes. This aims to prevent an explosion of function implementations. | ||
- More information and examples: [#251](https://github.com/substrait-io/substrait/issues/251) | ||
|
||
- Be pedantic when describing functionality. The corner cases that rarely come up in practice are exactly the places where different implementations are likely to differ, so for a plan to be implementation-agnostic, these are exactly the things that need to be specified exhaustively. For especially pedantic things, an optional enumeration argument may be suitable; this allows a producer to explicitly indicate that the consumer can pick the behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not use the word pedantic. Let's come up with something neutral to positive, such as exhaustive, highly detailed, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO "pedantic" is not a bad thing for a specification, but, sure. "Precise" as a drop-in replacement maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon closer inspection, I replaced the first instance of "pedantic" with "precise" in 22fd8c3, but left the second, and added an example instead. Using google's definition, "pedantic" means "excessively concerned with minor details or rules; overscrupulous." You can't be "concerned with minor details or rules" enough when writing a specification; it must specify every single minor detail and rule, or it wouldn't be unambiguous. Nevertheless, people tend to find minor details and rules unnecessary (excessive, overscrupulous) and therefore pedantic when they consider them to be common sense or obvious, and thus will avoid stating such things to save time and avoid annoying the reader. The purpose of the guideline is to remind people not to fall into that trap and specify everything, regardless of how they feel about it or expect others to feel about it. Usage of the word is justified in this context.
- Example: the verbosity of the description of [regex_match_substring](https://github.com/substrait-io/substrait/blob/fbe5e0949b863334d02b5ad9ecac55ec8fc4debb/extensions/functions_string.yaml#L79-L139). | ||
- Example: the floating point rounding option defined [here](common_options.md). | ||
|
||
- The core extensions should generally not be defining type classes. If you believe a type class that isn't currently in the specification is important enough to include, it probably makes more sense to simply add it to the built-in types, or otherwise should be a third-party extension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with this. I expect that we create "compatibility extensions" over time within the core project. For example, we may have a set of data type and function extensions specifically targeting arrow, postgres or mysql. In many cases it would be useful to have a canonical set of items here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. The way I figured things would go is that data representation projects like Arrow would publish and maintain extension types of their own, that everyone else can then use if they use that data format under the hood. I'm not sure Substrait should necessarily be involved with that.
ETA: I guess that's not really a rebuttal for the general case. I'm not overly attached to the statement. It just feels weird for Substrait itself to define both "built-in" types and extension types. I mean, if something is worth specifying centrally, why not just add it as a built-in type? Consumers are free to reject stuff they don't understand, whether it's a built-in type or an extension type from the core extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed statement in 22fd8c3.
|
No progress has been made in more than six months. Closing without prejudice. |
extensions/*.yaml
#307.This depends on #309 (its commit as initially proposed is included in the history here) so I've marked this as draft for now. There's also a hidden conflict with #302, since this documents the option that a change is being proposed for there. I'll rebase/update when both of those are merged.