Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: extend extension documentation #310

Closed
wants to merge 2 commits into from

Conversation

jvanstraten
Copy link
Contributor

@jvanstraten jvanstraten commented Sep 5, 2022

This depends on #309 (its commit as initially proposed is included in the history here) so I've marked this as draft for now. There's also a hidden conflict with #302, since this documents the option that a change is being proposed for there. I'll rebase/update when both of those are merged.

Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like this update. Thanks @jvanstraten !

Will do a second pass once out of draft form.

 - coin the term "core extensions" and describe why they (and extensions
   in general) exist
 - document guidelines from substrait-io#251 and substrait-io#307
 - document common options in core extensions (substrait-io#254)
 - better integrate the function documentation generator output into the
   website
@jvanstraten jvanstraten marked this pull request as ready for review September 27, 2022 15:39
Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, a couple of comments. Thanks @jvanstraten

- Example: [#289](https://github.com/substrait-io/substrait/issues/289)
- Example: [#295](https://github.com/substrait-io/substrait/issues/295)

- Aim for syntactic and semantic consistency with widely used SQL dialects, especially PostgreSQL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with the "especially postgres" comment here. We should be focused on what's common, not what's postgres. If Sql server and oracle do something one way and postgres does it differently, I think we should do what Sql Server and Oracle do. There are lots of places where Postgres does super weird shit that isn't "standard" at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copypasted this from #307 and feel a bit out of my league here. @ianmcook, any comment? If not I'll just remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it in 22fd8c3

- Be consistent when it comes to argument types. It is preferable to define a function that accepts and returns one type class over a function that promotes from one type class or another or accepts a mixture of type classes. This aims to prevent an explosion of function implementations.
- More information and examples: [#251](https://github.com/substrait-io/substrait/issues/251)

- Be pedantic when describing functionality. The corner cases that rarely come up in practice are exactly the places where different implementations are likely to differ, so for a plan to be implementation-agnostic, these are exactly the things that need to be specified exhaustively. For especially pedantic things, an optional enumeration argument may be suitable; this allows a producer to explicitly indicate that the consumer can pick the behavior.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not use the word pedantic. Let's come up with something neutral to positive, such as exhaustive, highly detailed, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO "pedantic" is not a bad thing for a specification, but, sure. "Precise" as a drop-in replacement maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon closer inspection, I replaced the first instance of "pedantic" with "precise" in 22fd8c3, but left the second, and added an example instead. Using google's definition, "pedantic" means "excessively concerned with minor details or rules; overscrupulous." You can't be "concerned with minor details or rules" enough when writing a specification; it must specify every single minor detail and rule, or it wouldn't be unambiguous. Nevertheless, people tend to find minor details and rules unnecessary (excessive, overscrupulous) and therefore pedantic when they consider them to be common sense or obvious, and thus will avoid stating such things to save time and avoid annoying the reader. The purpose of the guideline is to remind people not to fall into that trap and specify everything, regardless of how they feel about it or expect others to feel about it. Usage of the word is justified in this context.

- Example: the verbosity of the description of [regex_match_substring](https://github.com/substrait-io/substrait/blob/fbe5e0949b863334d02b5ad9ecac55ec8fc4debb/extensions/functions_string.yaml#L79-L139).
- Example: the floating point rounding option defined [here](common_options.md).

- The core extensions should generally not be defining type classes. If you believe a type class that isn't currently in the specification is important enough to include, it probably makes more sense to simply add it to the built-in types, or otherwise should be a third-party extension.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with this. I expect that we create "compatibility extensions" over time within the core project. For example, we may have a set of data type and function extensions specifically targeting arrow, postgres or mysql. In many cases it would be useful to have a canonical set of items here.

Copy link
Contributor Author

@jvanstraten jvanstraten Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. The way I figured things would go is that data representation projects like Arrow would publish and maintain extension types of their own, that everyone else can then use if they use that data format under the hood. I'm not sure Substrait should necessarily be involved with that.

ETA: I guess that's not really a rebuttal for the general case. I'm not overly attached to the statement. It just feels weird for Substrait itself to define both "built-in" types and extension types. I mean, if something is worth specifying centrally, why not just add it as a built-in type? Consumers are free to reject stuff they don't understand, whether it's a built-in type or an extension type from the core extensions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed statement in 22fd8c3.

@CLAassistant
Copy link

CLAassistant commented Oct 6, 2022

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@EpsilonPrime EpsilonPrime added the awaiting-user-input This issue is waiting on further input from users label Aug 16, 2023
@jacques-n
Copy link
Contributor

No progress has been made in more than six months. Closing without prejudice.

@jacques-n jacques-n closed this Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-user-input This issue is waiting on further input from users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants