docs: extend extension documentation #310

jvanstraten · 2022-09-05T15:39:04Z

- Coin the term "core extensions" and describe why they (and extensions in general) exist.
- Document guidelines from #251 (Implicit casts/promotions in functions within Substrait's core extensions) and #307 (Establish intended scope of functions in extensions/*.yaml).
- Document common options in core extensions (#254: Integer overflow, floating point rounding options, and domain error handling options are undocumented).
- Better integrate the function documentation generator output into the website.

This depends on #309 (its commit as initially proposed is included in the history here) so I've marked this as draft for now. There's also a hidden conflict with #302, since this documents the option that a change is being proposed for there. I'll rebase/update when both of those are merged.

jacques-n

Really like this update. Thanks @jvanstraten !

Will do a second pass once out of draft form.


jacques-n

Mostly looks good, a couple of comments. Thanks @jvanstraten

jacques-n · 2022-09-27T15:54:09Z

site/docs/extensions/core_extensions/guidelines.md

+    - Example: [#289](https://github.com/substrait-io/substrait/issues/289)
+    - Example: [#295](https://github.com/substrait-io/substrait/issues/295)
+
+ - Aim for syntactic and semantic consistency with widely used SQL dialects, especially PostgreSQL.


I don't agree with the "especially postgres" comment here. We should be focused on what's common, not what's postgres. If SQL Server and Oracle do something one way and postgres does it differently, I think we should do what SQL Server and Oracle do. There are lots of places where Postgres does super weird shit that isn't "standard" at all.

jvanstraten · 2022-09-27

I copy-pasted this from #307 and feel a bit out of my league here. @ianmcook, any comment? If not, I'll just remove it.

jvanstraten · 2022-10-03

Removed it in 22fd8c3.

jacques-n · 2022-09-27T15:56:52Z

site/docs/extensions/core_extensions/guidelines.md

+ - Be consistent when it comes to argument types. It is preferable to define a function that accepts and returns one type class over a function that promotes from one type class to another or accepts a mixture of type classes. This aims to prevent an explosion of function implementations.
+    - More information and examples: [#251](https://github.com/substrait-io/substrait/issues/251)
+
+ - Be pedantic when describing functionality. The corner cases that rarely come up in practice are exactly the places where different implementations are likely to differ, so for a plan to be implementation-agnostic, these are exactly the things that need to be specified exhaustively. For especially pedantic things, an optional enumeration argument may be suitable; this allows a producer to explicitly indicate that the consumer can pick the behavior.


Let's not use the word pedantic. Let's come up with something neutral to positive, such as exhaustive, highly detailed, etc.

jvanstraten · 2022-09-27

IMO "pedantic" is not a bad thing for a specification, but, sure. "Precise" as a drop-in replacement maybe?

jvanstraten · 2022-10-03

Upon closer inspection, I replaced the first instance of "pedantic" with "precise" in 22fd8c3, but left the second, and added an example instead. Using Google's definition, "pedantic" means "excessively concerned with minor details or rules; overscrupulous." You can't be "concerned with minor details or rules" enough when writing a specification; it must specify every single minor detail and rule, or it wouldn't be unambiguous. Nevertheless, people tend to find minor details and rules unnecessary (excessive, overscrupulous), and therefore pedantic, when they consider them to be common sense or obvious, and thus will avoid stating such things to save time and avoid annoying the reader. The purpose of the guideline is to remind people not to fall into that trap and to specify everything, regardless of how they feel about it or expect others to feel about it. Usage of the word is justified in this context.
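The guideline about an optional enumeration argument (letting a producer either pin down a corner-case behavior or explicitly leave the choice to the consumer) can be sketched as follows. This is a minimal Python illustration only, not Substrait's YAML format or any consumer's API: the function `add_i8`, the `Overflow` values `SILENT`, `SATURATE`, and `ERROR`, and the `consumer_default` parameter are all hypothetical, loosely modeled on the integer overflow option discussed in #254.

```python
from enum import Enum
from typing import Optional

# Bounds of a signed 8-bit integer (the hypothetical i8 type used here).
I8_MIN, I8_MAX = -128, 127


class Overflow(Enum):
    """Illustrative (hypothetical) values for an optional overflow option."""
    SILENT = "SILENT"      # wrap around, two's-complement style
    SATURATE = "SATURATE"  # clamp to the nearest representable bound
    ERROR = "ERROR"        # fail loudly


def add_i8(x: int, y: int,
           overflow: Optional[Overflow] = None,
           consumer_default: Overflow = Overflow.ERROR) -> int:
    """Add two i8 values.

    `overflow` stands for what the producer put in the plan; None means the
    producer left the option unset, so the consumer may pick its own behavior
    (modeled here by the hypothetical `consumer_default`).
    """
    behavior = overflow if overflow is not None else consumer_default
    result = x + y
    if I8_MIN <= result <= I8_MAX:
        return result  # no overflow: every behavior agrees
    if behavior is Overflow.SILENT:
        # Map the exact sum back into [-128, 127] modulo 2**8.
        return (result - I8_MIN) % 256 + I8_MIN
    if behavior is Overflow.SATURATE:
        return I8_MAX if result > I8_MAX else I8_MIN
    raise OverflowError(f"i8 overflow: {x} + {y}")


# The same expression yields different answers depending on the option.
print(add_i8(100, 100, Overflow.SILENT))    # -56
print(add_i8(100, 100, Overflow.SATURATE))  # 127
print(add_i8(100, 100, consumer_default=Overflow.SATURATE))  # 127 (consumer's pick)
```

Leaving the option unset is how a producer would signal that any of the listed behaviors is acceptable; setting it removes the ambiguity entirely. Either way, the overflow case is spelled out, which is exactly the kind of rarely exercised corner case the guideline asks descriptions to cover.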

jacques-n · 2022-09-27T15:57:25Z

site/docs/extensions/core_extensions/guidelines.md

+    - Example: the verbosity of the description of [regex_match_substring](https://github.com/substrait-io/substrait/blob/fbe5e0949b863334d02b5ad9ecac55ec8fc4debb/extensions/functions_string.yaml#L79-L139).
+    - Example: the floating point rounding option defined [here](common_options.md).
+
+ - The core extensions should generally not be defining type classes. If you believe a type class that isn't currently in the specification is important enough to include, it probably makes more sense to simply add it to the built-in types, or otherwise should be a third-party extension.


I don't agree with this. I expect that we create "compatibility extensions" over time within the core project. For example, we may have a set of data type and function extensions specifically targeting Arrow, Postgres, or MySQL. In many cases it would be useful to have a canonical set of items here.

jvanstraten · 2022-09-27

Hm. The way I figured things would go is that data representation projects like Arrow would publish and maintain extension types of their own, which everyone else can then use if they use that data format under the hood. I'm not sure Substrait should necessarily be involved with that.

ETA: I guess that's not really a rebuttal for the general case. I'm not overly attached to the statement. It just feels weird for Substrait itself to define both "built-in" types and extension types. I mean, if something is worth specifying centrally, why not just add it as a built-in type? Consumers are free to reject stuff they don't understand, whether it's a built-in type or an extension type from the core extensions.

jvanstraten · 2022-10-03

Removed statement in 22fd8c3.

CLAassistant · 2022-10-06T23:47:14Z

All committers have signed the CLA.

CLAassistant · 2022-10-06T23:47:50Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

jacques-n · 2024-08-08T03:02:48Z

No progress has been made in more than six months. Closing without prejudice.

jvanstraten mentioned this pull request Sep 5, 2022 in Integer overflow, floating point rounding options, and domain error handling options are undocumented #254 (open).

jacques-n reviewed Sep 5, 2022


This was referenced Sep 6, 2022:

- Establish intended scope of functions in extensions/*.yaml #307 (closed)
- Docs on Overflow Behavior #323 (closed)
- Docs on Domain Error #324 (closed)
- feat: add arg_min and arg_max #326 (closed)

jvanstraten force-pushed the extension-docs branch from 5e98051 to 6b38d99 September 27, 2022 15:35

jvanstraten force-pushed the extension-docs branch from 6b38d99 to c5c057d September 27, 2022 15:38

jvanstraten marked this pull request as ready for review September 27, 2022 15:39

jacques-n reviewed Sep 27, 2022


chore: address review comments (22fd8c3)

EpsilonPrime added the awaiting-user-input label ("This issue is waiting on further input from users") Aug 16, 2023

jacques-n closed this Aug 8, 2024
