[SL-1967] Add support for statistical aggregate functions #1111

tlento · 2024-04-03T23:38:57Z

There are a number of straightforward statistical aggregate functions which we should be able to support without too much effort, although as always we have to make some decisions.

There is a current request for var_samp and covar_samp for BigQuery, but there are others we could add to this list.

Statistical aggregate functions recommended for consideration

Sample variance: var_samp
Sample covariance: covar_samp (multi-argument, not natively supported by Redshift)
Sample standard deviation: stddev_samp
Population variance: var_pop
Population covariance: covar_pop (multi-argument, not natively supported by Redshift)
Population standard deviation: stddev_pop
Correlation coefficient: corr (multi-argument, not natively supported by Redshift)

Statistical aggregate functions NOT under consideration

Kurtosis: kurtosis (not natively supported by BigQuery, Postgres, Redshift)
Skewness: skewness (skew in Snowflake, not natively supported by BigQuery, Postgres, Redshift)

Native implementations are missing from too many engines to justify the effort for these, especially given how little use they're likely to see.

Overall recommendation

Start with the functions that every engine supports natively. They are much more straightforward to develop and test, and they fit into our existing aggregate function model.

Separately, evaluate whether to bother with custom native SQL implementations of the covariance and correlation functions for Redshift. These are also more complex because they would be the first multi-input aggregate functions we support.
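To give a sense of what a custom implementation for an engine without native covariance might involve, here is a minimal sketch that renders covariance from single-argument aggregates, using the identity cov(x, y) = (SUM(x*y) - SUM(x)*SUM(y)/n) / (n - 1) for the sample form (divide by n for the population form). The function name `covariance_sql` and its parameters are hypothetical and are not part of MetricFlow; the NULL handling (a row counts only when both inputs are non-NULL) is an assumption about the semantics we would want to match.

```python
def covariance_sql(x: str, y: str, sample: bool = True) -> str:
    """Render a covariance expression using only SUM, CASE, and NULLIF.

    Hypothetical helper for illustration. `x` and `y` are column
    expressions that are assumed to be already validated and quoted.
    """
    # A row contributes only if both inputs are non-NULL.
    both = f"{x} IS NOT NULL AND {y} IS NOT NULL"
    # Multiply by 1.0 to force floating-point arithmetic on integer columns.
    sum_x = f"SUM(CASE WHEN {both} THEN 1.0 * {x} END)"
    sum_y = f"SUM(CASE WHEN {both} THEN 1.0 * {y} END)"
    sum_xy = f"SUM(CASE WHEN {both} THEN 1.0 * {x} * {y} END)"
    n = f"SUM(CASE WHEN {both} THEN 1 ELSE 0 END)"
    # Sample covariance divides by n - 1, population covariance by n.
    divisor = f"{n} - 1" if sample else n
    # NULLIF turns a zero divisor into NULL instead of raising an error.
    return (
        f"({sum_xy} - {sum_x} * {sum_y} / NULLIF({n}, 0))"
        f" / NULLIF({divisor}, 0)"
    )
```

Something like `f"SELECT {covariance_sql('x', 'y')} FROM t"` would then produce the full query. Note that this one-pass formula can lose precision when values are large relative to their spread, which is one more thing to weigh when deciding whether the custom implementation is worth it.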

_SL-1967

tlento · 2024-04-03T23:41:29Z

Note - this is closely related to, and possibly a pre-requisite for, #52

tlento changed the title ~~Add support for statistical aggregate functions~~ [SL-1967] Add support for statistical aggregate functions Apr 3, 2024

tlento added the backlog label Apr 3, 2024

tlento added the Metricflow and Medium priority labels (both created by Linear-GitHub Sync) Apr 3, 2024
