There are a number of straightforward statistical aggregate functions which we should be able to support without too much effort, although as always we have to make some decisions.
Statistical aggregate functions recommended for consideration
Sample variance var_samp
Sample covariance covar_samp (multi-argument, not natively supported by Redshift)
Sample standard deviation stddev_samp
Population variance var_pop
Population covariance covar_pop (multi-argument, not natively supported by Redshift)
Population standard deviation stddev_pop
Correlation coefficient: corr (multi-argument, not natively supported by Redshift)
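For reference, the textbook definitions behind these names can be written as follows. This is a sketch under the assumption that n counts the rows (or, for the two-argument functions, the pairs) that actually enter the aggregate. NULL handling and the result for n = 1 vary by engine and are not covered here.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
\mathrm{var\_pop}(x) &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2, &
\mathrm{var\_samp}(x) &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 \\
\mathrm{stddev\_pop}(x) &= \sqrt{\mathrm{var\_pop}(x)}, &
\mathrm{stddev\_samp}(x) &= \sqrt{\mathrm{var\_samp}(x)} \\
\mathrm{covar\_pop}(x,y) &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}), &
\mathrm{covar\_samp}(x,y) &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) \\
\mathrm{corr}(x,y) &= \frac{\mathrm{covar\_pop}(x,y)}{\mathrm{stddev\_pop}(x)\,\mathrm{stddev\_pop}(y)}
\end{aligned}
```

The sample and population variants differ only in the divisor (n - 1 versus n), which is why each pair can likely share most of its implementation and test cases.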
Statistical aggregate functions NOT under consideration
Kurtosis: kurtosis (not natively supported by BigQuery, Postgres, Redshift)
Skewness: skewness (skew in Snowflake, not natively supported by BigQuery, Postgres, Redshift)
Native implementations are missing from too many engines to justify the effort for these, especially given how little use they're likely to see.
Overall recommendation
Start with the functions that every engine supports natively (var_samp, stddev_samp, var_pop, stddev_pop). They are much more straightforward to develop and test, and they fit into our existing aggregate function model.
Separately, evaluate whether or not to bother with custom native-sql implementations of the covariance and correlation functions for Redshift. These are also more complex because they are the first multi-input aggregate functions we would be supporting.
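To give a sense of what a custom implementation might involve, here is a minimal sketch that rewrites covariance using only single-argument SUM aggregates, via the identity sum((x - mean_x)(y - mean_y)) = sum(xy) - sum(x)sum(y)/n. The helper name `covar_sql`, the `points` table, and the choice to skip rows where either input is NULL are all assumptions made for illustration. The generated SQL is run on SQLite here only to check the arithmetic; it has not been verified against Redshift.

```python
import sqlite3


def covar_sql(x: str, y: str, sample: bool = True) -> str:
    """Build a covariance expression from single-argument aggregates.

    Hypothetical helper, not an existing API. Rows where either input
    is NULL are excluded from every partial sum (an assumption about
    the desired semantics).
    """
    both = f"({x} IS NOT NULL AND {y} IS NOT NULL)"
    n = f"SUM(CASE WHEN {both} THEN 1.0 END)"
    sum_x = f"SUM(CASE WHEN {both} THEN {x} END)"
    sum_y = f"SUM(CASE WHEN {both} THEN {y} END)"
    sum_xy = f"SUM(CASE WHEN {both} THEN 1.0 * {x} * {y} END)"
    # Sample covariance divides by n - 1, population covariance by n.
    denom = f"({n} - 1)" if sample else n
    # NULLIF turns a zero divisor into NULL instead of a division error.
    return f"(({sum_xy} - {sum_x} * {sum_y} / {n}) / NULLIF({denom}, 0))"


def demo() -> tuple:
    """Run both variants on a tiny in-memory table and return the results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (x REAL, y REAL)")
    # The last row has a NULL y and should be ignored by both aggregates.
    rows = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5), (6, None)]
    conn.executemany("INSERT INTO points VALUES (?, ?)", rows)
    query = (
        f"SELECT {covar_sql('x', 'y', sample=True)}, "
        f"{covar_sql('x', 'y', sample=False)} FROM points"
    )
    samp, pop = conn.execute(query).fetchone()
    conn.close()
    return samp, pop


if __name__ == "__main__":
    print(demo())
```

One caveat with this one-pass form is that subtracting two large sums can lose precision on data with a large mean relative to its spread, so a real implementation might prefer a two-pass or centered formulation. The correlation coefficient could be assembled along the same lines, but it also needs a square root and pair-filtered variances.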
There is a current request for var_samp and covar_samp for BigQuery, but there are others we could add to this list.
SL-1967