-
Notifications
You must be signed in to change notification settings - Fork 1k
feat(snowflake)!: Annotate type for snowflake MEDIAN #6426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
sqlglot/typing/snowflake.py
Outdated
| exp.Substring, | ||
| exp.TimeSlice, | ||
| exp.TimestampTrunc, | ||
| exp.Median, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this, did you test it ?
Docs state Returns a FLOAT or DECIMAL (fixed-point) number, depending upon the input..
SELECT SYSTEM$TYPEOF(MEDIAN(CAST(1 AS BIGINT)));
>NUMBER(38,3)[SB2]
SELECT SYSTEM$TYPEOF(CAST(1 AS BIGINT);
>NUMBER(38,0)[SB2]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what the right type is...the docs says it would return FLOAT or DECIMAL, but when I try on snowflake, I only see NUMBER types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because NUMBER is synonymous to DECIMAL, INT, BIGINT etc but not FLOAT:
snowflake> SELECT
SYSTEM$TYPEOF(MEDIAN(CAST(1 AS FLOAT))) as t1,
SYSTEM$TYPEOF(MEDIAN(CAST(1 AS NUMBER))) as t2,
SYSTEM$TYPEOF(MEDIAN(CAST(1 AS BIGINT))) as t3,
SYSTEM$TYPEOF(MEDIAN(CAST(1 AS NUMBER(10, 5)))) as t4;
T1 | T2 | T3 | T4
-- | -- | -- | --
FLOAT[DOUBLE] | NUMBER(38,3)[SB2] | NUMBER(38,3)[SB2] | NUMBER(13,8)[SB4]
Notice how MEDIAN(FLOAT) -> FLOAT while others return DECIMAL / NUMBER.
Also, from this example we can see that numbers do not preserve their precision.
We should only annotate it with FLOAT (aka DOUBLE) only if the input expr is also one, otherwise we'd annotate it as DECIMAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeap, @fivetran-felixhuang let's fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have this update
def _annotate_median(self: TypeAnnotator, expression: exp.Median) -> exp.Median:
"""Annotate MEDIAN function with correct return type.
Based on Snowflake documentation: "Returns a FLOAT or DECIMAL (fixed-point) number, depending upon the input."
MEDIAN returns:
- FLOAT/DOUBLE inputs -> DOUBLE (preserve FLOAT type)
- Other numeric types (INT, BIGINT) -> DECIMAL
"""
# First annotate the argument to get its type
expression = self._annotate_by_args(expression, "this")
# Get the input type
input_type = expression.this.type
# If input is FLOAT/DOUBLE, return DOUBLE
if input_type and input_type.is_type(exp.DataType.Type.DOUBLE, exp.DataType.Type.FLOAT):
self._set_type(expression, exp.DataType.Type.DOUBLE)
else:
# For all other types (INT, BIGINT, NULL, etc.), return DOUBLE
self._set_type(expression, exp.DataType.Type.DECIMAL)
return expression
and I updated the tests
# dialect: snowflake
MEDIAN(tbl.bigint_col) OVER (PARTITION BY 1);
DECIMAL;
# dialect: snowflake
MEDIAN(CAST(100 AS DECIMAL(10,2)));
DECIMAL;
but I got these errors
======================================================================
FAIL: test_annotate_funcs (tests.test_optimizer.TestOptimizer.test_annotate_funcs) [670, MEDIAN(tbl.bigint_col) OVER (PARTITION BY 1)]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/felix.huang/Fivetran/sqlglot/tests/test_optimizer.py", line 984, in test_annotate_funcs
self.assertEqual(
AssertionError: 'DECIMAL' != 'DECIMAL(38, 0)'
- DECIMAL
+ DECIMAL(38, 0)
======================================================================
FAIL: test_annotate_funcs (tests.test_optimizer.TestOptimizer.test_annotate_funcs) [671, MEDIAN(CAST(100 AS DECIMAL(10,2)))]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/felix.huang/Fivetran/sqlglot/tests/test_optimizer.py", line 984, in test_annotate_funcs
self.assertEqual(
AssertionError: 'DECIMAL' != 'DECIMAL(38, 0)'
- DECIMAL
+ DECIMAL(38, 0)
@VaggelisD @geooo109 Do you know why this happens?
https://docs.snowflake.com/en/sql-reference/functions/median