-
Notifications
You must be signed in to change notification settings - Fork 1k
feat(duckdb): Handle WEEK, WEEK(day), ISOWEEK transpilation for DATE_DIFF & other Date/Time functions #6354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
9996310 to
bd2097a
Compare
tests/dialects/test_bigquery.py
Outdated
| self.validate_all( | ||
| "DATE_DIFF('2024-01-15', '2024-01-08', WEEK(MONDAY))", | ||
| write={ | ||
| "bigquery": "DATE_DIFF('2024-01-15', '2024-01-08', ISOWEEK)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it expected that the BigQuery sql gets changed roundtrip WEEK(MONDAY) --> ISOWEEK? can/should it remain the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can preserve WEEK(MONDAY) on bigquery side.
tests/dialects/test_duckdb.py
Outdated
| # All of these should remain as is, they don't have synonyms | ||
| self.validate_identity(f"EXTRACT({part} FROM foo)") | ||
|
|
||
| def test_date_diff_week(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What "interesting" path do these tests cover? It doesn't feel very relevant / useful at first glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, it just covers basic date_diff function tests. Since we are focussing on transpilation, I can remove these tests as there are validate_all tests in bigquery already.
sqlglot/dialects/dialect.py
Outdated
| "THURSDAY": 4, | ||
| "FRIDAY": 5, | ||
| "SATURDAY": 6, | ||
| "SUNDAY": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this to the top of the list?
i'm a bit confused, if monday is 1 and sunday is 0, why is sunday last?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it makes sense to move sunday at the top.
| "CENTURIES": "CENTURY", | ||
| } | ||
|
|
||
| WEEK_UNIT_SEMANTICS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do these numbers 0 and 1 mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are mapping of week unit names to (start_day, dow_number) tuples, where dow_number is the numeric day-of-week value (0=Sunday, 1=Monday, etc.)
sqlglot/dialects/dialect.py
Outdated
| unit_name = unit.this.upper() if isinstance(unit.this, str) else str(unit.this) | ||
| # Only handle ISOWEEK/WEEKISO variants, not plain WEEK | ||
| if unit_name in ("ISOWEEK", "WEEKISO"): | ||
| week_info = Dialect.WEEK_UNIT_SEMANTICS.get(unit_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you calling the constant here, it can just be WEEK_UNIT_SEMANTICS, or if you want it overriadable, you need to call it on the instance of dialect
sqlglot/dialects/dialect.py
Outdated
| If False, return just day_name string for serialization | ||
| Returns: | ||
| - If include_dow=False: day_name (e.g., "SUNDAY") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just always return the day of the week? doesn't it make it simpler?
sqlglot/dialects/duckdb.py
Outdated
| Handles: | ||
| - Var('WEEK') -> ('SUNDAY', 0) # BigQuery default | ||
| - Var('ISOWEEK') -> ('MONDAY', 1) | ||
| - Week(Var('day')) -> ('day', dow) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if the day is in a column and not a constant string value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When week start day is dynamic (from a column), week_start will be None and we fall back to standard DATE_DIFF, since compile-time offsets would be required.
sqlglot/dialects/duckdb.py
Outdated
| return arg | ||
|
|
||
|
|
||
| def _extract_week_start_day(unit: t.Optional[exp.Expression]) -> t.Optional[t.Tuple[str, int]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the point of this function? it doesn't seem necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, makes sense, will remove the redundant wrapper and call extract_week_unit_info() directly.
sqlglot/dialects/duckdb.py
Outdated
| """ | ||
| if start_dow == 1: | ||
| # No shift needed for Monday-based weeks (ISO standard) | ||
| return exp.Anonymous(this="DATE_TRUNC", expressions=[exp.Literal.string("week"), date_expr]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you creating an anonymous function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ll replace it with DATE_TRUNC
ea1b9b4 to
592107b
Compare
Issue:
BigQuery's DATE_DIFF with WEEK uses Sunday-start (US convention), while ISOWEEK uses Monday-start (ISO standard). DuckDB only supports 'week' with Monday-start semantics. Direct transpilation produces incorrect results because they count different boundaries.
Before :
After:
Logic:
DATE_TRUNCapproachEdge Case: