@fivetran-amrutabhimsenayachit (Collaborator) commented Nov 19, 2025

Issue:
BigQuery's DATE_DIFF with WEEK counts Sunday-start week boundaries (US convention), while ISOWEEK counts Monday-start boundaries (ISO 8601). DuckDB's 'week' part only supports Monday-start semantics. Passing the unit through unchanged therefore either returns wrong results, because the two engines count different boundaries, or, for WEEK(<day>), produces SQL that DuckDB cannot bind.
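To make the boundary-counting difference concrete, here is a minimal Python sketch of the BigQuery semantics described above: DATE_DIFF(end, start, WEEK(<day>)) counts how many <day> week boundaries fall in the range, and the result is negative when end is earlier than start. The function name `bq_week_diff` is hypothetical and this is not sqlglot code; the 0=Sunday ... 6=Saturday numbering follows the one used later in this PR.

```python
from datetime import date, timedelta

# Day-of-week numbering used in this PR: 0=Sunday, 1=Monday, ..., 6=Saturday.
SUNDAY, MONDAY = 0, 1


def dow(d: date) -> int:
    # date.isoweekday() is 1=Monday..7=Sunday; modulo 7 maps Sunday to 0.
    return d.isoweekday() % 7


def bq_week_diff(end: date, start: date, week_start: int) -> int:
    """Count week boundaries (days whose dow equals week_start) in (start, end].

    Hypothetical reference for BigQuery's DATE_DIFF(end, start, WEEK(<day>));
    the result is negative when end < start.
    """
    sign = 1
    if end < start:
        end, start, sign = start, end, -1
    count = 0
    day = start + timedelta(days=1)
    while day <= end:
        if dow(day) == week_start:
            count += 1
        day += timedelta(days=1)
    return sign * count


# 2017-12-17 is a Sunday and 2017-12-18 is a Monday.
print(bq_week_diff(date(2017, 12, 18), date(2017, 12, 17), SUNDAY))  # WEEK -> 0
print(bq_week_diff(date(2017, 12, 18), date(2017, 12, 17), MONDAY))  # WEEK(MONDAY) / ISOWEEK -> 1
```

The only day in the range is Monday 2017-12-18, which is a boundary for Monday-start weeks but not for Sunday-start weeks, matching the 0 / 1 / 1 BigQuery output below.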

Before:

BigQuery:
bq --project_id fivetran-wild-west query --use_legacy_sql=false "SELECT DATE_DIFF(DATE '2017-12-18', '2017-12-17', WEEK) AS week_diff, DATE_DIFF(DATE '2017-12-18', '2017-12-17', WEEK(MONDAY)) AS week_weekday_diff, DATE_DIFF(DATE '2017-12-18', '2017-12-17', ISOWEEK) AS isoweek_diff"
+-----------+-------------------+--------------+
| week_diff | week_weekday_diff | isoweek_diff |
+-----------+-------------------+--------------+
|         0 |                 1 |            1 |
+-----------+-------------------+--------------+

Transpilation:
python3 -c "import sqlglot; print(sqlglot.transpile(\"SELECT DATE_DIFF(DATE '2017-12-18', '2017-12-17', WEEK) AS week_diff, DATE_DIFF(DATE '2017-12-18', '2017-12-17', WEEK(MONDAY)) AS week_weekday_diff, DATE_DIFF(DATE '2017-12-18', '2017-12-17', ISOWEEK) AS isoweek_diff\", read='bigquery', write='duckdb')[0])"
-->
SELECT DATE_DIFF('WEEK', CAST('2017-12-17' AS DATE), CAST('2017-12-18' AS DATE)) AS week_diff, DATE_DIFF(WEEK(MONDAY), CAST('2017-12-17' AS DATE), CAST('2017-12-18' AS DATE)) AS week_weekday_diff, DATE_DIFF('ISOWEEK', CAST('2017-12-17' AS DATE), CAST('2017-12-18' AS DATE)) AS isoweek_diff


DuckDB:
duckdb -c "SELECT DATE_DIFF('WEEK', CAST('2017-12-17' AS DATE), CAST('2017-12-18' AS DATE)) AS week_diff, DATE_DIFF(WEEK(MONDAY), CAST('2017-12-17' AS DATE), CAST('2017-12-18' AS DATE)) AS week_weekday_diff, DATE_DIFF('ISOWEEK', CAST('2017-12-17' AS DATE), CAST('2017-12-18' AS DATE)) AS isoweek_diff"
Binder Error:
Referenced column "MONDAY" not found in FROM clause!

LINE 1: ...), CAST('2017-12-18' AS DATE)) AS week_diff, DATE_DIFF(WEEK(MONDAY), CAST('2017-12-17' AS DATE), CAST('2017-12-18' AS...

After:

BigQuery:
bq --project_id fivetran-wild-west query --use_legacy_sql=false "SELECT DATE_DIFF(DATE '2017-12-18', '2017-12-17', WEEK) AS week_diff, DATE_DIFF(DATE '2017-12-18', '2017-12-17', WEEK(MONDAY)) AS week_weekday_diff, DATE_DIFF(DATE '2017-12-18', '2017-12-17', ISOWEEK) AS isoweek_diff"
+-----------+-------------------+--------------+
| week_diff | week_weekday_diff | isoweek_diff |
+-----------+-------------------+--------------+
|         0 |                 1 |            1 |
+-----------+-------------------+--------------+

Transpilation:
python3 -c "import sqlglot; print(sqlglot.transpile(\"SELECT DATE_DIFF(DATE '2017-12-18', '2017-12-17', WEEK) AS week_diff, DATE_DIFF(DATE '2017-12-18', '2017-12-17', WEEK(MONDAY)) AS week_weekday_diff, DATE_DIFF(DATE '2017-12-18', '2017-12-17', ISOWEEK) AS isoweek_diff\", read='bigquery', write='duckdb')[0])"
-->
SELECT DATE_DIFF('DAY', DATE_TRUNC('week', CAST('2017-12-17' AS DATE) + INTERVAL '1' DAY), DATE_TRUNC('week', CAST('2017-12-18' AS DATE) + INTERVAL '1' DAY)) // 7 AS week_diff, DATE_DIFF('DAY', DATE_TRUNC('week', CAST('2017-12-17' AS DATE)), DATE_TRUNC('week', CAST('2017-12-18' AS DATE))) // 7 AS week_weekday_diff, DATE_DIFF('DAY', DATE_TRUNC('week', CAST('2017-12-17' AS DATE)), DATE_TRUNC('week', CAST('2017-12-18' AS DATE))) // 7 AS isoweek_diff

DuckDB:
duckdb -c "SELECT DATE_DIFF('DAY', DATE_TRUNC('week', CAST('2017-12-17' AS DATE) + INTERVAL '1' DAY), DATE_TRUNC('week', CAST('2017-12-18' AS DATE) + INTERVAL '1' DAY)) // 7 AS week_diff, DATE_DIFF('DAY', DATE_TRUNC('week', CAST('2017-12-17' AS DATE)), DATE_TRUNC('week', CAST('2017-12-18' AS DATE))) // 7 AS week_weekday_diff, DATE_DIFF('DAY', DATE_TRUNC('week', CAST('2017-12-17' AS DATE)), DATE_TRUNC('week', CAST('2017-12-18' AS DATE))) // 7 AS isoweek_diff"
┌───────────┬───────────────────┬──────────────┐
│ week_diff │ week_weekday_diff │ isoweek_diff │
│   int64   │       int64       │    int64     │
├───────────┼───────────────────┼──────────────┤
│     0     │         1         │      1       │
└───────────┴───────────────────┴──────────────┘

Logic:

  1. Apply normalization:
WeekStart(SUNDAY) → "WEEK"
WeekStart(MONDAY) → "ISOWEEK"
WeekStart(other) → "WEEK(day)"
  2. Convert the canonical WeekStart to DuckDB's DATE_TRUNC approach:
  • Calculate the shift (shift = 1 - dow): the number of days to add so that the configured week-start day lands on a Monday, which is where DuckDB's DATE_TRUNC('week', ...) truncates to
Sunday (day 0):    shift = 1 - 0 = +1   (shift forward 1 day)
Monday (day 1):    shift = 1 - 1 = 0    (no shift needed!)
Tuesday (day 2):   shift = 1 - 2 = -1   (shift backward 1 day)
Wednesday (day 3): shift = 1 - 3 = -2   (shift backward 2 days)
Thursday (day 4):  shift = 1 - 4 = -3   (shift backward 3 days)
Friday (day 5):    shift = 1 - 5 = -4   (shift backward 4 days)
Saturday (day 6):  shift = 1 - 6 = -5   (shift backward 5 days)

  • Shift the dates
E.g. for WEEK(SATURDAY) in 2024, where Jan 6 and Jan 13 are Saturdays:
      Jan 6 - 5 days = Jan 1 (Monday)
      Jan 13 - 5 days = Jan 8 (Monday)
  • Apply DATE_TRUNC
DATE_TRUNC('week', Jan 1) = Jan 1 (Monday)
DATE_TRUNC('week', Jan 8) = Jan 8 (Monday)

  • Calculate DATE_DIFF
DATE_DIFF('day', Jan 1, Jan 8) = 7 days
7 // 7 = 1 week
Both truncated dates are Mondays, so the day difference is always a multiple of 7 and the integer division is exact.
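The normalization and shift steps above can be emulated in plain Python to check the arithmetic. This is a sketch under assumptions: `date_trunc_week` stands in for DuckDB's Monday-based DATE_TRUNC('week', ...), and the helper names are hypothetical. The PR itself builds sqlglot expression trees rather than computing values.

```python
from datetime import date, timedelta

# 0=Sunday, 1=Monday, ..., 6=Saturday, as in the shift table above.
DAYS = ["SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY"]


def canonical_unit(day: str) -> str:
    # Step 1: normalize WeekStart(day) to its canonical unit spelling.
    if day == "SUNDAY":
        return "WEEK"
    if day == "MONDAY":
        return "ISOWEEK"
    return f"WEEK({day})"


def shift_for(day: str) -> int:
    # Days to add so that the configured week start lands on a Monday.
    return 1 - DAYS.index(day)


def date_trunc_week(d: date) -> date:
    # Stand-in for DuckDB's DATE_TRUNC('week', d): back to the preceding Monday.
    return d - timedelta(days=d.weekday())


def week_diff(end: date, start: date, day: str) -> int:
    # Mirrors DATE_DIFF('DAY', trunc(start + shift), trunc(end + shift)) // 7.
    shift = timedelta(days=shift_for(day))
    days = (date_trunc_week(end + shift) - date_trunc_week(start + shift)).days
    return days // 7


print(canonical_unit("SATURDAY"))  # WEEK(SATURDAY)
print(shift_for("SATURDAY"))  # -5
# 2024-01-06 and 2024-01-13 are Saturdays.
print(week_diff(date(2024, 1, 13), date(2024, 1, 6), "SATURDAY"))  # 1
```

Any shift congruent to 1 - dow modulo 7 would work equally well (e.g. +2 instead of -5 for Saturday); the table simply uses 1 - dow directly.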

Edge case (end date before start date, so the result is negative):

BigQuery:
bq query --nouse_legacy_sql "SELECT DATE_DIFF(DATE '2024-01-01', DATE '2024-01-15', WEEK(SUNDAY)) as Weeks"
+-------+
| Weeks |
+-------+
|    -2 |
+-------+

Transpilation:
python -c "import sqlglot; print(sqlglot.transpile(\"SELECT DATE_DIFF(DATE '2024-01-01', DATE '2024-01-15', WEEK(SUNDAY)) as Weeks\", read='bigquery', write='duckdb')[0])"
-->
SELECT DATE_DIFF('DAY', DATE_TRUNC('WEEK', CAST('2024-01-15' AS DATE) + INTERVAL '1' DAY), DATE_TRUNC('WEEK', CAST('2024-01-01' AS DATE) + INTERVAL '1' DAY)) // 7 AS Weeks

DuckDB:
duckdb -c "SELECT DATE_DIFF('DAY', DATE_TRUNC('WEEK', CAST('2024-01-15' AS DATE) + INTERVAL '1' DAY), DATE_TRUNC('WEEK', CAST('2024-01-01' AS DATE) + INTERVAL '1' DAY)) // 7 AS Weeks"
┌───────┐
│ Weeks │
│ int64 │
├───────┤
│  -2   │
└───────┘
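Negative results come out exact because both arguments of the inner DATE_DIFF('DAY', ...) are Mondays after truncation, so their day difference is a multiple of 7 in either direction. The sketch below (plain Python, with the hypothetical `date_trunc_week` standing in for DuckDB's DATE_TRUNC('week', ...)) checks this over a window of dates. If it holds, it should not matter whether the engine's integer division floors or truncates toward zero.

```python
from datetime import date, timedelta


def date_trunc_week(d: date) -> date:
    # Stand-in for DuckDB's DATE_TRUNC('week', d): back to the preceding Monday.
    return d - timedelta(days=d.weekday())


def truncated_day_diff(end: date, start: date, shift: int) -> int:
    # Inner DATE_DIFF('DAY', trunc(start + shift), trunc(end + shift)), before // 7.
    delta = timedelta(days=shift)
    return (date_trunc_week(end + delta) - date_trunc_week(start + delta)).days


# The edge case above: WEEK(SUNDAY) uses shift = +1.
days = truncated_day_diff(date(2024, 1, 1), date(2024, 1, 15), shift=1)
print(days, days // 7, int(days / 7))  # -14 -2 -2  (floor and truncation agree)

# Every ordered pair in a 60-day window, for every shift in the table (+1 .. -5).
window = [date(2024, 1, 1) + timedelta(days=i) for i in range(60)]
all_multiples = all(
    truncated_day_diff(a, b, s) % 7 == 0
    for a in window
    for b in window
    for s in range(-5, 2)
)
print(all_multiples)  # True
```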

@fivetran-amrutabhimsenayachit force-pushed the RD-1050264-transpile-big-querys-date-diff-date-function-to-duck-db branch from 9996310 to bd2097a November 20, 2025 18:08
@fivetran-amrutabhimsenayachit changed the title from "feat(duckdb): Handle WEEK, WEEK(day), ISOWEEK transpilation for DATE_DIFF function" to "feat(duckdb): Handle WEEK, WEEK(day), ISOWEEK transpilation for DATE_DIFF & other Date/Time functions" Nov 20, 2025
self.validate_all(
"DATE_DIFF('2024-01-15', '2024-01-08', WEEK(MONDAY))",
write={
"bigquery": "DATE_DIFF('2024-01-15', '2024-01-08', ISOWEEK)",
Collaborator

is it expected that the BigQuery SQL changes on round-trip (WEEK(MONDAY) --> ISOWEEK)? can/should it remain the same?

Collaborator Author

We can preserve WEEK(MONDAY) on the BigQuery side.
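For DATE_DIFF, WEEK(MONDAY) and ISOWEEK count the same Monday boundaries (both return 1 in the BigQuery example above), so one way to preserve the spelling is to keep the original unit text next to the normalized week start, using the normalized value only when generating another dialect. A hypothetical sketch; `WeekUnit` and `parse_week_unit` are illustrative, not sqlglot API:

```python
from dataclasses import dataclass

DAYS = ["SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY"]


@dataclass(frozen=True)
class WeekUnit:
    original: str  # spelling as written in the source query, e.g. "WEEK(MONDAY)"
    start_day: str  # normalized week start, e.g. "MONDAY"

    @property
    def dow(self) -> int:
        # 0=Sunday, 1=Monday, ..., 6=Saturday
        return DAYS.index(self.start_day)

    def to_bigquery(self) -> str:
        # Round-trip: emit the unit exactly as the user wrote it.
        return self.original


def parse_week_unit(text: str) -> WeekUnit:
    # Hypothetical parser for the three BigQuery spellings discussed here.
    upper = text.upper()
    if upper == "WEEK":
        return WeekUnit(text, "SUNDAY")
    if upper == "ISOWEEK":
        return WeekUnit(text, "MONDAY")
    if upper.startswith("WEEK(") and upper.endswith(")"):
        return WeekUnit(text, upper[5:-1])
    raise ValueError(f"not a week unit: {text}")


a, b = parse_week_unit("WEEK(MONDAY)"), parse_week_unit("ISOWEEK")
print(a.dow == b.dow)  # True: same semantics for a DuckDB rewrite
print(a.to_bigquery(), b.to_bigquery())  # WEEK(MONDAY) ISOWEEK
```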

# All of these should remain as is, they don't have synonyms
self.validate_identity(f"EXTRACT({part} FROM foo)")

def test_date_diff_week(self):
Collaborator

What "interesting" path do these tests cover? It doesn't feel very relevant / useful at first glance.

Collaborator Author

I agree, it just covers basic DATE_DIFF tests. Since we are focusing on transpilation, I can remove these tests, as the BigQuery test suite already has validate_all tests for this.

"THURSDAY": 4,
"FRIDAY": 5,
"SATURDAY": 6,
"SUNDAY": 0,
Owner

@tobymao Nov 25, 2025

can you move this to the top of the list?

I'm a bit confused: if MONDAY is 1 and SUNDAY is 0, why is SUNDAY last?

Collaborator Author

Yeah, it makes sense to move SUNDAY to the top.
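With SUNDAY first, the mapping reads in numeric order (0=Sunday through 6=Saturday). A small sketch of such a mapping, with one way to sanity-check it against Python's standard library: `date.isoweekday()` returns 1=Monday through 7=Sunday, so `isoweekday() % 7` yields the same 0=Sunday numbering. The dictionary name `WEEKDAY_NUMBERS` is hypothetical.

```python
from datetime import date, timedelta

# Hypothetical name; SUNDAY first so the values read 0..6 in order.
WEEKDAY_NUMBERS = {
    "SUNDAY": 0,
    "MONDAY": 1,
    "TUESDAY": 2,
    "WEDNESDAY": 3,
    "THURSDAY": 4,
    "FRIDAY": 5,
    "SATURDAY": 6,
}

# 2024-01-07 is a Sunday; walking forward one day per entry should match
# Python's isoweekday() (1=Monday..7=Sunday) taken modulo 7.
first_sunday = date(2024, 1, 7)
consistent = all(
    (first_sunday + timedelta(days=offset)).isoweekday() % 7 == number
    for offset, number in enumerate(WEEKDAY_NUMBERS.values())
)
print(consistent)  # True
```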

"CENTURIES": "CENTURY",
}

WEEK_UNIT_SEMANTICS = {
Owner

what do these numbers 0 and 1 mean?

Collaborator Author

This maps week unit names to (start_day, dow_number) tuples, where dow_number is the numeric day-of-week value (0=Sunday, 1=Monday, etc.).
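Based on this answer and the docstring further down (WEEK → ('SUNDAY', 0), ISOWEEK → ('MONDAY', 1)), the mapping presumably looks something like the sketch below. The exact keys and layout in the PR are an assumption; WEEKISO is included only because the code excerpt that follows checks for it.

```python
import typing as t

# Assumed shape: week unit name -> (start_day, dow_number), 0=Sunday, 1=Monday.
WEEK_UNIT_SEMANTICS: t.Dict[str, t.Tuple[str, int]] = {
    "WEEK": ("SUNDAY", 0),  # BigQuery default: weeks start on Sunday
    "ISOWEEK": ("MONDAY", 1),  # ISO 8601: weeks start on Monday
    "WEEKISO": ("MONDAY", 1),  # alternate ISO-week spelling used by some dialects
}

start_day, dow = WEEK_UNIT_SEMANTICS["ISOWEEK"]
print(start_day, dow)  # MONDAY 1
```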

unit_name = unit.this.upper() if isinstance(unit.this, str) else str(unit.this)
# Only handle ISOWEEK/WEEKISO variants, not plain WEEK
if unit_name in ("ISOWEEK", "WEEKISO"):
week_info = Dialect.WEEK_UNIT_SEMANTICS.get(unit_name)
Owner

why are you referencing the constant through the class here? it can just be WEEK_UNIT_SEMANTICS, or, if you want it to be overridable, you need to access it on the dialect instance
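The distinction here is ordinary Python attribute lookup: naming the base class explicitly always reads the base-class value, whereas looking the attribute up on the instance (or on `type(self)`) picks up a subclass override. A minimal illustration with hypothetical classes standing in for Dialect and a dialect that overrides the mapping:

```python
class BaseDialect:
    WEEK_UNIT_SEMANTICS = {"WEEK": ("SUNDAY", 0)}

    def via_class(self) -> tuple:
        # Hard-codes the base class: subclass overrides are ignored.
        return BaseDialect.WEEK_UNIT_SEMANTICS["WEEK"]

    def via_instance(self) -> tuple:
        # Normal attribute lookup: finds the most-derived definition.
        return self.WEEK_UNIT_SEMANTICS["WEEK"]


class MondayDialect(BaseDialect):
    # Hypothetical dialect whose plain WEEK starts on Monday.
    WEEK_UNIT_SEMANTICS = {"WEEK": ("MONDAY", 1)}


d = MondayDialect()
print(d.via_class())  # ('SUNDAY', 0)
print(d.via_instance())  # ('MONDAY', 1)
```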

If False, return just day_name string for serialization
Returns:
- If include_dow=False: day_name (e.g., "SUNDAY")
Owner

why not just always return the day of the week? wouldn't that be simpler?

Handles:
- Var('WEEK') -> ('SUNDAY', 0) # BigQuery default
- Var('ISOWEEK') -> ('MONDAY', 1)
- Week(Var('day')) -> ('day', dow)
Owner

what happens if the day is in a column and not a constant string value?

Collaborator Author

When the week start day is dynamic (e.g. read from a column), week_start will be None and we fall back to the standard DATE_DIFF, since the shift offset has to be known at transpile time to build the DATE_TRUNC rewrite.
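A sketch combining both review suggestions: always return the (day_name, dow) pair, and return None when the week start is not a literal (for example, a column), so the caller can fall back to a plain DATE_DIFF. The `Var`, `Week`, and `Column` classes below are simplified, hypothetical stand-ins for sqlglot expression nodes, not its real API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

DAY_NUMBERS = {
    "SUNDAY": 0, "MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3,
    "THURSDAY": 4, "FRIDAY": 5, "SATURDAY": 6,
}
UNIT_DEFAULTS = {"WEEK": "SUNDAY", "ISOWEEK": "MONDAY", "WEEKISO": "MONDAY"}


@dataclass
class Var:  # a bare keyword such as WEEK or MONDAY
    name: str


@dataclass
class Column:  # a column reference, only known at query time
    name: str


@dataclass
class Week:  # WEEK(<arg>)
    arg: Union[Var, Column]


def extract_week_unit_info(unit: object) -> Optional[Tuple[str, int]]:
    """Return (day_name, dow) for a static week unit, otherwise None."""
    day: Optional[str]
    if isinstance(unit, Var):
        day = UNIT_DEFAULTS.get(unit.name.upper())
    elif isinstance(unit, Week) and isinstance(unit.arg, Var):
        day = unit.arg.name.upper()
    else:
        return None  # e.g. WEEK(some_column): no shift can be computed up front
    if day not in DAY_NUMBERS:
        return None
    return day, DAY_NUMBERS[day]


print(extract_week_unit_info(Var("WEEK")))  # ('SUNDAY', 0)
print(extract_week_unit_info(Week(Var("friday"))))  # ('FRIDAY', 5)
print(extract_week_unit_info(Week(Column("start_col"))))  # None
```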

return arg


def _extract_week_start_day(unit: t.Optional[exp.Expression]) -> t.Optional[t.Tuple[str, int]]:
Owner

what is the point of this function? it doesn't seem necessary

Collaborator Author

Yeah, makes sense, will remove the redundant wrapper and call extract_week_unit_info() directly.

"""
if start_dow == 1:
# No shift needed for Monday-based weeks (ISO standard)
return exp.Anonymous(this="DATE_TRUNC", expressions=[exp.Literal.string("week"), date_expr])
Owner

why are you creating an anonymous function?

Collaborator Author

I'll replace it with DATE_TRUNC.

@fivetran-amrutabhimsenayachit force-pushed the RD-1050264-transpile-big-querys-date-diff-date-function-to-duck-db branch from ea1b9b4 to 592107b November 26, 2025 19:05