-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Update partitioning by DATE, DATETIME, TIMESTAMP, _PARTITIONDATE #1113
base: main
Are you sure you want to change the base?
Conversation
🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use -- conventional-commit-lint bot |
sqlalchemy_bigquery/base.py
Outdated
field = "_PARTITIONDATE" | ||
trunc_fn = "DATE_TRUNC" | ||
|
||
# Format used with _PARTITIONDATE which can only be used for | ||
# DAY / MONTH / YEAR | ||
if time_partitioning.field is None and field == "_PARTITIONDATE": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
field == "_PARTITIONDATE"
is always true
sqlalchemy_bigquery/base.py
Outdated
# DAY / MONTH / YEAR | ||
if time_partitioning.field is None and field == "_PARTITIONDATE": | ||
if time_partitioning.type_ in {"DAY", "MONTH", "YEAR"}: | ||
return f"PARTITION BY {trunc_fn}({field})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused, the type isn't passed to the trunc_fn, should it be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Been looking at your comments related to partitioning.
My original code doesn't seem to be 100%, so glad you asked some questions. = )
I am gonna revisit the code and will try to make it correct and simpler, if possible.
Our goal is to increase our coverage of use cases to four.
Within each of those four, the SQL signature will look different depending on the use case and the associated function AND the allowable TimePartitioningType
.
Some things to note:
- _PARTITIONDATE does not allow a function OR a
TimePartitioningType
at all. - _PARTITIONDATE is a pseudocolumn and is in every table (but not normally visible)
- Some functions only take a couple of
TimePartitioningTypes
(TPT). See the breakdown below.
# _PARTITIONDATE has no function, no TPT
CREATE TABLE `experimental.some_table` ( `id` INT64, `createdAt` DATE ) # has pseudocol: _PARTITIONDATE
PARTITION BY _PARTITIONDATE;
# DATETIME has function and four TPTs
CREATE TABLE `experimental.some_table` ( `id` INT64, `createdAt` DATETIME )
DATETIME_TRUNC(<datetime_column>, DAY/HOUR/MONTH/YEAR);
# TIMESTAMP has function and four TPTs
CREATE TABLE `experimental.some_table` ( `id` INT64, `createdAt` TIMESTAMP )
TIMESTAMP_TRUNC(<timestamp_column>, DAY/HOUR/MONTH/YEAR);
# DATE has function and only two TPTs
CREATE TABLE `experimental.some_table` ( `id` INT64, `createdAt` DATE )
PARTITION BY DATE_TRUNC(createdAt, MONTH/YEAR);
This adds some additional functionality to properly handle partitioning of columns with the following datatypes.
Where appropriate, ensures the following functions can be used with/or without the following TimePartitioningTypes (HOUR, DAY, MONTH, YEAR).
This is a nearly complete fix for #1072. The table from #1072 is included here:
NOTE: This PR does not handle _PARTITIONTIME