Feat: Update partitioning by DATE, DATETIME, TIMESTAMP, _PARTITIONDATE #1113

Draft: wants to merge 8 commits into base: main

@chalmerlowe (Collaborator) commented Sep 11, 2024

This PR adds support for correctly partitioning on columns of the following data types:

  • DATE
  • TIMESTAMP
  • DATETIME
  • _PARTITIONDATE

Where appropriate, it ensures the following functions can be used with or without the following TimePartitioningTypes (HOUR, DAY, MONTH, YEAR):

  • DATE_TRUNC()
  • TIMESTAMP_TRUNC()
  • DATETIME_TRUNC()
  • DATE()

This is a nearly complete fix for #1072.
NOTE: This PR does not handle _PARTITIONTIME.

The table from #1072 is included here:

| Column Data Type | HOUR | DAY | MONTH | YEAR |
| --- | --- | --- | --- | --- |
| DATE | N/A | Fixed by #1057 | via DATE_TRUNC | via DATE_TRUNC |
| DATETIME | Incorrectly implemented via DATE_TRUNC. TODO: use DATETIME_TRUNC | Incorrectly implemented via DATE_TRUNC. TODO: use DATETIME_TRUNC | Incorrectly implemented via DATE_TRUNC. TODO: use DATETIME_TRUNC | Incorrectly implemented via DATE_TRUNC. TODO: use DATETIME_TRUNC |
| TIMESTAMP | via TIMESTAMP_TRUNC | via TIMESTAMP_TRUNC | via TIMESTAMP_TRUNC | via TIMESTAMP_TRUNC |
| _PARTITIONDATE | N/A | N/A | N/A | N/A |
| _PARTITIONTIME | Not currently implemented. TODO: use TIMESTAMP_TRUNC | Not currently implemented. TODO: use TIMESTAMP_TRUNC | Not currently implemented. TODO: use TIMESTAMP_TRUNC | Not currently implemented. TODO: use TIMESTAMP_TRUNC |

conventional-commit-lint-gcf bot commented Sep 11, 2024

🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use automerge label. Good luck human!

-- conventional-commit-lint bot
https://conventionalcommits.org/

@product-auto-label bot added the size: m and api: bigquery labels Sep 11, 2024
field = "_PARTITIONDATE"
trunc_fn = "DATE_TRUNC"

# Format used with _PARTITIONDATE which can only be used for
# DAY / MONTH / YEAR
if time_partitioning.field is None and field == "_PARTITIONDATE":

field == "_PARTITIONDATE" is always true
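
To illustrate the point: because `field` is assigned the literal "_PARTITIONDATE" just above the condition, the second half of the check cannot be false, so the whole condition reduces to `time_partitioning.field is None`. A minimal sketch, using `types.SimpleNamespace` as a hypothetical stand-in for the real time-partitioning object (the function names are illustrative and not from the PR):

```python
from types import SimpleNamespace


def original_check(time_partitioning):
    field = "_PARTITIONDATE"  # assigned unconditionally, as in the diff
    return time_partitioning.field is None and field == "_PARTITIONDATE"


def simplified_check(time_partitioning):
    # Equivalent condition once the always-true comparison is dropped.
    return time_partitioning.field is None


# Stand-ins for the real object; only the `field` attribute matters here.
candidates = [SimpleNamespace(field=None), SimpleNamespace(field="createdAt")]
results = [(original_check(tp), simplified_check(tp)) for tp in candidates]
```

Both checks agree for every input, which suggests the comparison on `field` can be removed without changing behavior.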

# DAY / MONTH / YEAR
if time_partitioning.field is None and field == "_PARTITIONDATE":
    if time_partitioning.type_ in {"DAY", "MONTH", "YEAR"}:
        return f"PARTITION BY {trunc_fn}({field})"

I'm a little confused, the type isn't passed to the trunc_fn, should it be?
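
For illustration, this is how the rendered clause differs when the partitioning type is passed as the second argument of the truncation function. It is only a sketch of the question being asked, not the PR's code: `render_without_type` and `render_with_type` are hypothetical names, and whether _PARTITIONDATE should take a function at all is a separate question.

```python
def render_without_type(trunc_fn: str, field: str) -> str:
    # Mirrors the f-string in the diff: the granularity never reaches the SQL.
    return f"PARTITION BY {trunc_fn}({field})"


def render_with_type(trunc_fn: str, field: str, type_: str) -> str:
    # Passes the granularity (for example MONTH) as the second argument.
    return f"PARTITION BY {trunc_fn}({field}, {type_})"


without_type = render_without_type("DATE_TRUNC", "createdAt")
with_type = render_with_type("DATE_TRUNC", "createdAt", "MONTH")
```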

@chalmerlowe (Collaborator, Author) commented Oct 7, 2024

Been looking at your comments related to partitioning.

My original code doesn't seem to be 100%, so glad you asked some questions. = )
I am gonna revisit the code and will try to make it correct and simpler, if possible.

Our goal is to cover four use cases: _PARTITIONDATE, DATETIME, TIMESTAMP, and DATE.

For each of those four, the SQL looks different depending on the associated function and the allowable TimePartitioningTypes.

Some things to note:

  • _PARTITIONDATE does not allow a function OR a TimePartitioningType at all.
  • _PARTITIONDATE is a pseudocolumn and is in every table (but not normally visible)
  • Some functions only take a couple of TimePartitioningTypes (TPT). See the breakdown below.

# _PARTITIONDATE has no function, no TPT

CREATE TABLE `experimental.some_table` ( `id` INT64, `createdAt` DATE ) # has pseudocol: _PARTITIONDATE
PARTITION BY _PARTITIONDATE;

# DATETIME has function and four TPTs

CREATE TABLE `experimental.some_table` ( `id` INT64, `createdAt` DATETIME )
PARTITION BY DATETIME_TRUNC(<datetime_column>, DAY/HOUR/MONTH/YEAR);

# TIMESTAMP has function and four TPTs

CREATE TABLE `experimental.some_table` ( `id` INT64, `createdAt` TIMESTAMP )
PARTITION BY TIMESTAMP_TRUNC(<timestamp_column>, DAY/HOUR/MONTH/YEAR);

# DATE has function and only two TPTs

CREATE TABLE `experimental.some_table` ( `id` INT64, `createdAt` DATE )
PARTITION BY DATE_TRUNC(createdAt, MONTH/YEAR);
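
The breakdown above could be expressed as a small lookup plus a renderer. This is a minimal sketch under the rules stated in this comment, not the implementation in `sqlalchemy_bigquery/base.py`; `RULES` and `partition_clause` are hypothetical names introduced only for illustration.

```python
from typing import Optional

# Truncation function and allowed granularities per column type,
# following the breakdown above. Names here are illustrative only.
RULES = {
    "DATETIME": ("DATETIME_TRUNC", {"HOUR", "DAY", "MONTH", "YEAR"}),
    "TIMESTAMP": ("TIMESTAMP_TRUNC", {"HOUR", "DAY", "MONTH", "YEAR"}),
    "DATE": ("DATE_TRUNC", {"MONTH", "YEAR"}),
}


def partition_clause(
    column_type: str,
    column: Optional[str] = None,
    granularity: Optional[str] = None,
) -> str:
    """Render a PARTITION BY clause for one of the four use cases."""
    if column_type == "_PARTITIONDATE":
        # The pseudocolumn is used bare: no function, no granularity.
        if granularity is not None:
            raise ValueError("_PARTITIONDATE takes no function or granularity")
        return "PARTITION BY _PARTITIONDATE"

    trunc_fn, allowed = RULES[column_type]
    if granularity not in allowed:
        raise ValueError(f"{granularity} is not valid for {trunc_fn}")
    return f"PARTITION BY {trunc_fn}({column}, {granularity})"
```

For example, `partition_clause("DATE", "createdAt", "MONTH")` renders the DATE case shown above, while `partition_clause("DATE", "createdAt", "HOUR")` raises `ValueError` because DATE_TRUNC is listed with only MONTH and YEAR.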

@product-auto-label bot added the size: l label and removed the size: m label Nov 11, 2024
@chalmerlowe added the kokoro:run and owlbot:run labels Nov 13, 2024
@gcf-owl-bot bot removed the owlbot:run label Nov 13, 2024