feat: support json_[string][pyarrow] dtype and make pandas-gbq dtypes more independent from google-cloud-bigquery logic #893

Open
wants to merge 4 commits into
base: main

tswast
Collaborator

@tswast tswast commented Mar 10, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Towards internal issue 401630655 🦕

@tswast tswast requested review from a team as code owners March 10, 2025 15:26
@tswast tswast requested review from GaoleMeng and GarrettWu March 10, 2025 15:26
@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Mar 10, 2025
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-pandas API. label Mar 10, 2025
@tswast tswast added do not merge Indicates a pull request not ready for merge, due to either quality or timing. and removed api: bigquery Issues related to the googleapis/python-bigquery-pandas API. size: l Pull request size is large. labels Mar 10, 2025
@tswast
Collaborator Author

tswast commented Mar 10, 2025

Marking as do not merge. Should have been a draft.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-pandas API. label Mar 11, 2025
Comment on lines +35 to +43
# Prefer the JSON type built in to pyarrow (added in 19.0.0), if available.
# Otherwise, fall back to db-dtypes, where the JSONArrowType was added in 1.4.0,
# but since they might have an older db-dtypes, have string as a fallback for that.
if hasattr(pyarrow, "json_"):
    json_arrow_type = pyarrow.json_(pyarrow.string())
elif hasattr(db_dtypes, "JSONArrowType"):
    json_arrow_type = db_dtypes.JSONArrowType()
else:
    json_arrow_type = pyarrow.string()
Collaborator Author

I had a change of heart in googleapis/python-bigquery#1876. For to_arrow(), we should emulate the BQ Storage Read API as closely as possible.

For read_gbq(), that's where I'd like to use the extension type(s) if available.
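
For readers following the thread, the fallback order in the commented lines can be written as a small standalone function. This is only a sketch, not code from the PR: the name `pick_json_arrow_type` is hypothetical, and the two modules are passed in as arguments (an assumption made here so that the selection logic can be exercised without any particular pyarrow or db-dtypes version installed).

```python
from types import SimpleNamespace


def pick_json_arrow_type(pyarrow_module, db_dtypes_module=None):
    """Return the best available Arrow type for a JSON column.

    Hypothetical helper mirroring the fallback order in the diff:
    1. pyarrow's built-in JSON extension type, if the module exposes json_
    2. db-dtypes' JSONArrowType, if the module exposes it
    3. plain Arrow string otherwise
    """
    if hasattr(pyarrow_module, "json_"):
        return pyarrow_module.json_(pyarrow_module.string())
    if db_dtypes_module is not None and hasattr(db_dtypes_module, "JSONArrowType"):
        return db_dtypes_module.JSONArrowType()
    return pyarrow_module.string()


# Stand-in modules (not the real libraries) that simulate different
# installed versions, so each branch can be checked in isolation.
new_pyarrow = SimpleNamespace(
    string=lambda: "string",
    json_=lambda storage: f"json<{storage}>",
)
old_pyarrow = SimpleNamespace(string=lambda: "string")
new_db_dtypes = SimpleNamespace(JSONArrowType=lambda: "db_dtypes.JSONArrowType")
old_db_dtypes = SimpleNamespace()

with_builtin = pick_json_arrow_type(new_pyarrow, new_db_dtypes)
with_db_dtypes = pick_json_arrow_type(old_pyarrow, new_db_dtypes)
string_only = pick_json_arrow_type(old_pyarrow, old_db_dtypes)
```

With the real libraries installed, the call would be `pick_json_arrow_type(pyarrow, db_dtypes)`. Following the comment above, a helper like this would sit on the read_gbq() path only, while to_arrow() would keep whatever type the BQ Storage Read API produces.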

Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. do not merge Indicates a pull request not ready for merge, due to either quality or timing. size: l Pull request size is large.

2 participants