fix: avoid "Unable to determine type" warning with JSON columns in `to_dataframe` #1876

tswast · 2024-03-27T20:44:11Z

TODO:

- tests
- maybe we don't want string columns for JSON?

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

- Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
- Ensure the tests and linter pass
- Code coverage does not decrease (if any source code was changed)
- Appropriate docs were updated (if necessary)

Fixes #1580 (TODO: need test case for empty result set)
🦕

tswast · 2024-03-27T21:35:41Z

I might actually want to change db-dtypes so that, even though the value is stored as a string, the unboxed version gives a parsed object, matching the behavior when the REST API is used.

tswast · 2024-03-27T21:36:00Z

Right now the behavior is inconsistent between the REST API and the BQ Storage API.
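To illustrate the inconsistency, here is a minimal sketch using only the standard library. It assumes, per the comments in this thread, that the REST code path hands back parsed objects while the BQ Storage Read API code path hands back the JSON text unchanged. The variable names are made up for illustration and are not taken from the library.

```python
import json

# One JSON cell as it would arrive from the backend, before any conversion.
raw_cell = '{"name": "alice", "scores": [1, 2]}'

# Assumed REST-style behavior: the JSON text is parsed into a Python object.
rest_value = json.loads(raw_cell)

# Assumed BQ Storage-style behavior: the JSON text is left as a string.
storage_value = raw_cell

# The same cell ends up with two different Python types.
print(type(rest_value).__name__)
print(type(storage_value).__name__)
```

The two values carry the same data, but a DataFrame column built from one holds dicts and a column built from the other holds strings.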

tswast · 2024-03-28T15:57:43Z

Marking as do not merge for now. This change makes the JSON dtype consistent, but it always returns a string dtype, like the BQ Storage Read API code path does, which isn't ideal.

tswast · 2025-03-10T15:44:56Z

Actually, I think this needs a few more tests. I'm testing manually with `pytest 'tests/system/test_to_gbq.py::test_dataframe_round_trip_with_table_schema[load_csv-json]'` from googleapis/python-bigquery-pandas#893, but it's currently failing. We parse the JSON string in `_row_iterator_page_columns`, but we actually want to keep those values as strings so that we can use the `json_` pyarrow type.
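One way to picture the distinction is a cell parser whose JSON conversion can be overridden for the DataFrame path. This is only a sketch under assumptions: `CellParser`, `DataFrameCellParser`, and their methods are hypothetical names and do not reflect the library's actual classes.

```python
import json


class CellParser:
    """Hypothetical parser: converts raw cell text based on a type name."""

    def json_to_py(self, value):
        # Default behavior: parse the JSON text into a Python object.
        return json.loads(value)

    def integer_to_py(self, value):
        return int(value)

    def to_py(self, value, field_type):
        if value is None:
            return None
        # Look up a converter such as json_to_py; pass through if none exists.
        converter = getattr(self, f"{field_type.lower()}_to_py", None)
        return converter(value) if converter is not None else value


class DataFrameCellParser(CellParser):
    """Hypothetical override: keep JSON cells as strings for an Arrow JSON type."""

    def json_to_py(self, value):
        return value


cell = '{"a": 1}'
print(CellParser().to_py(cell, "JSON"))
print(DataFrameCellParser().to_py(cell, "JSON"))
```

With this shape, only the JSON conversion differs between the two paths, and every other type keeps the shared behavior.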

tswast · 2025-03-11T21:14:32Z

google/cloud/bigquery/_pyarrow_helpers.py

+    # Prefer JSON type built-in to pyarrow (added in 19.0.0), if available.
+    # Otherwise, fall back to db-dtypes, where the JSONArrowType was added in 1.4.0,
+    # but since users might have an older db-dtypes, use string as a fallback for that.
+    # TODO(https://github.com/pandas-dev/pandas/issues/60958): switch to
+    # pyarrow.json_(pyarrow.string()) if available and supported by pandas.
+    if hasattr(db_dtypes, "JSONArrowType"):
+        json_arrow_type = db_dtypes.JSONArrowType()
+    else:
+        json_arrow_type = pyarrow.string()


This is the key change. Mostly aligns with bigframes, but we've left off pyarrow.json_(pyarrow.string()) because of pandas-dev/pandas#60958.
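The fallback in the diff can be exercised without installing either package. The sketch below passes the modules in as parameters and uses stand-in objects; `choose_json_arrow_type` and the stand-ins are hypothetical names for illustration only, and the returned strings merely label which branch was taken.

```python
from types import SimpleNamespace


def choose_json_arrow_type(db_dtypes, pyarrow):
    """Return db_dtypes.JSONArrowType() if present, else pyarrow.string()."""
    json_type_factory = getattr(db_dtypes, "JSONArrowType", None)
    if json_type_factory is not None:
        return json_type_factory()
    # Older db-dtypes without JSONArrowType: fall back to a plain string type.
    return pyarrow.string()


# Stand-ins for the real modules (assumptions, not the real packages).
fake_pyarrow = SimpleNamespace(string=lambda: "string")
new_db_dtypes = SimpleNamespace(JSONArrowType=lambda: "db-dtypes JSON type")
old_db_dtypes = SimpleNamespace()

print(choose_json_arrow_type(new_db_dtypes, fake_pyarrow))
print(choose_json_arrow_type(old_db_dtypes, fake_pyarrow))
```

Checking for the attribute rather than comparing version numbers keeps the fallback working even when the installed db-dtypes version cannot be parsed.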

tswast · 2025-03-11T21:36:41Z

Marking as do not merge again. I'll split out the refactor into a separate PR first.

Edit: Mailed #2144

product-auto-label bot added the `size: xs` and `api: bigquery` labels Mar 27, 2024

tswast added the `do not merge` label Mar 28, 2024

tswast mentioned this pull request Jul 18, 2024

Support JSON data type googleapis/python-bigquery-pandas#698

Open

product-auto-label bot added the `size: s` label and removed the `size: xs` label Mar 10, 2025

tswast marked this pull request as ready for review March 10, 2025 15:28

tswast requested review from a team as code owners March 10, 2025 15:28

tswast requested a review from shollyman March 10, 2025 15:28

blunderbuss-gcf bot assigned GaoleMeng Mar 10, 2025

tswast removed the `do not merge` label Mar 10, 2025

tswast added the `do not merge` label Mar 10, 2025

product-auto-label bot added the `size: m` and `size: xl` labels and removed the `size: s` and `size: m` labels Mar 10, 2025

tswast commented Mar 11, 2025

tswast removed the `do not merge` label Mar 11, 2025

tswast requested review from chalmerlowe and Linchin and removed request for shollyman March 11, 2025 21:17

tswast added the `do not merge` label Mar 11, 2025

tswast mentioned this pull request Mar 11, 2025

chore: refactor cell data parsing to use classes for easier overrides #2144

Open

4 tasks

fix: avoid "Unable to determine type" warning with JSON columns in `to_dataframe`

2580ef9

tswast force-pushed the tswast-json branch from 8a1aad9 to 2580ef9 Compare March 11, 2025 21:59

tswast changed the base branch from main to tswast-refactor-cell-data March 11, 2025 21:59
