Implement JSON type support #330

wudidapaopao · 2025-05-22T01:53:16Z

Changelog category (leave one):

New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

This PR introduces support for mapping Python objects to ClickHouse JSON type.

Pandas DataFrame: Columns of object type are automatically sampled. If all sampled values are of type dict, the column is mapped to JSON type.
Python Dict: If the first row of a column is a dict, the column is mapped to JSON type.
PyArrow Table: If a column is of struct type in PyArrow, it will be mapped to JSON type.
Custom PyReader: Users can explicitly specify JSON as the schema type name for a given column, and the column is then mapped to JSON type.
Numpy: JSON type is currently not supported for Numpy arrays.
Output Formats: When using output formats such as Arrow, Protobuf, or Parquet, JSON type is temporarily disabled due to ClickHouse limitations.
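
The two detection rules above (sampling for Pandas object columns, first row for Python dicts) can be illustrated with a minimal sketch. This is not the code from this PR: the function names, the default sample size, and the evenly spaced sampling strategy are all assumptions made for illustration, and the sketch uses plain Python lists rather than real DataFrame columns.

```python
from typing import Any, List, Sequence


def sample_values(values: Sequence[Any], sample_size: int = 10) -> List[Any]:
    """Pick roughly `sample_size` evenly spaced values (hypothetical strategy)."""
    n = len(values)
    if n == 0:
        return []
    step = max(1, n // sample_size)
    return [values[i] for i in range(0, n, step)]


def object_column_is_json(values: Sequence[Any], sample_size: int = 10) -> bool:
    """Pandas-style rule: JSON only if every sampled value is a dict."""
    sampled = sample_values(values, sample_size)
    return bool(sampled) and all(isinstance(v, dict) for v in sampled)


def dict_column_is_json(values: Sequence[Any]) -> bool:
    """Python-dict rule: JSON if the first row of the column is a dict."""
    return len(values) > 0 and isinstance(values[0], dict)


# The same column can be classified differently by the two rules.
mixed = [{"a": 1}, "not a dict", {"b": 2}]
print(object_column_is_json(mixed))  # False
print(dict_column_is_json(mixed))    # True
```

The example shows why the source of the data matters: a column whose first row is a dict but whose later rows are not would be treated as JSON under the first-row rule, but not under the all-sampled-values rule.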

Additionally, this PR supports SQL queries that involve multiple Python objects within the same query.

The detection of Pandas DataFrame objects is now done via an import-based check, improving compatibility and reliability.
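
The PR text does not show how the import-based check is written. One plausible reading, sketched here under that assumption, is to look up the already-imported `pandas` module and test the object against its `DataFrame` class, rather than comparing class names as strings. The helper name `is_pandas_dataframe` is hypothetical.

```python
import sys
from typing import Any


def is_pandas_dataframe(obj: Any) -> bool:
    """Return True only if pandas is already imported and obj is a DataFrame."""
    pandas = sys.modules.get("pandas")
    if pandas is None:
        # If pandas was never imported, obj cannot be a pandas DataFrame.
        return False
    frame_type = getattr(pandas, "DataFrame", None)
    return frame_type is not None and isinstance(obj, frame_type)


print(is_pandas_dataframe({"a": [1, 2, 3]}))  # False
```

A check of this shape avoids importing pandas when the caller has not done so, and it does not misclassify unrelated classes that happen to be named `DataFrame`.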

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

CI Settings

NOTE: If you merge the PR with modified CI you MUST KNOW what you are doing
NOTE: Checked options will be applied if set before CI RunConfig/PrepareRunConfig step

Run these jobs only (required builds will be added automatically):

Deny these jobs:

Extra options:

do not test (only style check)
disable merge-commit (no merge from master before tests)
disable CI cache (job reuse)

Only specified batches in multi-batch jobs:

1
2
3
4

…Frame, dict, and PyReader
- Implemented support for querying JSON columns across various data sources including pyarrow Table, DataFrame, dictionary, and PyReader.
- Added corresponding test cases to validate the querying functionality for each data type.
- Enhanced the display format of run_all.py.

wudidapaopao added 11 commits May 15, 2025 23:48

chore: Add Numpy types

4b4e8a2

chore: Add NumpyType.cpp

1fc1b51

chore: support pandas cache

f885422

chore: add pandas analyzer

6d64f7f

chore: add settings

5fc2ce9

chore: mv TableFunctionPython.cpp StoragePython.cpp PythonSource.cpp

5435e39

chore: mv PythonUtils.cpp

94a51d3

chore: modify CMake

70cf830

chore: modify settings

5b01fe3

fix: fix query with json type

cc5c727

test: add json query test

e529bca

wudidapaopao marked this pull request as draft May 22, 2025 02:01

wudidapaopao added 2 commits May 22, 2025 17:57

fix: fix acquire GIL when process exit

9dd6b67

wudidapaopao changed the title ~~[WIP] Implement JSON type support~~ Implement JSON type support May 26, 2025

wudidapaopao marked this pull request as ready for review May 26, 2025 20:05

wudidapaopao requested a review from auxten May 26, 2025 20:18

wudidapaopao added 3 commits May 27, 2025 17:39

chore: format code

6fc5611

test: added test cases for more data types

8534856

chore: format code

1d0c64a

auxten merged commit ffc395f into chdb-io:main Jun 3, 2025
14 checks passed
