[Enhancement][FlatJson] Improve flat JSON performance and extraction strategy (backport #50696) #51215

Merged
1 commit merged on Sep 20, 2024

mergify bot commented Sep 20, 2024

Why I'm doing:

Performance:

  • Forbid pushing down JSON/map expressions to the storage layer, since doing so performs poorly in most scenarios
  • Improve the performance of reading remain data
    • Add a bloom filter over the JSON subfield keys; the queried path is checked against it when reading remain data
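A minimal sketch of the bloom filter idea above, assuming the filter is built over the full subfield paths written to a segment's remain data and is consulted before the remain column is parsed. SubfieldKeyBloomFilter and can_skip_remain are hypothetical names for illustration; this is not the BloomFilter class in be/src/storage/rowset/bloom_filter.h.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch (not StarRocks' BloomFilter): a bloom filter over the
// JSON subfield paths that were written to a segment's remain data.
class SubfieldKeyBloomFilter {
public:
    SubfieldKeyBloomFilter(std::size_t num_bits, int num_hashes)
            : _bits(num_bits, false), _num_hashes(num_hashes) {
        assert(num_bits > 0 && num_hashes > 0);
    }

    // Record a subfield path seen while writing the remain data.
    void add(const std::string& path) {
        for (int i = 0; i < _num_hashes; ++i) _bits[_slot(path, i)] = true;
    }

    // false: the path is definitely absent; true: the path may be present.
    bool may_contain(const std::string& path) const {
        for (int i = 0; i < _num_hashes; ++i) {
            if (!_bits[_slot(path, i)]) return false;
        }
        return true;
    }

private:
    // Double hashing: the i-th probe is h1 + i * h2 (mod number of bits).
    std::size_t _slot(const std::string& path, int i) const {
        std::size_t h1 = std::hash<std::string>{}(path);
        std::size_t h2 = std::hash<std::string>{}(path + "#") | 1;
        return (h1 + static_cast<std::size_t>(i) * h2) % _bits.size();
    }

    std::vector<bool> _bits;
    int _num_hashes;
};

// When a query needs a path from remain data, skip parsing the remain
// column entirely if the filter proves the path was never written there.
bool can_skip_remain(const SubfieldKeyBloomFilter& filter, const std::string& path) {
    return !filter.may_contain(path);
}
```

A negative answer is exact, so skipping is safe; a positive answer may be a false positive, in which case the reader still has to parse the remain data.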

FlatJson Extraction Strategy:

  • Improve the flat JSON extraction strategy:
    • Before: only leaf nodes were extracted
    • Now: when leaf nodes don't meet the extraction requirements, non-leaf nodes are checked as well and may be extracted
    • Rewrite the _finalize method as a bottom-up DFS so that non-leaf nodes can be checked and extracted
  • Support flat JSON extraction when the JSON is a subfield of an array/struct/map
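The bottom-up DFS can be sketched roughly as follows. This assumes a path trie in which each node counts the rows containing its path, and a single hit-ratio threshold that decides extraction; the actual _finalize method may apply further checks (for example on value types). PathNode, add_child, finalize_node, and finalize are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical JSON path trie: `hits` is the number of rows containing the path.
struct PathNode {
    int64_t hits = 0;
    std::map<std::string, std::unique_ptr<PathNode>> children;
};

// Adds (or updates) a child and returns it, to make building a trie easy.
PathNode* add_child(PathNode* parent, const std::string& key, int64_t hits) {
    auto& slot = parent->children[key];
    if (!slot) slot = std::make_unique<PathNode>();
    slot->hits = hits;
    return slot.get();
}

// Bottom-up DFS: children are decided first. A node with no extracted
// descendant is then checked itself, so a frequent non-leaf object whose
// leaves are all too sparse is extracted as a whole. Returns true if this
// node or any descendant was extracted.
bool finalize_node(const PathNode& node, const std::string& path, int64_t total_rows,
                   double min_ratio, std::vector<std::string>* extracted) {
    bool any_child = false;
    for (const auto& [key, child] : node.children) {
        any_child |= finalize_node(*child, path + "." + key, total_rows, min_ratio, extracted);
    }
    if (any_child) return true;
    if (static_cast<double>(node.hits) / static_cast<double>(total_rows) >= min_ratio) {
        extracted->push_back(path);
        return true;
    }
    return false;
}

// The root (the whole JSON value) is never extracted; start from its children.
std::vector<std::string> finalize(const PathNode& root, int64_t total_rows, double min_ratio) {
    assert(total_rows > 0);
    std::vector<std::string> extracted;
    for (const auto& [key, child] : root.children) {
        finalize_node(*child, key, total_rows, min_ratio, &extracted);
    }
    return extracted;
}
```

With 100 rows and a threshold of 0.8, an object a that appears in every row but whose leaves a.x and a.y each appear in only 10 rows is extracted as a whole, while c.z (95 rows) is still extracted as a leaf. A leaf-only strategy would have left everything under a in the remain data.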

Profile Enhancement:

  • Support flat_json_meta on primary key tables
  • Support flat_json_meta on array/struct/map columns
  • Merge all flat JSON subfields when querying the whole JSON column
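Merging the subfields back when the whole JSON column is queried could look roughly like this: each row's flat subfield values and its remain fields are inserted into a path tree, which is then serialized again. This sketch assumes values are already-serialized JSON fragments, paths use "." as the separator, keys need no escaping, and only fields present in the row are passed in. JsonNode, insert_path, serialize, and merge_row are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical path tree used to reassemble one row's JSON object.
struct JsonNode {
    std::string value;  // serialized JSON fragment, used when there are no fields
    std::map<std::string, std::unique_ptr<JsonNode>> fields;
};

// Inserts `value` at a '.'-separated path, creating intermediate objects.
void insert_path(JsonNode* root, const std::string& path, const std::string& value) {
    assert(!path.empty());
    JsonNode* node = root;
    std::size_t start = 0;
    while (true) {
        std::size_t dot = path.find('.', start);
        std::size_t len = (dot == std::string::npos) ? std::string::npos : dot - start;
        auto& child = node->fields[path.substr(start, len)];
        if (!child) child = std::make_unique<JsonNode>();
        node = child.get();
        if (dot == std::string::npos) break;
        start = dot + 1;
    }
    node->value = value;
}

// Serializes the tree; keys come out sorted because std::map is ordered.
std::string serialize(const JsonNode& node) {
    if (node.fields.empty()) return node.value;
    std::string out = "{";
    for (auto it = node.fields.begin(); it != node.fields.end(); ++it) {
        if (it != node.fields.begin()) out += ",";
        out += "\"" + it->first + "\":" + serialize(*it->second);
    }
    return out + "}";
}

using FieldList = std::vector<std::pair<std::string, std::string>>;

// Merges the flat subfields present in a row with the row's remain fields.
std::string merge_row(const FieldList& flat_fields, const FieldList& remain_fields) {
    JsonNode root;
    for (const auto& [path, value] : flat_fields) insert_path(&root, path, value);
    for (const auto& [path, value] : remain_fields) insert_path(&root, path, value);
    return root.fields.empty() ? "{}" : serialize(root);
}
```

For example, flat fields a.b = 1 and c = "x" plus the remain field a.d = true serialize to {"a":{"b":1,"d":true},"c":"x"}.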

Bug Fixes:

  • Fix a bug with JSON paths containing ".": it affected the extracted subfield names; subfields whose names contain "." are no longer extracted
  • Fix a flat JSON compaction bug that lost some JSON subfields when some of the data had remain data and some did not
  • Fix the flat JSON profile being lost in shared-data mode
  • Fix a flat JSON crash when using the chunk accumulator
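The "." fix can be illustrated with a small hypothetical helper. Presumably the issue is that flattened paths are joined with ".", so the key in {"a.b": 1} and the nested object {"a": {"b": 1}} would both map to the path a.b; a key containing "." is therefore refused and the field stays unextracted. is_extractable_key and build_flat_path are illustrative names, not StarRocks functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: flattened paths are joined with '.', so a key that
// itself contains '.' would be indistinguishable from a nested path.
bool is_extractable_key(const std::string& key) {
    return key.find('.') == std::string::npos;
}

// Builds the flat path for a chain of keys, or returns false (leaving the
// field unextracted) if any key contains '.'.
bool build_flat_path(const std::vector<std::string>& keys, std::string* out) {
    assert(out != nullptr);
    out->clear();
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (!is_extractable_key(keys[i])) return false;
        if (i > 0) out->push_back('.');
        *out += keys[i];
    }
    return true;
}
```

For {"a": {"b": 1}} the key chain ["a", "b"] yields the path a.b, while for {"a.b": 1} the chain ["a.b"] is rejected instead of colliding with it.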

What I'm doing:

Fixes https://github.com/StarRocks/StarRocksTest/issues/8533 https://github.com/StarRocks/StarRocksTest/issues/8534 https://github.com/StarRocks/StarRocksTest/issues/8536 https://github.com/StarRocks/StarRocksTest/issues/8568

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

This is an automatic backport of pull request #50696 done by [Mergify](https://mergify.com).

mergify bot added the conflicts label Sep 20, 2024

mergify bot commented Sep 20, 2024

Cherry-pick of 45d72ac has failed:

On branch mergify/bp/branch-3.3/pr-50696
Your branch is up to date with 'origin/branch-3.3'.

You are currently cherry-picking commit 45d72ace19.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   be/src/column/column_access_path.cpp
	modified:   be/src/column/json_column.cpp
	modified:   be/src/column/json_column.h
	modified:   be/src/common/config.h
	modified:   be/src/connector/lake_connector.cpp
	modified:   be/src/exprs/json_functions.cpp
	modified:   be/src/storage/chunk_helper.cpp
	modified:   be/src/storage/chunk_helper.h
	modified:   be/src/storage/meta_reader.cpp
	modified:   be/src/storage/olap_meta_reader.cpp
	modified:   be/src/storage/rowset/array_column_writer.cpp
	modified:   be/src/storage/rowset/bloom_filter.h
	modified:   be/src/storage/rowset/column_reader.cpp
	modified:   be/src/storage/rowset/column_reader.h
	modified:   be/src/storage/rowset/json_column_compactor.cpp
	modified:   be/src/storage/rowset/json_column_iterator.cpp
	modified:   be/src/storage/rowset/json_column_writer.cpp
	modified:   be/src/storage/rowset/json_column_writer.h
	modified:   be/src/storage/rowset/map_column_writer.cpp
	modified:   be/src/storage/rowset/struct_column_writer.cpp
	modified:   be/src/types/logical_type.h
	modified:   be/src/util/json_flattener.cpp
	modified:   be/src/util/json_flattener.h
	modified:   be/test/exprs/flat_json_functions_test.cpp
	modified:   be/test/storage/rowset/flat_json_column_compact_test.cpp
	modified:   be/test/storage/rowset/flat_json_column_rw_test.cpp
	modified:   be/test/util/json_flattener_test.cpp
	modified:   fe/fe-core/src/main/java/com/starrocks/catalog/FunctionSet.java
	modified:   gensrc/proto/segment.proto

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   be/src/exec/olap_scan_prepare.cpp

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

mergify bot commented Sep 20, 2024

@mergify[bot]: Backport conflict, please resolve the conflict and resubmit the PR

Seaven reopened this Sep 20, 2024
wanpengfei-git enabled auto-merge (squash) September 20, 2024 07:30
Seaven force-pushed the mergify/bp/branch-3.3/pr-50696 branch from 5014e52 to 5e4b11e September 20, 2024 07:33

sonarcloud bot commented Sep 20, 2024

wanpengfei-git merged commit 203310f into branch-3.3 Sep 20, 2024
29 checks passed
wanpengfei-git deleted the mergify/bp/branch-3.3/pr-50696 branch September 20, 2024 09:32