[Feature] Add new agg/window function 'approx_top_k' #29643
struct ApproxTopKState {
    using CppType = RunTimeCppType<LT>;
    using ColumnType = RunTimeColumnType<LT>;
Should we implement a general version for complex types, such as array, map, and struct?
This is beyond the scope of this PR. Maybe we will support semi-structured types later.
SonarCloud Quality Gate failed: 2 Bugs, 0.0% Coverage.
[FE Incremental Coverage Report] 😍 pass: 95 / 98 (96.94%)
LGTM
[BE Incremental Coverage Report] 😞 fail: 5 / 273 (01.83%)
@Mergifyio backport branch-3.0
@Mergifyio backport branch-3.1
✅ Backports have been created
* [Feature] Add new window function 'approx_top_k'
* update 1
* update 2
* update 3
* update 4
* update 5
* update 6
---------
Signed-off-by: liuyehcf <[email protected]>
(cherry picked from commit 43968b7)
# Conflicts:
#   be/src/exprs/agg/factory/aggregate_factory.hpp
#   fe/fe-core/src/main/java/com/starrocks/analysis/AnalyticExpr.java
#   fe/fe-core/src/main/java/com/starrocks/catalog/FunctionSet.java
#   test/common/sql/ssb/create.sql
#   test/common/sql/tpcds/create.sql
#   test/common/sql/tpch/create.sql
* [Feature] Add new window function 'approx_top_k'
* update 1
* update 2
* update 3
* update 4
* update 5
* update 6
---------
Signed-off-by: liuyehcf <[email protected]>
(cherry picked from commit 43968b7)
# Conflicts:
#   be/src/exprs/agg/factory/aggregate_factory.hpp
#   fe/fe-core/src/main/java/com/starrocks/analysis/AnalyticExpr.java
#   fe/fe-core/src/main/java/com/starrocks/catalog/FunctionSet.java
#   fe/fe-core/src/test/java/com/starrocks/sql/plan/AggregateTest.java
#   test/common/sql/ssb/create.sql
#   test/common/sql/tpcds/create.sql
#   test/common/sql/tpch/create.sql
#30357)
* [Feature] Add new agg/window function 'approx_top_k' (#29643)
  * [Feature] Add new window function 'approx_top_k'
  * update 1
  * update 2
  * update 3
  * update 4
  * update 5
  * update 6
  ---------
  Signed-off-by: liuyehcf <[email protected]>
  (cherry picked from commit 43968b7)
  # Conflicts:
  #   be/src/exprs/agg/factory/aggregate_factory.hpp
  #   fe/fe-core/src/main/java/com/starrocks/analysis/AnalyticExpr.java
  #   fe/fe-core/src/main/java/com/starrocks/catalog/FunctionSet.java
  #   fe/fe-core/src/test/java/com/starrocks/sql/plan/AggregateTest.java
  #   test/common/sql/ssb/create.sql
  #   test/common/sql/tpcds/create.sql
  #   test/common/sql/tpch/create.sql
* solve conflict
* fix fe ut
---------
Signed-off-by: liuyehcf <[email protected]>
Co-authored-by: liuyehcf <[email protected]>
#30356)
* [Feature] Add new agg/window function 'approx_top_k' (#29643)
  * [Feature] Add new window function 'approx_top_k'
  * update 1
  * update 2
  * update 3
  * update 4
  * update 5
  * update 6
  ---------
  Signed-off-by: liuyehcf <[email protected]>
  (cherry picked from commit 43968b7)
  # Conflicts:
  #   be/src/exprs/agg/factory/aggregate_factory.hpp
  #   fe/fe-core/src/main/java/com/starrocks/analysis/AnalyticExpr.java
  #   fe/fe-core/src/main/java/com/starrocks/catalog/FunctionSet.java
  #   test/common/sql/ssb/create.sql
  #   test/common/sql/tpcds/create.sql
  #   test/common/sql/tpch/create.sql
* solve conflict
* fix fe ut
* fix be compile
---------
Signed-off-by: liuyehcf <[email protected]>
Co-authored-by: liuyehcf <[email protected]>
Fix #25684
Space Saving Algorithm
The Space Saving Algorithm is commonly used for estimating the top-K frequent items in a stream of data with limited memory. To implement it as a two-stage aggregate function for a distributed database management system (DBMS), the aggregation is handled in two main phases:
1. Local Aggregation (first-stage aggregate on each node)
2. Global Aggregation (second-stage aggregate on a single node)
Here's how the two-stage aggregation is designed and executed:
1. Local Aggregation (first stage): Each node maintains a list of counters based on the Space Saving Algorithm.
2. Global Aggregation (second stage): After the local aggregation phase, the intermediate counters from all nodes are sent to a single aggregation node, which merges them.
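The two phases can be illustrated with a minimal C++17 sketch. This is an assumption-laden illustration, not the code in this PR: the class name `SpaceSavingSketch` and its methods `add`, `merge`, and `top_k` are hypothetical, items are plain strings, and the merge step simply replays the other node's counters as weighted updates, which is one simple way to combine summaries and may differ from what the backend actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical illustration of Space Saving; not the ApproxTopKState from this PR.
class SpaceSavingSketch {
public:
    explicit SpaceSavingSketch(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Local (first-stage) update: record `weight` occurrences of `item`.
    void add(const std::string& item, std::int64_t weight = 1) {
        auto it = counters_.find(item);
        if (it != counters_.end()) {
            it->second += weight;  // already tracked: just bump the counter
            return;
        }
        if (counters_.size() < capacity_) {
            counters_.emplace(item, weight);  // free slot available
            return;
        }
        // Table is full: evict the smallest counter and let the new item
        // inherit its count, so counts never underestimate the true frequency.
        auto min_it = std::min_element(
                counters_.begin(), counters_.end(),
                [](const auto& a, const auto& b) { return a.second < b.second; });
        const std::int64_t inherited = min_it->second;
        counters_.erase(min_it);
        counters_.emplace(item, inherited + weight);
    }

    // Global (second-stage) merge: replay another node's counters as
    // weighted updates (a simplification chosen for this sketch).
    void merge(const SpaceSavingSketch& other) {
        for (const auto& [item, count] : other.counters_) {
            add(item, count);
        }
    }

    // Return up to k (item, estimated count) pairs, highest count first;
    // ties are broken by item name to keep the output deterministic.
    std::vector<std::pair<std::string, std::int64_t>> top_k(std::size_t k) const {
        std::vector<std::pair<std::string, std::int64_t>> result(counters_.begin(),
                                                                 counters_.end());
        std::sort(result.begin(), result.end(), [](const auto& a, const auto& b) {
            if (a.second != b.second) return a.second > b.second;
            return a.first < b.first;
        });
        if (result.size() > k) result.resize(k);
        return result;
    }

private:
    std::size_t capacity_;
    std::unordered_map<std::string, std::int64_t> counters_;
};
```

In this sketch each node would build its own `SpaceSavingSketch` with `add`, and the aggregation node would call `merge` once per incoming sketch before reading `top_k`. The linear scan for the minimum counter keeps the example short; a production implementation would more likely keep the counters in a structure that makes the minimum cheap to find.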
Description
Please refer to approx_top_k.md in the change list for more information.
Examples
Limitations