[Enhancement] Transform count(distinct pk) into count(pk) #51578
private boolean isPrimaryKey(ColumnRefOperator column, LogicalOlapScanOperator scan) {
    OlapTable olapTable = (OlapTable) scan.getTable();
    if (olapTable.getKeysType() == KeysType.PRIMARY_KEYS) {
        List<String> keyColumnNames = olapTable.getKeyColumns()
public List<Column> getKeyColumns() {
    return getColumns().stream().filter(Column::isKey).collect(Collectors.toList());
}
The code is redundant
Thanks for your suggestion, already fixed it in latest commit :)
Signed-off-by: Daniel Tu <[email protected]>
ff0548e to 2e71037
Quality Gate passed
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 18 / 19 (94.74%)
[BE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
…51578) Signed-off-by: Daniel Tu <[email protected]>
…51578) Signed-off-by: Daniel Tu <[email protected]> Signed-off-by: zhiminr.ren <[email protected]>
Why I'm doing:
In queries that use COUNT(DISTINCT primary_key), the DISTINCT is redundant because the primary key already guarantees uniqueness. We can optimize such queries by removing DISTINCT from aggregations (such as COUNT and SUM) on a primary key column.
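To make the equivalence concrete, here is a minimal, self-contained sketch in plain Java (not StarRocks code). The class and method names are hypothetical, and the lists simply stand in for column values. It shows that the DISTINCT and non-DISTINCT forms agree on a column of unique, non-NULL values, and can differ on a column with repeats.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical demo class; lists stand in for the values of one column.
class DistinctOnUniqueKeyDemo {
    // COUNT(col) for a column without NULLs (as in a primary key column)
    static long count(List<Long> values) {
        return values.size();
    }

    // COUNT(DISTINCT col)
    static long countDistinct(List<Long> values) {
        return new HashSet<>(values).size();
    }

    // SUM(col)
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // SUM(DISTINCT col)
    static long sumDistinct(List<Long> values) {
        long total = 0;
        for (long v : new HashSet<>(values)) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> primaryKey = Arrays.asList(1L, 2L, 3L, 4L);   // unique by definition
        List<Long> nonKey = Arrays.asList(10L, 10L, 20L, 20L);   // contains repeats

        System.out.println(count(primaryKey) == countDistinct(primaryKey)); // true
        System.out.println(sum(primaryKey) == sumDistinct(primaryKey));     // true
        System.out.println(count(nonKey) == countDistinct(nonKey));         // false
    }
}
```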
What I'm doing:
This change is primarily within the GroupByCountDistinctRewriteRule. Additional logic has been added to this rule to detect cases where an aggregation (such as COUNT or SUM) is performed on a DISTINCT primary key. If the column is a primary key, the DISTINCT operation is automatically removed from the query. This optimization applies only when the aggregation is performed on a single primary key column, ensuring correctness by leveraging the uniqueness guarantee of primary keys.
Fixes #50974
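The rewrite described above can be sketched on a toy model. This is an illustration only, not the code in GroupByCountDistinctRewriteRule: `AggCall`, `TableInfo`, and `removeRedundantDistinct` are hypothetical names that stand in for the optimizer's real operator and table classes. The sketch also assumes the conservative reading of "a single primary key column", namely that the table's primary key consists of exactly that one column, and that the aggregation runs directly over a scan of that table.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RedundantDistinctSketch {
    // Hypothetical stand-in for an aggregate call such as count(DISTINCT id).
    static final class AggCall {
        final String function;
        final String column;
        final boolean distinct;

        AggCall(String function, String column, boolean distinct) {
            this.function = function;
            this.column = column;
            this.distinct = distinct;
        }

        @Override
        public String toString() {
            return function + "(" + (distinct ? "DISTINCT " : "") + column + ")";
        }
    }

    // Hypothetical stand-in for table metadata (keys type and key column names).
    static final class TableInfo {
        final boolean primaryKeyTable;
        final List<String> keyColumns;

        TableInfo(boolean primaryKeyTable, List<String> keyColumns) {
            this.primaryKeyTable = primaryKeyTable;
            this.keyColumns = keyColumns;
        }
    }

    // Drops DISTINCT only when the aggregated column alone is the primary key.
    static AggCall removeRedundantDistinct(AggCall call, TableInfo table) {
        boolean supportedFunction = call.function.equalsIgnoreCase("count")
                || call.function.equalsIgnoreCase("sum");
        // Assumption: a column of a composite key is not unique on its own,
        // so only a single-column primary key qualifies.
        boolean columnIsWholePrimaryKey = table.primaryKeyTable
                && table.keyColumns.size() == 1
                && table.keyColumns.get(0).equalsIgnoreCase(call.column);

        if (call.distinct && supportedFunction && columnIsWholePrimaryKey) {
            return new AggCall(call.function, call.column, false);
        }
        return call; // leave everything else untouched
    }

    public static void main(String[] args) {
        TableInfo singleKey = new TableInfo(true, Collections.singletonList("id"));
        TableInfo compositeKey = new TableInfo(true, Arrays.asList("id", "region"));

        System.out.println(removeRedundantDistinct(new AggCall("count", "id", true), singleKey));    // count(id)
        System.out.println(removeRedundantDistinct(new AggCall("count", "id", true), compositeKey)); // count(DISTINCT id)
        System.out.println(removeRedundantDistinct(new AggCall("sum", "price", true), singleKey));   // sum(DISTINCT price)
    }
}
```

Returning the original call unchanged in every non-matching case keeps the sketch safe by default: DISTINCT is removed only when uniqueness is guaranteed by the key itself.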