Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Transform count(distinct pk) into count(pk) #51578

Merged
merged 6 commits into from
Oct 10, 2024

Conversation

danielhumanmod
Copy link
Contributor

@danielhumanmod danielhumanmod commented Oct 2, 2024

Why I'm doing:

In queries where COUNT(DISTINCT primary_key), the DISTINCT operation is redundant since the primary key guarantees uniqueness. We can optimize it by removing DISTINCT on aggregation(like count and sum) on primary key.

What I'm doing:

This change is primarily within the GroupByCountDistinctRewriteRule. Additional logic has been added into this rule to detect cases where an aggregation (such as COUNT or SUM) is performed on a DISTINCT primary key. If the column is a primary key, the DISTINCT operation is automatically removed from the query. This optimization applies only when the aggregation is performed on a single primary key column, ensuring correctness by leveraging the uniqueness guarantee of primary keys.

Fixes #50974

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

private boolean isPrimaryKey(ColumnRefOperator column, LogicalOlapScanOperator scan) {
OlapTable olapTable = (OlapTable) scan.getTable();
if (olapTable.getKeysType() == KeysType.PRIMARY_KEYS) {
List<String> keyColumnNames = olapTable.getKeyColumns()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public List<Column> getKeyColumns() {
    return getColumns().stream().filter(Column::isKey).collect(Collectors.toList());
}

The code is redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public List<Column> getKeyColumns() {
    return getColumns().stream().filter(Column::isKey).collect(Collectors.toList());
}

The code is redundant

Thanks for your suggestion, already fixed it in latest commit :)

Copy link

sonarcloud bot commented Oct 8, 2024

Copy link

github-actions bot commented Oct 8, 2024

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Oct 8, 2024

[FE Incremental Coverage Report]

pass : 18 / 19 (94.74%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/sql/optimizer/rule/transformation/GroupByCountDistinctRewriteRule.java 18 19 94.74% [277]

Copy link

github-actions bot commented Oct 8, 2024

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@Seaven Seaven merged commit 48bedcc into StarRocks:main Oct 10, 2024
48 checks passed
ZiheLiu pushed a commit to ZiheLiu/starrocks that referenced this pull request Oct 31, 2024
renzhimin7 pushed a commit to renzhimin7/starrocks that referenced this pull request Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transform count(distinct pk) into count(pk)
4 participants