
fix(query): fix left semi optimize to inner join #17458

Open: wants to merge 6 commits into base: main

@sundy-li (Member) commented Feb 14, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  1. Fix the optimization that rewrites a left semi join into an inner join, which fixes TPC-DS Q38.
  2. Make stddev_samp return NULL instead of 0 when there is only one input record, which fixes TPC-DS Q39.
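For item 1, here is a minimal Rust sketch of the general equivalence that a semi-to-inner rewrite relies on. It is not Databend's actual rule. A left semi join emits each left row at most once, while an inner join emits one row per matching right row. The two therefore agree only when the right side is distinct on the join keys, for example because it is grouped by them. The functions `left_semi_join`, `inner_join`, and `distinct_keys` are hypothetical toy helpers over in-memory slices.

```rust
use std::collections::HashSet;

/// Left semi join: keep each left row whose key appears at least once on the right.
/// A left row is emitted at most once, no matter how many right rows match it.
fn left_semi_join(left: &[(i32, &'static str)], right: &[i32]) -> Vec<(i32, &'static str)> {
    let keys: HashSet<i32> = right.iter().copied().collect();
    left.iter().filter(|(k, _)| keys.contains(k)).cloned().collect()
}

/// Naive nested-loop inner join on the key: one output row per matching (left, right) pair.
fn inner_join(left: &[(i32, &'static str)], right: &[i32]) -> Vec<(i32, &'static str)> {
    let mut out = Vec::new();
    for l in left {
        for r in right {
            if l.0 == *r {
                out.push(*l);
            }
        }
    }
    out
}

/// Deduplicate the right-side keys, keeping first occurrences
/// (the effect of grouping the right side by the join keys).
fn distinct_keys(right: &[i32]) -> Vec<i32> {
    let mut seen = HashSet::new();
    right.iter().copied().filter(|k| seen.insert(*k)).collect()
}

fn main() {
    let left = [(1, "a"), (2, "b"), (3, "c")];
    let right = [1, 1, 3, 3, 3]; // duplicate join keys on the right side

    let semi = left_semi_join(&left, &right);
    let naive = inner_join(&left, &right); // over-counts: 2 + 3 rows
    let rewritten = inner_join(&left, &distinct_keys(&right));

    println!("semi      = {:?}", semi);
    println!("naive     = {:?}", naive);
    println!("rewritten = {:?}", rewritten);
    assert_ne!(semi, naive);
    assert_eq!(semi, rewritten);
}
```

If the right side contains duplicate keys, the plain inner join repeats left rows, as the `naive` result shows. That is why this kind of rewrite must check which keys the right side is actually distinct on before it fires.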

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label Feb 14, 2025

what-the-diff bot commented Feb 14, 2025

PR Summary

  • Simplified error handling in the 'clamp_date' function
    The 'clamp_date' function in 'date.rs' no longer logs an error for dates that fall outside the acceptable range.
  • Guarded 'state_merge' logic
    The 'state_merge' function in 'aggregate_stddev.rs' now checks 'other.count > 0' before merging, so an empty partial state is skipped.
  • Nullable output in 'state_merge_result'
    'state_merge_result' now writes into a 'NullableColumnBuilder', so it can emit NULL when the count is less than or equal to 1.
  • Nullable return type for 'try_create_aggregate_stddev_pop_function'
    'try_create_aggregate_stddev_pop_function' now wraps its result type in a nullable type, so the aggregate can return NULL.
  • Refined extraction of group-by keys
    In 'rule_semi_to_inner_join.rs', group-by keys are now collected in a 'HashMap' instead of a 'HashSet', so each key's type information is stored alongside it.
  • Updated SQL logic tests
    The expected output for SQL logic tests related to aggregate functions now aligns with the new behavior for null handling.
  • Streamlined DuckDB client initialization
    The Python test scripts now read the DuckDB client port from an environment variable, so the tests can be pointed at different testing environments.
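The stddev changes above can be illustrated with a minimal Rust sketch. It assumes a Welford-style running state merged with the Chan et al. pairwise formula, which is a common choice; the PR does not say which formula 'aggregate_stddev.rs' uses. `StddevState`, `add`, `merge`, and `stddev_samp` are hypothetical names, and `Option::None` stands in for the NULL that the nullable result column carries. Sample standard deviation divides by n - 1, so with a single record it is undefined and yields NULL rather than 0.

```rust
/// Hypothetical running state for sample standard deviation.
#[derive(Clone, Copy, Debug, Default)]
struct StddevState {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the running mean
}

impl StddevState {
    /// Welford's single-value update.
    fn add(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    /// Merge another partial state (Chan et al. pairwise formula).
    /// An empty state contributes nothing, so it is skipped (the `other.count > 0` guard).
    fn merge(&mut self, other: &StddevState) {
        if other.count == 0 {
            return;
        }
        if self.count == 0 {
            *self = *other;
            return;
        }
        let n = (self.count + other.count) as f64;
        let delta = other.mean - self.mean;
        // Uses the old self.count; count is updated last.
        self.mean += delta * other.count as f64 / n;
        self.m2 += other.m2 + delta * delta * (self.count as f64) * (other.count as f64) / n;
        self.count += other.count;
    }

    /// Sample stddev divides by n - 1, so it is undefined (NULL) when n <= 1.
    fn stddev_samp(&self) -> Option<f64> {
        if self.count <= 1 {
            None
        } else {
            Some((self.m2 / (self.count - 1) as f64).sqrt())
        }
    }
}

fn main() {
    let mut a = StddevState::default();
    a.add(2.0);
    assert_eq!(a.stddev_samp(), None); // one record: NULL, not 0

    let mut b = StddevState::default();
    b.add(4.0);
    a.merge(&b);
    a.merge(&StddevState::default()); // merging an empty partial state is a no-op

    // Values {2, 4}: mean 3, squared deviations 1 + 1 = 2, sample variance 2 / 1 = 2.
    let s = a.stddev_samp().unwrap();
    assert!((s - 2f64.sqrt()).abs() < 1e-12);
    println!("stddev_samp = {}", s);
}
```

Skipping empty partial states before merging also avoids dividing by a zero combined count when both sides are empty.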
