[SPARK-54865][CONNECT][SQL] Add foreachWithSubqueriesAndPruning method to QueryPlan #53637
    return
  }
  f(this)
  subqueries.foreach(_.foreachWithSubqueriesAndPruning(cond)(f))
I think foreachWithSubqueries traverses the children first, not subqueries. Shall we follow it here?
Or it's actually interleaved in foreachWithSubqueries.
The traversal order of foreachWithSubqueries should also be subqueries first, then children, but its traversal implementation is a bit hard to read and understand, in my opinion.
After taking a closer look at the code, I think it may be a style choice for TreeNode methods to handle plan traversal logic (including withPruning), while QueryPlan simply leverages that to perform additional tasks, such as handling subqueries.
So the correctness may not be affected, but style-wise, it makes more sense to move the pruning logic to TreeNode.
Discussed with @cloud-fan offline: while the current convention is to let TreeNode methods handle plan traversal logic (including withPruning), a clearer implementation of foreachWithSubqueriesAndPruning should avoid doing so.
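To make the traversal order under discussion concrete, here is a minimal, self-contained sketch on a toy tree. `ToyPlan`, its `ownPatterns` and `patterns` fields, and `TraversalDemo` are hypothetical stand-ins for `QueryPlan` and `TreePatternBits`, not Spark code. Only the shape of the method (check `cond`, apply `f`, recurse into subqueries) follows the diff excerpt in this thread; the recursion into children after the subqueries is an assumption, since the excerpt does not show it.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-in for a plan node; this is NOT Spark's QueryPlan.
final case class ToyPlan(
    name: String,
    ownPatterns: Set[String] = Set.empty,
    children: Seq[ToyPlan] = Nil,
    subqueries: Seq[ToyPlan] = Nil) {

  // Patterns found anywhere in this subtree, subqueries included
  // (an assumption of this toy model, standing in for TreePatternBits).
  lazy val patterns: Set[String] =
    ownPatterns ++ children.flatMap(_.patterns) ++ subqueries.flatMap(_.patterns)

  def foreachWithSubqueriesAndPruning(
      cond: Set[String] => Boolean)(f: ToyPlan => Unit): Unit = {
    // Prune: skip this node and everything below it when cond fails.
    if (!cond(patterns)) {
      return
    }
    f(this)
    subqueries.foreach(_.foreachWithSubqueriesAndPruning(cond)(f))
    // Assumed: children are visited after subqueries.
    children.foreach(_.foreachWithSubqueriesAndPruning(cond)(f))
  }
}

object TraversalDemo {
  // Returns node names in the order they were visited.
  def visited(): List[String] = {
    val sub = ToyPlan("SubqueryAgg", Set("UDF"))
    val scan1 = ToyPlan("Scan1", Set("UDF"))
    val scan2 = ToyPlan("Scan2")
    val filter = ToyPlan("Filter", children = Seq(scan1), subqueries = Seq(sub))
    val root = ToyPlan("Project", children = Seq(filter, scan2))

    val seen = ArrayBuffer.empty[String]
    root.foreachWithSubqueriesAndPruning(_.contains("UDF"))(p => seen += p.name)
    seen.toList
  }
}
```

In this toy model, `TraversalDemo.visited()` yields `Project`, `Filter`, `SubqueryAgg`, `Scan1`: the subquery under `Filter` is visited before its child, and `Scan2` is skipped because its subtree carries no matching pattern.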
   * Only traverses nodes that match the given condition.
   */
  def foreachWithSubqueriesAndPruning(
      cond: TreePatternBits => Boolean)(f: PlanType => Unit): Unit = {
Other pruning methods also have a ruleId parameter. Shall we follow?
IMHO, ruleId is not applicable here, since it is used for transformations: it represents the transformation rule ID. Also, ruleId is not used by default, as the default value is UnknownRuleId.
If we really need this parameter, we can extend it later (less is more).
thanks, merging to master!
What changes were proposed in this pull request?
This PR introduces a new method foreachWithSubqueriesAndPruning in QueryPlan.scala that provides a pruning-enabled variant of foreachWithSubqueries. The method only traverses nodes whose TreePatternBits satisfy a given condition, so non-matching subtrees are skipped instead of being visited. The PR also updates two existing usages:
Why are the changes needed?
The changes are needed to:
Does this PR introduce any user-facing change?
No. This is an internal optimization that improves performance and code correctness without changing any user-facing behavior or APIs.
How was this patch tested?
build/sbt "catalyst/testOnly org.apache.spark.sql.catalyst.plans.QueryPlanSuite"
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 2.2.44