[SPARK-54865][CONNECT][SQL] Add foreachWithSubqueriesAndPruning method to QueryPlan #53637

heyihong · 2025-12-29T17:11:45Z

What changes were proposed in this pull request?

This PR introduces a new method foreachWithSubqueriesAndPruning in QueryPlan.scala that provides a pruning-enabled variant of foreachWithSubqueries. The method only traverses nodes that satisfy a given condition; when a node fails the condition, its entire subtree (including its subqueries) is skipped, which improves efficiency. The PR also updates two existing usages:

SparkConnectPlanner - Changed from transformUpWithSubqueriesAndPruning to foreachWithSubqueriesAndPruning since the code was only collecting observations without transforming the plan
ObservationManager - Changed from foreach to foreachWithSubqueriesAndPruning with a condition to only visit nodes containing the COLLECT_METRICS pattern
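A minimal sketch of the new traversal, for illustration only. It assumes a toy `Plan` class whose `treePatterns` set plays the role of Catalyst's `TreePatternBits`; `Plan`, `Pattern`, `treePatterns`, and the node names are hypothetical and are not Spark's actual classes. The early exit when the condition fails and the visit order (node, then subqueries, then children) follow the diff and review discussion in this PR.

```scala
// Toy model only: Plan, Pattern and treePatterns are hypothetical stand-ins for
// Catalyst's QueryPlan, TreePattern and TreePatternBits, not Spark's real classes.
object PruningSketch {
  sealed trait Pattern
  case object COLLECT_METRICS extends Pattern
  case object FILTER extends Pattern

  final case class Plan(
      name: String,
      nodePatterns: Set[Pattern] = Set.empty,
      children: Seq[Plan] = Nil,
      subqueries: Seq[Plan] = Nil) {

    // Patterns present anywhere in this subtree, including inside subqueries.
    lazy val treePatterns: Set[Pattern] =
      nodePatterns ++ children.flatMap(_.treePatterns) ++ subqueries.flatMap(_.treePatterns)

    def containsPattern(p: Pattern): Boolean = treePatterns.contains(p)

    // If `cond` rejects this node, its whole subtree (children and subqueries) is skipped.
    // Otherwise visit this node, then its subqueries, then its children.
    def foreachWithSubqueriesAndPruning(cond: Plan => Boolean)(f: Plan => Unit): Unit = {
      if (cond(this)) {
        f(this)
        subqueries.foreach(_.foreachWithSubqueriesAndPruning(cond)(f))
        children.foreach(_.foreachWithSubqueriesAndPruning(cond)(f))
      }
    }
  }

  // Example plan: only the left branch contains a CollectMetrics node.
  val root: Plan = Plan("Join", children = Seq(
    Plan("CollectMetrics", Set(COLLECT_METRICS), children = Seq(Plan("ScanA"))),
    Plan("Filter", Set(FILTER), children = Seq(Plan("ScanB")),
      subqueries = Seq(Plan("SubqueryScan")))))

  // Record every node the traversal calls `f` on, for a given condition.
  def visited(cond: Plan => Boolean): Seq[String] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[String]
    root.foreachWithSubqueriesAndPruning(cond)(p => buf += p.name)
    buf.toSeq
  }

  val pruned: Seq[String] = visited(_.containsPattern(COLLECT_METRICS))
  val unpruned: Seq[String] = visited(_ => true)

  // What an ObservationManager-style caller would keep: nodes that themselves collect metrics.
  val metricsNodes: Seq[String] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[String]
    root.foreachWithSubqueriesAndPruning(_.containsPattern(COLLECT_METRICS)) { p =>
      if (p.nodePatterns.contains(COLLECT_METRICS)) buf += p.name
    }
    buf.toSeq
  }
}
```

In this toy plan, pruning on `COLLECT_METRICS` visits two nodes instead of six: the `Filter` branch, including its subquery, is never entered because nothing under it carries the pattern.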

Why are the changes needed?

The changes are needed to:

Provide a more efficient way to traverse query plans when only specific nodes matching certain patterns need to be visited (avoiding unnecessary traversal of irrelevant subtrees)
Optimize observation management in ObservationManager by only traversing nodes that contain the COLLECT_METRICS pattern instead of visiting every node

Does this PR introduce any user-facing change?

No. This is an internal optimization that improves performance and code correctness without changing any user-facing behavior or APIs.

How was this patch tested?

build/sbt "catalyst/testOnly org.apache.spark.sql.catalyst.plans.QueryPlanSuite"

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 2.2.44

cloud-fan · 2026-01-03T14:40:48Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala

+      return
+    }
+    f(this)
+    subqueries.foreach(_.foreachWithSubqueriesAndPruning(cond)(f))


I think foreachWithSubqueries traverses the children first, not the subqueries. Shall we follow that order here?

Or is it actually interleaved in foreachWithSubqueries?

The traversal order of foreachWithSubqueries should also be subqueries first, then children, but its traversal implementation is a bit hard to read and understand, in my opinion.
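To make the ordering question concrete, here is a toy sketch comparing the shape of foreachWithSubqueries, where a pre-order `foreach` calls a helper that applies `f` and then recurses into that node's subqueries, against a "children first" ordering. `Node`, `foreachChildrenFirst`, and the node names are hypothetical and are not Catalyst code. Under these assumptions the helper-based form visits each node, then its subqueries, then its children, which is the interleaving described above.

```scala
// Toy model only; Node and the traversal helpers are hypothetical, not Catalyst code.
object OrderSketch {
  final case class Node(name: String, children: Seq[Node] = Nil, subqueries: Seq[Node] = Nil) {
    // Pre-order traversal over children only, similar in spirit to TreeNode.foreach.
    def foreach(f: Node => Unit): Unit = {
      f(this)
      children.foreach(_.foreach(f))
    }

    // Shape of foreachWithSubqueries: foreach drives the walk, and the per-node
    // function handles that node's subqueries before foreach moves on to the children.
    def foreachWithSubqueries(f: Node => Unit): Unit = {
      def actualFunc(n: Node): Unit = {
        f(n)
        n.subqueries.foreach(_.foreachWithSubqueries(f))
      }
      foreach(actualFunc)
    }

    // Alternative ordering: finish the children before the subqueries.
    def foreachChildrenFirst(f: Node => Unit): Unit = {
      f(this)
      children.foreach(_.foreachChildrenFirst(f))
      subqueries.foreach(_.foreachChildrenFirst(f))
    }
  }

  val plan: Node = Node("Filter",
    children = Seq(Node("Scan")),
    subqueries = Seq(Node("SubqueryAgg", children = Seq(Node("SubqueryScan")))))

  // Collect node names in the order a given traversal visits them.
  def order(walk: Node => (Node => Unit) => Unit): Seq[String] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[String]
    walk(plan)(n => buf += n.name)
    buf.toSeq
  }

  val interleaved: Seq[String] = order(n => f => n.foreachWithSubqueries(f))
  val childrenFirst: Seq[String] = order(n => f => n.foreachChildrenFirst(f))
}
```

The interleaved form visits the subquery's whole subtree before moving on to `Scan`, whereas the children-first variant reaches `Scan` first.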

After taking a closer look at the code, I think it may be a style choice for TreeNode methods to handle plan traversal logic (including withPruning), while QueryPlan simply leverages that to perform additional tasks, such as handling subqueries.

So the correctness may not be affected, but style-wise, it makes more sense to move the pruning logic to TreeNode.

After discussing with @cloud-fan offline: although the current convention is to let TreeNode methods handle plan traversal logic (including withPruning), a clearer implementation of foreachWithSubqueriesAndPruning should avoid doing so.
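The two placements of the pruning check discussed above can be compared in a toy sketch. `Node`, `patterns`, and the tree-level `foreachWithPruning` helper are hypothetical (this is not the `foreachWithPruning` commit that was added and later undone in this PR). Variant A puts the check directly in the subquery-aware method, which is the approach the thread settled on; variant B lets a generic pruned walk over children drive the traversal and only layers subquery handling on top. The model also assumes a node's patterns include those of its subqueries, so `Filter` passes the condition because of the `CollectMetrics` inside its subquery. Under these assumptions both variants visit the same nodes in the same order, consistent with the comment that correctness is unaffected and the choice is about readability.

```scala
// Toy model only; Node, patterns and foreachWithPruning are hypothetical.
object PlacementSketch {
  final case class Node(
      name: String,
      own: Set[String] = Set.empty,
      children: Seq[Node] = Nil,
      subqueries: Seq[Node] = Nil) {

    // Patterns anywhere in this subtree, subqueries included (an assumption of this model).
    lazy val patterns: Set[String] =
      own ++ children.flatMap(_.patterns) ++ subqueries.flatMap(_.patterns)

    // Variant A: the pruning check lives in the subquery-aware method itself.
    def inlinePruning(cond: Node => Boolean)(f: Node => Unit): Unit =
      if (cond(this)) {
        f(this)
        subqueries.foreach(_.inlinePruning(cond)(f))
        children.foreach(_.inlinePruning(cond)(f))
      }

    // Generic pruned pre-order walk over children only (a tree-level helper).
    def foreachWithPruning(cond: Node => Boolean)(f: Node => Unit): Unit =
      if (cond(this)) {
        f(this)
        children.foreach(_.foreachWithPruning(cond)(f))
      }

    // Variant B: delegate the walk and the pruning to the tree-level helper,
    // and only add subquery handling in the per-node function.
    def delegatedPruning(cond: Node => Boolean)(f: Node => Unit): Unit =
      foreachWithPruning(cond) { n =>
        f(n)
        n.subqueries.foreach(_.delegatedPruning(cond)(f))
      }
  }

  val plan: Node = Node("Project", children = Seq(
    Node("Filter", children = Seq(Node("Scan")),
      subqueries = Seq(Node("CollectMetrics", Set("COLLECT_METRICS")))),
    Node("OtherScan")))

  // Run a traversal with a COLLECT_METRICS condition and record the visit order.
  private def run(walk: (Node => Boolean) => (Node => Unit) => Unit): Seq[String] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[String]
    walk(_.patterns.contains("COLLECT_METRICS"))(n => buf += n.name)
    buf.toSeq
  }

  val a: Seq[String] = run(c => f => plan.inlinePruning(c)(f))
  val b: Seq[String] = run(c => f => plan.delegatedPruning(c)(f))
}
```

Variant A reads top to bottom as a single recursive method, while variant B splits the logic between two methods that call each other, which is harder to follow.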

cloud-fan · 2026-01-03T14:41:03Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala

+   * Only traverses nodes that match the given condition.
+   */
+  def foreachWithSubqueriesAndPruning(
+      cond: TreePatternBits => Boolean)(f: PlanType => Unit): Unit = {


Other pruning methods also have a ruleId parameter. Shall we follow that convention?

IMHO, ruleId is not applicable here, since it is used for transformations: it represents the ID of the transformation rule. Also, ruleId is not used by default, as its default value is UnknownRuleId.

If we really need this parameter, we can extend it later (less is more).
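The reply above argues that ruleId belongs to transformations. As we understand Catalyst, pruned transforms use the rule ID to remember that a rule left a subtree unchanged, so a later application of the same rule can skip it. This toy sketch models only that idea; `Node`, `ineffective`, `ruleCalls`, `renameOld`, and integer rule IDs are hypothetical simplifications, not Catalyst's actual bookkeeping. A `foreach` returns `Unit`, so there is no "unchanged" outcome to record, which is why a rule ID has nothing to do there.

```scala
// Toy model only: Node, the ineffective set, ruleCalls and Int rule ids are hypothetical.
object RuleIdSketch {
  final class Node(val name: String, val children: Seq[Node] = Nil) {
    // Rule ids already known to leave this subtree unchanged.
    private val ineffective = scala.collection.mutable.Set.empty[Int]
    // How many times a transform ran its rule body on this node (for the demo).
    var ruleCalls: Int = 0

    // Top-down transform that skips rules previously recorded as ineffective here.
    def transformDown(ruleId: Int)(rule: PartialFunction[Node, Node]): Node =
      if (ineffective.contains(ruleId)) this // known no-op: skip the whole subtree
      else {
        ruleCalls += 1
        val self = if (rule.isDefinedAt(this)) rule(this) else this
        val newChildren = self.children.map(_.transformDown(ruleId)(rule))
        val childrenUnchanged = newChildren.zip(self.children).forall { case (n, o) => n eq o }
        val result = if (childrenUnchanged) self else new Node(self.name, newChildren)
        if (result eq this) ineffective += ruleId // nothing changed: remember for next time
        result
      }

    // Read-only walk: produces no new tree, so there is nothing to memoize per rule.
    def foreach(f: Node => Unit): Unit = {
      f(this)
      children.foreach(_.foreach(f))
    }
  }

  val scan = new Node("Scan")
  val root = new Node("Project", Seq(scan))

  // A rule that only fires on nodes named "Old"; it never matches this plan.
  val renameOld: PartialFunction[Node, Node] = {
    case n if n.name == "Old" => new Node("New", n.children)
  }

  val first: Node = root.transformDown(ruleId = 1)(renameOld)
  val second: Node = root.transformDown(ruleId = 1)(renameOld)

  val visitedNames: Seq[String] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[String]
    root.foreach(n => buf += n.name)
    buf.toSeq
  }
}
```

The second `transformDown` call returns `root` immediately without running the rule body, because the first call recorded rule ID `1` as ineffective for that subtree; the `foreach` walk has no comparable result to record.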

cloud-fan · 2026-01-07T19:09:04Z

thanks, merging to master!

github-actions bot added SQL CONNECT labels Dec 29, 2025

heyihong force-pushed the SPARK-54865 branch 2 times, most recently from 5c1f30e to b0ed386, December 29, 2025 17:20

heyihong changed the title ~~[SPARK-54865] Add foreachWithSubqueriesAndPruning method to QueryPlan~~ [SPARK-54865][CONNECT][SQL] Add foreachWithSubqueriesAndPruning method to QueryPlan Dec 29, 2025

[SPARK-54865] Add foreachWithSubqueriesAndPruning method to QueryPlan

d18d8b5

heyihong force-pushed the SPARK-54865 branch from b0ed386 to d18d8b5, December 31, 2025 16:12

cloud-fan reviewed Jan 3, 2026

heyihong requested a review from cloud-fan January 5, 2026 13:31

foreachWithPruning

ecff57d

heyihong force-pushed the SPARK-54865 branch from aa6923c to ecff57d, January 5, 2026 14:32

heyihong mentioned this pull request Jan 5, 2026

[SPARK-54905][SQL] Simplify foreachWithSubqueries implementation in QueryPlan #53681

Closed

heyihong added 2 commits January 7, 2026 14:51

undo foreachWithPruning

02cb32d

back to the initial version

9954cae

cloud-fan approved these changes Jan 7, 2026

cloud-fan closed this in 1e8048a Jan 7, 2026
