
Conversation

@askalt
Contributor

@askalt askalt commented Jan 13, 2026

This patch aims to implement a fast-path for the ExecutionPlan::with_new_children function for some plans, moving closer to a physical plan re-use implementation and improving planning performance. If the passed children properties are the same as in self, we do not actually recompute self's properties (which could be costly if projection mapping is required). Instead, we just replace the children and re-use self's properties as-is.

To make two sets of properties cheap to compare, the signature of ExecutionPlan::properties(...) is changed to return &Arc<PlanProperties>. If the children's properties are unchanged in with_new_children, we clone our own properties Arc, so the parent plan sees our properties as unchanged and can take the same fast path.

  • Return &Arc<PlanProperties> from ExecutionPlan::properties(...) instead of &PlanProperties.
  • Implement a with_new_children fast path for all major plans, taken when the children's
    properties are unchanged.

Note: currently, reset_plan_states does not allow re-using a plan in the general case: it is
not supported for the dynamic filter and recursive query features, because there a state reset
would also have to update pointers in the child plans.
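
The fast path described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `Node`, `compute_properties`, and the single-field `PlanProperties` are hypothetical stand-ins, and the use of `Arc::ptr_eq` to detect unchanged properties is an assumption about how the comparison is done.

```rust
use std::sync::Arc;

/// Simplified stand-in for DataFusion's `PlanProperties` (hypothetical).
#[derive(Debug, PartialEq)]
struct PlanProperties {
    partitions: usize,
}

/// Simplified single-child plan node (hypothetical).
#[derive(Debug)]
struct Node {
    child: Option<Arc<Node>>,
    properties: Arc<PlanProperties>,
}

impl Node {
    fn leaf(partitions: usize) -> Arc<Node> {
        Arc::new(Node {
            child: None,
            properties: Arc::new(PlanProperties { partitions }),
        })
    }

    /// Slow path: derive this node's properties from its child.
    fn compute_properties(child: &Node) -> Arc<PlanProperties> {
        Arc::new(PlanProperties {
            partitions: child.properties.partitions,
        })
    }

    fn new(child: Arc<Node>) -> Arc<Node> {
        let properties = Self::compute_properties(&child);
        Arc::new(Node { child: Some(child), properties })
    }

    /// Replace the child. If the new child shares the old child's
    /// properties allocation, re-use our own properties instead of
    /// recomputing them.
    fn with_new_child(&self, new_child: Arc<Node>) -> Arc<Node> {
        let unchanged = match &self.child {
            Some(old) => Arc::ptr_eq(&old.properties, &new_child.properties),
            None => false,
        };
        let properties = if unchanged {
            Arc::clone(&self.properties) // fast path: pointer copy only
        } else {
            Self::compute_properties(&new_child) // slow path
        };
        Arc::new(Node { child: Some(new_child), properties })
    }
}
```

Because the fast path hands back the same `Arc`, a parent of this node would see pointer-equal properties too and could skip its own recomputation.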

@github-actions github-actions bot added the documentation, physical-expr, optimizer, core, catalog, common, proto, datasource, ffi, and physical-plan labels Jan 13, 2026
@askalt askalt changed the title from "add fast-path for with_new_children" to "Draft: add fast-path for with_new_children" Jan 13, 2026
@askalt askalt marked this pull request as draft January 13, 2026 14:26
@askalt askalt force-pushed the askalt/with_new_children_fast_path branch from 72ff575 to 796f731 Compare January 13, 2026 15:20
@askalt
Contributor Author

askalt commented Jan 13, 2026

I also added a benchmark that re-uses a typical analytical query plan. On the main branch it takes ~4-5 ms, while with this patch it takes ~100 µs.

$ cargo bench  --profile=release-nonlto   --bench plan_reuse

@askalt askalt force-pushed the askalt/with_new_children_fast_path branch from 796f731 to 5601c4f Compare January 13, 2026 15:24
@alamb
Contributor

alamb commented Jan 13, 2026

I filed a ticket to track this idea

@alamb
Contributor

alamb commented Jan 13, 2026

run benchmark sql_planner

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing askalt/with_new_children_fast_path (5601c4f) to 4c67d02 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=askalt_with_new_children_fast_path
Results will be posted here when complete

@alamb
Contributor

alamb commented Jan 13, 2026

I also added a benchmark that re-uses a typical analytical query plan. On the main branch it takes ~4-5 ms, while with this patch it takes ~100 µs.

$ cargo bench  --profile=release-nonlto   --bench plan_reuse

Could you move the plan_reuse benchmark into its own PR? I think it is valuable both for this PR and for others, and it makes it easier to compare performance automatically.

@alamb-ghbot

Benchmark script failed with exit code 101.

Last 10 lines of output:

                        time:   [5.6457 ms 5.6877 ms 5.7286 ms]

Benchmarking physical_plan_clickbench_q50
Benchmarking physical_plan_clickbench_q50: Warming up for 3.0000 s

thread 'main' (3793320) panicked at datafusion/core/benches/sql_planner.rs:62:14:
called `Result::unwrap()` on an `Err` value: Context("type_coercion", Internal("Expect TypeSignatureClass::Native(LogicalType(Native(String), String)) but received NativeType::Binary, DataType: BinaryView"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: bench failed, to rerun pass `-p datafusion --bench sql_planner`

@askalt
Contributor Author

askalt commented Jan 14, 2026

I also added a benchmark that re-uses a typical analytical query plan. On the main branch it takes ~4-5 ms, while with this patch it takes ~100 µs.

$ cargo bench  --profile=release-nonlto   --bench plan_reuse

Could you move the plan_reuse benchmark into its own PR? I think it is valuable both for this PR and for others, and it makes it easier to compare performance automatically.

Done in #19806

@askalt askalt force-pushed the askalt/with_new_children_fast_path branch from 5601c4f to 99cf634 Compare January 14, 2026 10:32
@askalt
Contributor Author

askalt commented Jan 14, 2026

Benchmark script failed with exit code 101.

Last 10 lines of output:

The same panic occurs on the base commit 4c67d02.

Created an issue for it: #19809

@askalt askalt force-pushed the askalt/with_new_children_fast_path branch from 99cf634 to b81dd66 Compare January 14, 2026 14:49
Contributor

@alamb alamb left a comment

Thanks @askalt -- this is quite clever and I think it looks very promising

I also think we may be able to make with_new_children even faster by checking the children themselves as well: if they are the same, there is no reason to recompute anything either.

However, this likely won't help your usecase as the children will likely change (their states need to be reset) 🤔
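
The children check suggested here could look roughly like the sketch below. It assumes children are compared by pointer identity with `Arc::ptr_eq`; `Plan` and `with_new_children_if_changed` are hypothetical names for illustration, not DataFusion APIs.

```rust
use std::sync::Arc;

/// Simplified plan node (hypothetical).
#[derive(Debug)]
struct Plan {
    name: &'static str,
    children: Vec<Arc<Plan>>,
}

/// Returns `plan` itself when every new child is the very same allocation
/// as the corresponding old child; otherwise builds a new node.
fn with_new_children_if_changed(
    plan: Arc<Plan>,
    new_children: Vec<Arc<Plan>>,
) -> Arc<Plan> {
    let same = plan.children.len() == new_children.len()
        && plan
            .children
            .iter()
            .zip(&new_children)
            .all(|(old, new)| Arc::ptr_eq(old, new));
    if same {
        plan // nothing changed: no rebuild, no property recomputation
    } else {
        Arc::new(Plan { name: plan.name, children: new_children })
    }
}
```

As noted in the comment, this shortcut would not fire when a state reset produces fresh child nodes, since those are different allocations even if their properties are identical.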

@askalt askalt force-pushed the askalt/with_new_children_fast_path branch from b81dd66 to 741b085 Compare January 15, 2026 11:48
@askalt askalt marked this pull request as ready for review January 15, 2026 11:49
@alamb
Contributor

alamb commented Jan 15, 2026

This PR seems to have accumulated some conflicts

@alamb
Contributor

alamb commented Jan 15, 2026

I plan to run the newly introduced benchmark in #19806

I suspect we'll see quite a nice improvement

@askalt askalt force-pushed the askalt/with_new_children_fast_path branch from 741b085 to 21b8a3f Compare January 15, 2026 18:22
@alamb
Contributor

alamb commented Jan 15, 2026

run benchmark reset_plan_states

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing askalt/with_new_children_fast_path (21b8a3f) to 094e7ee diff
BENCH_NAME=reset_plan_states
BENCH_COMMAND=cargo bench --features=parquet --bench reset_plan_states
BENCH_FILTER=
BENCH_BRANCH_NAME=askalt_with_new_children_fast_path
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group     askalt_with_new_children_fast_path     main
-----     ----------------------------------     ----
query1    1.00      2.6±0.01µs        ? ?/sec    14935.96    39.4±0.71ms        ? ?/sec
query2    1.00      3.3±0.06µs        ? ?/sec    3184.28    10.4±0.14ms        ? ?/sec
query3    1.00    980.0±9.64ns        ? ?/sec    15606.83    15.3±0.41ms        ? ?/sec

@alamb alamb mentioned this pull request Jan 15, 2026
@alamb
Contributor

alamb commented Jan 15, 2026

group     askalt_with_new_children_fast_path     main
-----     ----------------------------------     ----
query1    1.00      2.6±0.01µs        ? ?/sec    14935.96    39.4±0.71ms        ? ?/sec
query2    1.00      3.3±0.06µs        ? ?/sec    3184.28    10.4±0.14ms        ? ?/sec
query3    1.00    980.0±9.64ns        ? ?/sec    15606.83    15.3±0.41ms        ? ?/sec

That is pretty amazing (3000x-15000x performance improvement 👍 )

I am also testing sql_planner time using

@alamb alamb changed the title from "add fast-path for with_new_children" to "Cache PlanProperties, add fast-path for with_new_children" Jan 15, 2026
@alamb alamb added the api change label Jan 15, 2026
Contributor

@alamb alamb left a comment

Thank you @askalt

I went through this carefully -- I think the basic idea is great and we should proceed.

The only thing I think we should try and do is minimize the impact of the API changes when people upgrade. I have some thoughts on that below

For next steps, I suggest breaking this PR down into smaller parts (it is quite challenging to review all the changes at once). Some ideas for relevant chunks:

  1. Change ExecutionPlan::properties to return an &Arc<PlanProperties>, plus the necessary plumbing changes
  2. Changes to FilterExec to avoid cloning the projection
  3. Changes to AggregateExec to avoid cloning the vec / exprs
  4. Changes to the various join operations to avoid cloning the projections

thoughts on reducing upgrade impact

Since this PR does change some core APIs, I think we need to be careful.

  1. I think we can avoid the need to change Schema::project, see askalt#2 for my proposal
  2. We should figure out some way to make FilterExec::with_projection easier to use -- the new signature is pretty awkward.

Maybe we can use ProjectionExprs or something like

// Cheaply clonable list of projected column indices.
struct SharedProjection(Arc<[usize]>);

impl From<Vec<usize>> for SharedProjection {
  ...
}

And then have FilterExec take

pub fn with_projection(mut self, projection: impl Into<SharedProjection>)

With enough comments I think this could be clear and would also be backwards compatible
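
A fuller sketch of this idea is shown below. It is an illustration under stated assumptions, not the final API: `SharedProjection` follows the proposal above, while `Filter`, `indices`, the `Self` return type of `with_projection`, and the extra `From<Arc<[usize]>>` impl are hypothetical additions.

```rust
use std::sync::Arc;

/// Cheaply clonable list of projected column indices.
#[derive(Debug, Clone, PartialEq)]
pub struct SharedProjection(Arc<[usize]>);

impl From<Vec<usize>> for SharedProjection {
    fn from(indices: Vec<usize>) -> Self {
        // `Arc<[T]>: From<Vec<T>>` moves the elements into one shared allocation.
        SharedProjection(Arc::from(indices))
    }
}

impl From<Arc<[usize]>> for SharedProjection {
    fn from(indices: Arc<[usize]>) -> Self {
        SharedProjection(indices)
    }
}

impl SharedProjection {
    pub fn indices(&self) -> &[usize] {
        &self.0
    }
}

/// Hypothetical stand-in for `FilterExec`.
#[derive(Debug, Clone, Default)]
pub struct Filter {
    projection: Option<SharedProjection>,
}

impl Filter {
    /// Accepts a `Vec<usize>` (as existing callers pass) or an already
    /// shared `Arc<[usize]>`, so old call sites keep compiling.
    pub fn with_projection(mut self, projection: impl Into<SharedProjection>) -> Self {
        self.projection = Some(projection.into());
        self
    }

    pub fn projection(&self) -> Option<&SharedProjection> {
        self.projection.as_ref()
    }
}
```

With this shape, `Filter::default().with_projection(vec![0, 2])` still works for callers that pass a `Vec<usize>`, while cloning the filter only bumps a reference count instead of copying the index list.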

  /// This information is available via methods on [`ExecutionPlanProperties`]
  /// trait, which is implemented for all `ExecutionPlan`s.
- fn properties(&self) -> &PlanProperties;
+ fn properties(&self) -> &Arc<PlanProperties>;
Contributor

This is technically a breaking API change, so it should be documented in the upgrading.md guide:
https://github.com/apache/datafusion/blob/main/docs/source/library-user-guide/upgrading.md

Per the policy in
https://datafusion.apache.org/contributor-guide/api-health.html

I am happy to help write such an entry

Contributor Author

@askalt askalt Jan 16, 2026

I added an entry; please review the updated upgrading.md.

  name: String,
  plan: FFI_ExecutionPlan,
- properties: PlanProperties,
+ properties: Arc<PlanProperties>,
Contributor

Since the FFI is supposed to be a stable boundary, I think we shouldn't change this (and should instead just copy the properties when needed).

So I suggest reverting this change

Contributor Author

@askalt askalt Jan 16, 2026

Since ExecutionPlan is implemented for ForeignExecutionPlan, its properties(...) must return &Arc<PlanProperties>, so something has to own the Arc for us to return a borrow of it. We could change the signature of properties(...) to return an owned Arc, but the extra clone seems redundant to me, as most plans already store an owned Arc.

Note that FFI_ExecutionPlan is not touched by the patch. Are you sure it is important to keep ForeignExecutionPlan stable?

/// This struct is used to access an execution plan provided by a foreign
/// library across a FFI boundary.
///
/// The ForeignExecutionPlan is to be used by the caller of the plan, so it has
/// no knowledge or access to the private data. All interaction with the plan
/// must occur through the functions defined in FFI_ExecutionPlan.
#[derive(Debug)]
pub struct ForeignExecutionPlan {
    name: String,
    plan: FFI_ExecutionPlan,
    properties: Arc<PlanProperties>,
    children: Vec<Arc<dyn ExecutionPlan>>,
}
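
The ownership argument can be illustrated with a small sketch. The `Plan` trait and `ForeignPlan` type below are hypothetical stand-ins for `ExecutionPlan` and `ForeignExecutionPlan`; the point is only that a method returning `&Arc<PlanProperties>` needs an `Arc` stored in `self`.

```rust
use std::sync::Arc;

/// Simplified stand-in for `PlanProperties` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct PlanProperties {
    partitions: usize,
}

/// Hypothetical stand-in for the relevant part of `ExecutionPlan`.
trait Plan {
    fn properties(&self) -> &Arc<PlanProperties>;
}

/// Wrapper on the caller's side of the boundary.
struct ForeignPlan {
    properties: Arc<PlanProperties>,
}

impl ForeignPlan {
    /// Wrap the properties received across the boundary exactly once,
    /// at construction time.
    fn new(received: PlanProperties) -> Self {
        Self { properties: Arc::new(received) }
    }
}

impl Plan for ForeignPlan {
    fn properties(&self) -> &Arc<PlanProperties> {
        // A borrow must point at something `self` owns. If the field were a
        // plain `PlanProperties`, building an `Arc` here would create a
        // temporary that cannot be returned by reference.
        &self.properties
    }
}
```

Under this layout only the private field of the wrapper changes; the properties still cross the boundary by value and are wrapped once when the wrapper is built.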

@alamb
Contributor

alamb commented Jan 15, 2026

Merging up so I can run planner benchmarks

@alamb
Contributor

alamb commented Jan 15, 2026

run benchmark sql_planner

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing askalt/with_new_children_fast_path (ebe2144) to 7c9a76a diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=askalt_with_new_children_fast_path
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                 askalt_with_new_children_fast_path     main
-----                                                 ----------------------------------     ----
logical_aggregate_with_join                           1.01    638.6±3.64µs        ? ?/sec    1.00    631.5±7.46µs        ? ?/sec
logical_select_all_from_1000                          1.01     10.8±0.14ms        ? ?/sec    1.00     10.6±0.14ms        ? ?/sec
logical_select_one_from_700                           1.01    416.6±7.40µs        ? ?/sec    1.00    412.3±2.24µs        ? ?/sec
logical_trivial_join_high_numbered_columns            1.02    376.5±2.98µs        ? ?/sec    1.00    369.4±3.42µs        ? ?/sec
logical_trivial_join_low_numbered_columns             1.02    363.7±3.60µs        ? ?/sec    1.00    356.0±3.51µs        ? ?/sec
physical_intersection                                 1.00  1577.8±13.04µs        ? ?/sec    1.02  1608.2±21.82µs        ? ?/sec
physical_join_consider_sort                           1.00      2.2±0.03ms        ? ?/sec    1.02      2.3±0.03ms        ? ?/sec
physical_join_distinct                                1.01    353.3±2.97µs        ? ?/sec    1.00    349.0±6.78µs        ? ?/sec
physical_many_self_joins                              1.00     12.6±0.22ms        ? ?/sec    1.01     12.7±0.19ms        ? ?/sec
physical_plan_clickbench_all                          1.00    196.0±1.53ms        ? ?/sec    1.08    211.9±4.78ms        ? ?/sec
physical_plan_clickbench_q1                           1.00      2.1±0.02ms        ? ?/sec    1.04      2.2±0.09ms        ? ?/sec
physical_plan_clickbench_q10                          1.00      3.6±0.08ms        ? ?/sec    1.08      3.8±0.06ms        ? ?/sec
physical_plan_clickbench_q11                          1.00      4.1±0.04ms        ? ?/sec    1.07      4.4±0.13ms        ? ?/sec
physical_plan_clickbench_q12                          1.00      4.1±0.04ms        ? ?/sec    1.11      4.6±0.17ms        ? ?/sec
physical_plan_clickbench_q13                          1.00      3.7±0.03ms        ? ?/sec    1.09      4.1±0.16ms        ? ?/sec
physical_plan_clickbench_q14                          1.00      4.0±0.04ms        ? ?/sec    1.05      4.2±0.10ms        ? ?/sec
physical_plan_clickbench_q15                          1.00      3.8±0.06ms        ? ?/sec    1.06      4.0±0.15ms        ? ?/sec
physical_plan_clickbench_q16                          1.00      3.6±0.04ms        ? ?/sec    1.03      3.7±0.03ms        ? ?/sec
physical_plan_clickbench_q17                          1.00      3.7±0.04ms        ? ?/sec    1.03      3.8±0.04ms        ? ?/sec
physical_plan_clickbench_q18                          1.00      2.6±0.06ms        ? ?/sec    1.02      2.7±0.02ms        ? ?/sec
physical_plan_clickbench_q19                          1.00      4.1±0.06ms        ? ?/sec    1.05      4.3±0.11ms        ? ?/sec
physical_plan_clickbench_q2                           1.00      2.8±0.10ms        ? ?/sec    1.04      2.9±0.05ms        ? ?/sec
physical_plan_clickbench_q20                          1.00      2.2±0.04ms        ? ?/sec    1.04      2.3±0.03ms        ? ?/sec
physical_plan_clickbench_q21                          1.00      2.8±0.02ms        ? ?/sec    1.02      2.8±0.04ms        ? ?/sec
physical_plan_clickbench_q22                          1.00      3.9±0.05ms        ? ?/sec    1.04      4.0±0.06ms        ? ?/sec
physical_plan_clickbench_q23                          1.00      4.1±0.04ms        ? ?/sec    1.04      4.3±0.10ms        ? ?/sec
physical_plan_clickbench_q24                          1.00      4.9±0.03ms        ? ?/sec    1.05      5.1±0.08ms        ? ?/sec
physical_plan_clickbench_q25                          1.00      3.5±0.04ms        ? ?/sec    1.04      3.6±0.05ms        ? ?/sec
physical_plan_clickbench_q26                          1.00      2.9±0.04ms        ? ?/sec    1.05      3.0±0.06ms        ? ?/sec
physical_plan_clickbench_q27                          1.00      3.5±0.03ms        ? ?/sec    1.04      3.6±0.10ms        ? ?/sec
physical_plan_clickbench_q28                          1.00      4.4±0.13ms        ? ?/sec    1.06      4.6±0.06ms        ? ?/sec
physical_plan_clickbench_q29                          1.00      4.6±0.05ms        ? ?/sec    1.08      5.0±0.13ms        ? ?/sec
physical_plan_clickbench_q3                           1.00      2.5±0.02ms        ? ?/sec    1.04      2.6±0.04ms        ? ?/sec
physical_plan_clickbench_q30                          1.00     15.3±0.26ms        ? ?/sec    1.07     16.4±0.39ms        ? ?/sec
physical_plan_clickbench_q31                          1.00      4.4±0.06ms        ? ?/sec    1.12      4.9±0.09ms        ? ?/sec
physical_plan_clickbench_q32                          1.00      4.4±0.04ms        ? ?/sec    1.13      5.0±0.13ms        ? ?/sec
physical_plan_clickbench_q33                          1.00      3.5±0.05ms        ? ?/sec    1.07      3.8±0.15ms        ? ?/sec
physical_plan_clickbench_q34                          1.00      3.2±0.06ms        ? ?/sec    1.04      3.3±0.12ms        ? ?/sec
physical_plan_clickbench_q35                          1.00      3.3±0.05ms        ? ?/sec    1.10      3.6±0.09ms        ? ?/sec
physical_plan_clickbench_q36                          1.00      4.1±0.04ms        ? ?/sec    1.06      4.3±0.15ms        ? ?/sec
physical_plan_clickbench_q37                          1.00      4.5±0.06ms        ? ?/sec    1.08      4.9±0.17ms        ? ?/sec
physical_plan_clickbench_q38                          1.00      4.5±0.06ms        ? ?/sec    1.04      4.7±0.04ms        ? ?/sec
physical_plan_clickbench_q39                          1.00      4.0±0.05ms        ? ?/sec    1.04      4.2±0.08ms        ? ?/sec
physical_plan_clickbench_q4                           1.00      2.2±0.02ms        ? ?/sec    1.02      2.2±0.03ms        ? ?/sec
physical_plan_clickbench_q40                          1.00      4.7±0.11ms        ? ?/sec    1.08      5.1±0.14ms        ? ?/sec
physical_plan_clickbench_q41                          1.00      4.1±0.04ms        ? ?/sec    1.08      4.4±0.09ms        ? ?/sec
physical_plan_clickbench_q42                          1.00      4.1±0.07ms        ? ?/sec    1.10      4.5±0.13ms        ? ?/sec
physical_plan_clickbench_q43                          1.00      4.4±0.08ms        ? ?/sec    1.12      4.9±0.22ms        ? ?/sec
physical_plan_clickbench_q44                          1.00      2.3±0.02ms        ? ?/sec    1.04      2.4±0.03ms        ? ?/sec
physical_plan_clickbench_q45                          1.00      2.3±0.02ms        ? ?/sec    1.04      2.4±0.08ms        ? ?/sec
physical_plan_clickbench_q46                          1.00      3.2±0.03ms        ? ?/sec    1.05      3.3±0.08ms        ? ?/sec
physical_plan_clickbench_q47                          1.00      4.0±0.07ms        ? ?/sec    1.09      4.4±0.13ms        ? ?/sec
physical_plan_clickbench_q48                          1.00      5.0±0.05ms        ? ?/sec    1.17      5.9±0.25ms        ? ?/sec
physical_plan_clickbench_q49                          1.00      5.3±0.16ms        ? ?/sec    1.14      6.1±0.24ms        ? ?/sec
physical_plan_clickbench_q5                           1.00      2.5±0.02ms        ? ?/sec    1.07      2.6±0.09ms        ? ?/sec
physical_plan_clickbench_q50                          1.00      4.1±0.06ms        ? ?/sec    1.05      4.3±0.07ms        ? ?/sec
physical_plan_clickbench_q51                          1.00      3.5±0.03ms        ? ?/sec    1.07      3.7±0.12ms        ? ?/sec
physical_plan_clickbench_q6                           1.00      2.5±0.03ms        ? ?/sec    1.07      2.7±0.13ms        ? ?/sec
physical_plan_clickbench_q7                           1.00      2.1±0.03ms        ? ?/sec    1.04      2.1±0.03ms        ? ?/sec
physical_plan_clickbench_q8                           1.00      3.4±0.08ms        ? ?/sec    1.09      3.7±0.07ms        ? ?/sec
physical_plan_clickbench_q9                           1.00      3.5±0.05ms        ? ?/sec    1.09      3.8±0.06ms        ? ?/sec
physical_plan_tpcds_all                               1.00  1825.9±12.04ms        ? ?/sec    1.09  1986.6±16.33ms        ? ?/sec
physical_plan_tpch_all                                1.00    122.8±1.51ms        ? ?/sec    1.06    130.6±1.74ms        ? ?/sec
physical_plan_tpch_q1                                 1.00      2.9±0.05ms        ? ?/sec    1.06      3.1±0.06ms        ? ?/sec
physical_plan_tpch_q10                                1.00      6.9±0.08ms        ? ?/sec    1.09      7.5±0.08ms        ? ?/sec
physical_plan_tpch_q11                                1.00      8.3±0.14ms        ? ?/sec    1.07      8.8±0.11ms        ? ?/sec
physical_plan_tpch_q12                                1.00      3.0±0.03ms        ? ?/sec    1.07      3.2±0.03ms        ? ?/sec
physical_plan_tpch_q13                                1.00      2.9±0.03ms        ? ?/sec    1.05      3.1±0.10ms        ? ?/sec
physical_plan_tpch_q14                                1.00      2.9±0.04ms        ? ?/sec    1.11      3.3±0.15ms        ? ?/sec
physical_plan_tpch_q16                                1.00      5.0±0.03ms        ? ?/sec    1.07      5.4±0.13ms        ? ?/sec
physical_plan_tpch_q17                                1.00      5.4±0.13ms        ? ?/sec    1.06      5.8±0.16ms        ? ?/sec
physical_plan_tpch_q18                                1.00      5.8±0.20ms        ? ?/sec    1.06      6.2±0.18ms        ? ?/sec
physical_plan_tpch_q19                                1.00      4.9±0.10ms        ? ?/sec    1.05      5.2±0.14ms        ? ?/sec
physical_plan_tpch_q2                                 1.00     11.8±0.09ms        ? ?/sec    1.09     12.8±0.14ms        ? ?/sec
physical_plan_tpch_q20                                1.00      7.9±0.15ms        ? ?/sec    1.08      8.5±0.13ms        ? ?/sec
physical_plan_tpch_q21                                1.00      9.8±0.10ms        ? ?/sec    1.07     10.5±0.13ms        ? ?/sec
physical_plan_tpch_q22                                1.00      6.3±0.13ms        ? ?/sec    1.04      6.6±0.16ms        ? ?/sec
physical_plan_tpch_q3                                 1.00      5.4±0.03ms        ? ?/sec    1.07      5.8±0.16ms        ? ?/sec
physical_plan_tpch_q4                                 1.00      2.9±0.02ms        ? ?/sec    1.05      3.0±0.06ms        ? ?/sec
physical_plan_tpch_q5                                 1.00      5.7±0.09ms        ? ?/sec    1.08      6.2±0.12ms        ? ?/sec
physical_plan_tpch_q6                                 1.00  1561.4±26.15µs        ? ?/sec    1.03   1613.7±9.79µs        ? ?/sec
physical_plan_tpch_q7                                 1.00      6.9±0.17ms        ? ?/sec    1.09      7.5±0.17ms        ? ?/sec
physical_plan_tpch_q8                                 1.00      8.9±0.07ms        ? ?/sec    1.12     10.0±0.16ms        ? ?/sec
physical_plan_tpch_q9                                 1.00      6.3±0.07ms        ? ?/sec    1.11      7.0±0.14ms        ? ?/sec
physical_select_aggregates_from_200                   1.00     16.8±0.12ms        ? ?/sec    1.05     17.6±0.13ms        ? ?/sec
physical_select_all_from_1000                         1.00     23.7±0.57ms        ? ?/sec    1.00     23.7±0.17ms        ? ?/sec
physical_select_one_from_700                          1.02   1339.3±8.43µs        ? ?/sec    1.00   1316.3±9.03µs        ? ?/sec
physical_sorted_union_order_by_10_int64               1.00      9.6±0.12ms        ? ?/sec    1.18     11.3±0.21ms        ? ?/sec
physical_sorted_union_order_by_10_uint64              1.00     26.6±0.45ms        ? ?/sec    1.15     30.7±0.63ms        ? ?/sec
physical_sorted_union_order_by_50_int64               1.00    151.8±1.49ms        ? ?/sec    1.33    201.5±2.48ms        ? ?/sec
physical_sorted_union_order_by_50_uint64              1.00    923.8±6.93ms        ? ?/sec    1.22  1128.4±13.08ms        ? ?/sec
physical_theta_join_consider_sort                     1.00      2.7±0.08ms        ? ?/sec    1.01      2.7±0.07ms        ? ?/sec
physical_unnest_to_join                               1.00      3.0±0.06ms        ? ?/sec    1.01      3.1±0.02ms        ? ?/sec
physical_window_function_partition_by_12_on_values    1.00   1252.5±9.46µs        ? ?/sec    1.28  1605.1±19.60µs        ? ?/sec
physical_window_function_partition_by_30_on_values    1.00      2.1±0.02ms        ? ?/sec    1.43      3.0±0.01ms        ? ?/sec
physical_window_function_partition_by_4_on_values     1.00   934.7±14.38µs        ? ?/sec    1.17   1095.3±9.36µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00  1050.8±11.93µs        ? ?/sec    1.22  1279.7±17.42µs        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00  1090.7±18.94µs        ? ?/sec    1.23  1345.7±14.19µs        ? ?/sec
with_param_values_many_columns                        1.00   574.7±12.30µs        ? ?/sec    1.04    594.9±8.15µs        ? ?/sec

@xudong963 xudong963 self-requested a review January 16, 2026 07:17
@askalt
Contributor Author

askalt commented Jan 16, 2026

I am working on splitting the changes into separate PRs and will move out the changes related to shared projections and aggregates. The changes related to with_new_children and to wrapping plan properties in an Arc will remain here and can be reviewed in this PR.

@askalt askalt force-pushed the askalt/with_new_children_fast_path branch from ebe2144 to a8028af Compare January 16, 2026 11:45
@github-actions github-actions bot removed the common and proto labels Jan 16, 2026
@alamb
Contributor

alamb commented Jan 16, 2026

BTW the planner benchmark results in #19792 (comment) show a 1-10% improvement across the board and some queries are going 40% faster (see physical_window_function_partition_by_30_on_values for example)

I am going to rerun to see if the results are reproducible

@alamb
Contributor

alamb commented Jan 16, 2026

run benchmark sql_planner

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing askalt/with_new_children_fast_path (a8028af) to ca904b3 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=askalt_with_new_children_fast_path
Results will be posted here when complete

@askalt
Contributor Author

askalt commented Jan 16, 2026

I removed the Arcs around projections and aggregate expressions/filters, so that could affect the benchmark results. I will create a separate issue and PR for them soon.

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                 askalt_with_new_children_fast_path     main
-----                                                 ----------------------------------     ----
logical_aggregate_with_join                           1.02    640.2±6.91µs        ? ?/sec    1.00    630.2±4.39µs        ? ?/sec
logical_select_all_from_1000                          1.01     10.8±0.11ms        ? ?/sec    1.00     10.7±0.27ms        ? ?/sec
logical_select_one_from_700                           1.01    418.2±9.16µs        ? ?/sec    1.00    415.6±5.42µs        ? ?/sec
logical_trivial_join_high_numbered_columns            1.02    378.9±6.22µs        ? ?/sec    1.00    371.6±4.20µs        ? ?/sec
logical_trivial_join_low_numbered_columns             1.01    365.4±6.66µs        ? ?/sec    1.00   360.1±17.55µs        ? ?/sec
physical_intersection                                 1.00  1577.2±23.35µs        ? ?/sec    1.01  1597.5±20.28µs        ? ?/sec
physical_join_consider_sort                           1.00      2.2±0.03ms        ? ?/sec    1.02      2.3±0.03ms        ? ?/sec
physical_join_distinct                                1.02    356.6±6.65µs        ? ?/sec    1.00    350.9±2.34µs        ? ?/sec
physical_many_self_joins                              1.01     12.6±0.18ms        ? ?/sec    1.00     12.5±0.13ms        ? ?/sec
physical_plan_clickbench_all                          1.00    200.5±3.43ms        ? ?/sec    1.00    201.3±2.49ms        ? ?/sec
physical_plan_clickbench_q1                           1.00      2.1±0.01ms        ? ?/sec    1.01      2.1±0.06ms        ? ?/sec
physical_plan_clickbench_q10                          1.00      3.6±0.05ms        ? ?/sec    1.01      3.6±0.03ms        ? ?/sec
physical_plan_clickbench_q11                          1.00      4.1±0.14ms        ? ?/sec    1.01      4.2±0.03ms        ? ?/sec
physical_plan_clickbench_q12                          1.02      4.4±0.15ms        ? ?/sec    1.00      4.3±0.08ms        ? ?/sec
physical_plan_clickbench_q13                          1.00      3.8±0.10ms        ? ?/sec    1.01      3.8±0.10ms        ? ?/sec
physical_plan_clickbench_q14                          1.00      4.2±0.12ms        ? ?/sec    1.00      4.2±0.07ms        ? ?/sec
physical_plan_clickbench_q15                          1.00      3.8±0.09ms        ? ?/sec    1.02      3.9±0.10ms        ? ?/sec
physical_plan_clickbench_q16                          1.00      3.6±0.03ms        ? ?/sec    1.01      3.7±0.05ms        ? ?/sec
physical_plan_clickbench_q17                          1.00      3.7±0.05ms        ? ?/sec    1.01      3.8±0.03ms        ? ?/sec
physical_plan_clickbench_q18                          1.00      2.6±0.02ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
physical_plan_clickbench_q19                          1.00      4.1±0.06ms        ? ?/sec    1.02      4.2±0.08ms        ? ?/sec
physical_plan_clickbench_q2                           1.00      2.8±0.01ms        ? ?/sec    1.02      2.8±0.02ms        ? ?/sec
physical_plan_clickbench_q20                          1.00      2.2±0.02ms        ? ?/sec    1.03      2.2±0.03ms        ? ?/sec
physical_plan_clickbench_q21                          1.00      2.8±0.06ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
physical_plan_clickbench_q22                          1.00      3.9±0.09ms        ? ?/sec    1.01      4.0±0.05ms        ? ?/sec
physical_plan_clickbench_q23                          1.00      4.2±0.11ms        ? ?/sec    1.02      4.2±0.06ms        ? ?/sec
physical_plan_clickbench_q24                          1.00      4.9±0.05ms        ? ?/sec    1.02      5.0±0.05ms        ? ?/sec
physical_plan_clickbench_q25                          1.00      3.5±0.07ms        ? ?/sec    1.01      3.5±0.03ms        ? ?/sec
physical_plan_clickbench_q26                          1.00      2.9±0.03ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
physical_plan_clickbench_q27                          1.00      3.5±0.05ms        ? ?/sec    1.01      3.6±0.02ms        ? ?/sec
physical_plan_clickbench_q28                          1.00      4.5±0.08ms        ? ?/sec    1.03      4.6±0.04ms        ? ?/sec
physical_plan_clickbench_q29                          1.01      5.0±0.13ms        ? ?/sec    1.00      4.9±0.07ms        ? ?/sec
physical_plan_clickbench_q3                           1.00      2.5±0.02ms        ? ?/sec    1.00      2.5±0.05ms        ? ?/sec
physical_plan_clickbench_q30                          1.01     15.8±0.41ms        ? ?/sec    1.00     15.6±0.24ms        ? ?/sec
physical_plan_clickbench_q31                          1.04      4.7±0.15ms        ? ?/sec    1.00      4.5±0.06ms        ? ?/sec
physical_plan_clickbench_q32                          1.05      4.7±0.15ms        ? ?/sec    1.00      4.5±0.09ms        ? ?/sec
physical_plan_clickbench_q33                          1.00      3.6±0.08ms        ? ?/sec    1.01      3.6±0.07ms        ? ?/sec
physical_plan_clickbench_q34                          1.02      3.3±0.09ms        ? ?/sec    1.00      3.2±0.01ms        ? ?/sec
physical_plan_clickbench_q35                          1.00      3.4±0.10ms        ? ?/sec    1.00      3.4±0.07ms        ? ?/sec
physical_plan_clickbench_q36                          1.02      4.3±0.12ms        ? ?/sec    1.00      4.2±0.11ms        ? ?/sec
physical_plan_clickbench_q37                          1.00      4.6±0.09ms        ? ?/sec    1.03      4.7±0.04ms        ? ?/sec
physical_plan_clickbench_q38                          1.00      4.6±0.08ms        ? ?/sec    1.01      4.7±0.04ms        ? ?/sec
physical_plan_clickbench_q39                          1.00      4.1±0.12ms        ? ?/sec    1.01      4.1±0.08ms        ? ?/sec
physical_plan_clickbench_q4                           1.01      2.2±0.06ms        ? ?/sec    1.00      2.2±0.02ms        ? ?/sec
physical_plan_clickbench_q40                          1.00      4.8±0.16ms        ? ?/sec    1.03      4.9±0.05ms        ? ?/sec
physical_plan_clickbench_q41                          1.00      4.2±0.05ms        ? ?/sec    1.03      4.3±0.03ms        ? ?/sec
physical_plan_clickbench_q42                          1.00      4.2±0.16ms        ? ?/sec    1.01      4.2±0.03ms        ? ?/sec
physical_plan_clickbench_q43                          1.00      4.5±0.15ms        ? ?/sec    1.01      4.6±0.17ms        ? ?/sec
physical_plan_clickbench_q44                          1.02      2.3±0.04ms        ? ?/sec    1.00      2.3±0.03ms        ? ?/sec
physical_plan_clickbench_q45                          1.00      2.3±0.01ms        ? ?/sec    1.00      2.3±0.03ms        ? ?/sec
physical_plan_clickbench_q46                          1.00      3.2±0.09ms        ? ?/sec    1.00      3.2±0.08ms        ? ?/sec
physical_plan_clickbench_q47                          1.00      4.1±0.09ms        ? ?/sec    1.03      4.2±0.03ms        ? ?/sec
physical_plan_clickbench_q48                          1.00      5.1±0.11ms        ? ?/sec    1.04      5.3±0.09ms        ? ?/sec
physical_plan_clickbench_q49                          1.00      5.3±0.06ms        ? ?/sec    1.06      5.7±0.12ms        ? ?/sec
physical_plan_clickbench_q5                           1.00      2.5±0.02ms        ? ?/sec    1.02      2.5±0.05ms        ? ?/sec
physical_plan_clickbench_q50                          1.00      4.2±0.13ms        ? ?/sec    1.00      4.2±0.13ms        ? ?/sec
physical_plan_clickbench_q51                          1.01      3.6±0.06ms        ? ?/sec    1.00      3.5±0.09ms        ? ?/sec
physical_plan_clickbench_q6                           1.00      2.5±0.02ms        ? ?/sec    1.01      2.5±0.02ms        ? ?/sec
physical_plan_clickbench_q7                           1.00      2.1±0.02ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q8                           1.00      3.4±0.10ms        ? ?/sec    1.01      3.5±0.08ms        ? ?/sec
physical_plan_clickbench_q9                           1.00      3.5±0.02ms        ? ?/sec    1.01      3.6±0.03ms        ? ?/sec
physical_plan_tpcds_all                               1.00  1816.0±10.79ms        ? ?/sec    1.07  1937.2±15.09ms        ? ?/sec
physical_plan_tpch_all                                1.00    121.6±1.29ms        ? ?/sec    1.05    127.6±1.38ms        ? ?/sec
physical_plan_tpch_q1                                 1.00      2.9±0.02ms        ? ?/sec    1.06      3.1±0.07ms        ? ?/sec
physical_plan_tpch_q10                                1.00      6.9±0.09ms        ? ?/sec    1.08      7.4±0.13ms        ? ?/sec
physical_plan_tpch_q11                                1.00      8.2±0.11ms        ? ?/sec    1.05      8.7±0.12ms        ? ?/sec
physical_plan_tpch_q12                                1.00      2.9±0.02ms        ? ?/sec    1.04      3.1±0.02ms        ? ?/sec
physical_plan_tpch_q13                                1.00      2.9±0.02ms        ? ?/sec    1.05      3.1±0.05ms        ? ?/sec
physical_plan_tpch_q14                                1.00      2.9±0.02ms        ? ?/sec    1.08      3.1±0.04ms        ? ?/sec
physical_plan_tpch_q16                                1.00      5.0±0.02ms        ? ?/sec    1.06      5.3±0.05ms        ? ?/sec
physical_plan_tpch_q17                                1.00      5.4±0.04ms        ? ?/sec    1.06      5.7±0.12ms        ? ?/sec
physical_plan_tpch_q18                                1.00      5.7±0.09ms        ? ?/sec    1.04      6.0±0.05ms        ? ?/sec
physical_plan_tpch_q19                                1.00      4.9±0.03ms        ? ?/sec    1.03      5.1±0.10ms        ? ?/sec
physical_plan_tpch_q2                                 1.00     11.8±0.39ms        ? ?/sec    1.08     12.7±0.29ms        ? ?/sec
physical_plan_tpch_q20                                1.00      7.8±0.09ms        ? ?/sec    1.04      8.1±0.18ms        ? ?/sec
physical_plan_tpch_q21                                1.00      9.8±0.23ms        ? ?/sec    1.04     10.1±0.13ms        ? ?/sec
physical_plan_tpch_q22                                1.00      6.3±0.05ms        ? ?/sec    1.03      6.5±0.07ms        ? ?/sec
physical_plan_tpch_q3                                 1.00      5.4±0.04ms        ? ?/sec    1.07      5.7±0.18ms        ? ?/sec
physical_plan_tpch_q4                                 1.00      2.9±0.03ms        ? ?/sec    1.03      3.0±0.03ms        ? ?/sec
physical_plan_tpch_q5                                 1.00      5.7±0.04ms        ? ?/sec    1.05      6.0±0.04ms        ? ?/sec
physical_plan_tpch_q6                                 1.00  1565.2±15.83µs        ? ?/sec    1.03  1610.1±48.09µs        ? ?/sec
physical_plan_tpch_q7                                 1.00      6.9±0.04ms        ? ?/sec    1.07      7.3±0.10ms        ? ?/sec
physical_plan_tpch_q8                                 1.00      8.9±0.08ms        ? ?/sec    1.08      9.6±0.20ms        ? ?/sec
physical_plan_tpch_q9                                 1.00      6.3±0.04ms        ? ?/sec    1.08      6.8±0.15ms        ? ?/sec
physical_select_aggregates_from_200                   1.00     16.8±0.19ms        ? ?/sec    1.05     17.7±0.45ms        ? ?/sec
physical_select_all_from_1000                         1.00     23.5±0.11ms        ? ?/sec    1.02     23.9±0.79ms        ? ?/sec
physical_select_one_from_700                          1.00  1345.1±58.03µs        ? ?/sec    1.00  1342.3±37.51µs        ? ?/sec
physical_sorted_union_order_by_10_int64               1.00      9.6±0.12ms        ? ?/sec    1.17     11.2±0.15ms        ? ?/sec
physical_sorted_union_order_by_10_uint64              1.00     26.6±0.27ms        ? ?/sec    1.14     30.4±0.34ms        ? ?/sec
physical_sorted_union_order_by_50_int64               1.00    151.3±1.86ms        ? ?/sec    1.32    200.0±3.55ms        ? ?/sec
physical_sorted_union_order_by_50_uint64              1.00    917.1±8.35ms        ? ?/sec    1.22  1115.5±10.07ms        ? ?/sec
physical_theta_join_consider_sort                     1.00      2.6±0.01ms        ? ?/sec    1.02      2.7±0.03ms        ? ?/sec
physical_unnest_to_join                               1.00      3.0±0.03ms        ? ?/sec    1.00      3.1±0.03ms        ? ?/sec
physical_window_function_partition_by_12_on_values    1.00   1252.0±6.71µs        ? ?/sec    1.27  1587.0±20.05µs        ? ?/sec
physical_window_function_partition_by_30_on_values    1.00      2.1±0.01ms        ? ?/sec    1.44      3.0±0.06ms        ? ?/sec
physical_window_function_partition_by_4_on_values     1.00   925.2±11.34µs        ? ?/sec    1.19  1101.4±10.51µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00  1045.0±26.09µs        ? ?/sec    1.21  1269.2±32.44µs        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00   1099.9±4.88µs        ? ?/sec    1.22  1346.1±44.97µs        ? ?/sec
with_param_values_many_columns                        1.00    579.6±6.92µs        ? ?/sec    1.01   588.0±14.21µs        ? ?/sec

@askalt askalt force-pushed the askalt/with_new_children_fast_path branch from a8028af to fb13e41 Compare January 16, 2026 13:12
This patch implements a fast path for `ExecutionPlan::with_new_children` in some plans,
moving closer to physical plan re-use and improving planning performance. If the properties
of the passed children are the same as those of self's current children, we do not
recompute self's properties (which can be costly if projection mapping is required).
Instead, we just replace the children and re-use self's properties as-is.

To be able to compare two sets of properties, the signature of `ExecutionPlan::properties(...)`
is changed so that it now returns `&Arc<PlanProperties>`. If the `children` properties are
unchanged in `with_new_children`, we clone our own properties `Arc`, so a parent plan will
in turn see our properties as unchanged and do the same.

- Return `&Arc<PlanProperties>` from `ExecutionPlan::properties(...)` instead of `&PlanProperties`.
- Implement a `with_new_children` fast path for all major plans, taken when the children
  properties have not changed.
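
As a rough illustration of the fast path described above, here is a minimal, std-only sketch. It does not use the real DataFusion types: `PlanProperties`, `Node`, `compute_properties`, and `with_new_child` below are simplified, hypothetical stand-ins, and the sketch assumes that "same properties" means pointer equality of the `Arc` (checked with `Arc::ptr_eq`).

```rust
use std::sync::Arc;

/// Hypothetical stand-in for DataFusion's `PlanProperties` (assumption: one field).
#[derive(Debug)]
struct PlanProperties {
    output_partitions: usize,
}

/// Hypothetical plan node with at most one child (not a real DataFusion type).
#[derive(Debug)]
struct Node {
    child: Option<Arc<Node>>,
    properties: Arc<PlanProperties>,
}

impl Node {
    /// Build a leaf node with freshly allocated properties.
    fn leaf(output_partitions: usize) -> Arc<Node> {
        Arc::new(Node {
            child: None,
            properties: Arc::new(PlanProperties { output_partitions }),
        })
    }

    /// Build a parent node, deriving its properties from the child.
    fn with_child(child: Arc<Node>) -> Arc<Node> {
        let properties = Self::compute_properties(&child);
        Arc::new(Node { child: Some(child), properties })
    }

    /// Stands in for the potentially expensive derivation of properties.
    fn compute_properties(child: &Node) -> Arc<PlanProperties> {
        Arc::new(PlanProperties {
            output_partitions: child.properties.output_partitions,
        })
    }

    /// Mirrors the new signature: the `Arc` itself is exposed for comparison.
    fn properties(&self) -> &Arc<PlanProperties> {
        &self.properties
    }

    /// Replace the child; skip recomputation if the child's properties `Arc`
    /// is the very same allocation as the old child's.
    fn with_new_child(&self, new_child: Arc<Node>) -> Arc<Node> {
        let unchanged = match &self.child {
            Some(old) => Arc::ptr_eq(old.properties(), new_child.properties()),
            None => false,
        };
        let properties = if unchanged {
            // Fast path: cloning the Arc keeps pointer identity for our parent.
            Arc::clone(&self.properties)
        } else {
            // Slow path: derive new properties from the new child.
            Self::compute_properties(&new_child)
        };
        Arc::new(Node { child: Some(new_child), properties })
    }
}
```

Because the fast path returns a clone of the same `Arc`, a grandparent that later receives the rebuilt node also sees unchanged child properties, so the shortcut propagates up the tree without any deep comparison.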

Note: currently, `reset_plan_states` does not allow a plan to be re-used in general: it does
not support the dynamic filters and recursive queries features, because in those cases a
state reset should update pointers in the children plans.

Closes apache#19796
@askalt askalt force-pushed the askalt/with_new_children_fast_path branch from fb13e41 to 509e3ff Compare January 16, 2026 13:44
@askalt askalt requested a review from alamb January 17, 2026 07:04

Labels

- api change: Changes the API exposed to users of the crate
- catalog: Related to the catalog crate
- core: Core DataFusion crate
- datasource: Changes to the datasource crate
- documentation: Improvements or additions to documentation
- ffi: Changes to the ffi crate
- optimizer: Optimizer rules
- physical-expr: Changes to the physical-expr crates
- physical-plan: Changes to the physical-plan crate

Development

Successfully merging this pull request may close these issues.

Avoid recomputing PlanProperties redundently

3 participants