Skip to content

Conversation

@tshauck
Copy link
Contributor

@tshauck tshauck commented Nov 19, 2025

Which issue does this PR close?

./target/debug/datafusion-cli 
DataFusion CLI v51.0.0
> create table t(a int, b int) as values (1,3), (3,4), (4,5);
0 row(s) fetched. 
Elapsed 0.021 seconds.

> select a from t group by a order by min(b);
+---+
| a |
+---+
| 1 |
| 3 |
| 4 |
+---+
3 row(s) fetched. 
Elapsed 0.017 seconds.

> EXPLAIN select a from t group by a order by min(b);
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │       ProjectionExec      │ |
|               | │    --------------------   │ |
|               | │            a: a           │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │  SortPreservingMergeExec  │ |
|               | │    --------------------   │ |
|               | │  min(t.b) ASC NULLS LAST  │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │          SortExec         │ |
|               | │    --------------------   │ |
|               | │ min(t.b)@1 ASC NULLS LAST │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       AggregateExec       │ |
|               | │    --------------------   │ |
|               | │       aggr: min(t.b)      │ |
|               | │        group_by: a        │ |
|               | │                           │ |
|               | │           mode:           │ |
|               | │      FinalPartitioned     │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │    CoalesceBatchesExec    │ |
|               | │    --------------------   │ |
|               | │     target_batch_size:    │ |
|               | │            8192           │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      │ |
|               | │    --------------------   │ |
|               | │ partition_count(in->out): │ |
|               | │           1 -> 8          │ |
|               | │                           │ |
|               | │    partitioning_scheme:   │ |
|               | │       Hash([a@0], 8)      │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       AggregateExec       │ |
|               | │    --------------------   │ |
|               | │       aggr: min(t.b)      │ |
|               | │        group_by: a        │ |
|               | │       mode: Partial       │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │         bytes: 224        │ |
|               | │       format: memory      │ |
|               | │          rows: 1          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched. 
Elapsed 0.019 seconds.

Rationale for this change

This PR fixes a bug where if there's an aggregate expression in the order by that is not included in the projection, the query fails as the expression isn't propagated.

What changes are included in this PR?

Fixes the bug by making sure expressions in the order by are also passed to the aggregate function.

Are these changes tested?

Yes, tests are added to ensure the query compiles and produces the proper plan.

Are there any user-facing changes?

No

@github-actions github-actions bot added the sql SQL Planner label Nov 19, 2025
@tshauck tshauck force-pushed the fix-18683 branch 3 times, most recently from 9bf394f to 0d2c9e9 Compare November 19, 2025 23:52
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 19, 2025
@tshauck tshauck marked this pull request as ready for review November 20, 2025 18:40
@tshauck tshauck requested a review from simonvandel November 20, 2025 18:43
Ok(sort_expr.with_expr(final_expr))
})
.collect::<Result<Vec<SortExpr>>>()?;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a check_columns_satisfy_exprs here for better UX?

            check_columns_satisfy_exprs(
                &order_by_post_aggr,
                std::slice::from_ref(&order_by_post_aggr),
                CheckColumnsSatisfyExprsPurpose::Aggregate(
                    CheckColumnsMustReferenceAggregatePurpose::OrderBy,
                ),
            )?;

Might need a testcase too.

> SELECT column2 FROM foo GROUP BY column2 ORDER BY column1
Error during planning: Column in ORDER BY must be in GROUP BY or an aggregate function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql SQL Planner sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Schema error: No field named t.b. Valid fields are t.a.

3 participants