@@ -57,41 +57,60 @@ mod row_hash;
5757mod topk;
5858mod topk_stream;
5959
60- /// Hash aggregate modes
60+ /// Aggregation modes
6161///
6262/// See [`Accumulator::state`] for background information on multi-phase
6363/// aggregation and how these modes are used.
6464#[ derive( Debug , Copy , Clone , PartialEq , Eq ) ]
6565pub enum AggregateMode {
66+ /// One of multiple layers of aggregation, any input partitioning
67+ ///
6668 /// Partial aggregate that can be applied in parallel across input
6769 /// partitions.
6870 ///
6971 /// This is the first phase of a multi-phase aggregation.
7072 Partial ,
73+ /// *Final* of multiple layers of aggregation, in exactly one partition
74+ ///
7175 /// Final aggregate that produces a single partition of output by combining
7276 /// the output of multiple partial aggregates.
7377 ///
7478 /// This is the second phase of a multi-phase aggregation.
79+ ///
80+ /// This mode requires that the input is a single partition
81+ ///
82+ /// Note: Adjacent `Partial` and `Final` mode aggregation is equivalent to a `Single`
83+ /// mode aggregation node. The `Final` mode is required since this is used in an
84+ /// intermediate step. The [`CombinePartialFinalAggregate`] physical optimizer rule
85+ /// will replace this combination with `Single` mode for more efficient execution.
86+ ///
87+ /// [`CombinePartialFinalAggregate`]: https://docs.rs/datafusion/latest/datafusion/physical_optimizer/combine_partial_final_agg/struct.CombinePartialFinalAggregate.html
7588 Final ,
89+ /// *Final* of multiple layers of aggregation, input is *Partitioned*
90+ ///
7691 /// Final aggregate that works on pre-partitioned data.
7792 ///
78- /// This requires the invariant that all rows with a particular
79- /// grouping key are in the same partitions, such as is the case
80- /// with Hash repartitioning on the group keys. If a group key is
81- /// duplicated, duplicate groups would be produced
93+ /// This mode requires that all rows with a particular grouping key are in
94+ /// the same partitions, such as is the case with Hash repartitioning on the
95+ /// group keys. If a group key is duplicated, duplicate groups would be
96+ /// produced
8297 FinalPartitioned ,
98+ /// *Single* layer of Aggregation, input is exactly one partition
99+ ///
83100 /// Applies the entire logical aggregation operation in a single operator,
84101 /// as opposed to Partial / Final modes which apply the logical aggregation using
85102 /// two operators.
86103 ///
87104 /// This mode requires that the input is a single partition (like Final)
88105 Single ,
106+ /// *Single* layer of Aggregation, input is *Partitioned*
107+ ///
89108 /// Applies the entire logical aggregation operation in a single operator,
90- /// as opposed to Partial / Final modes which apply the logical aggregation using
91- /// two operators.
109+ /// as opposed to Partial / Final modes which apply the logical aggregation
110+ /// using two operators.
92111 ///
93- /// This mode requires that the input is partitioned by group key (like
94- /// FinalPartitioned)
112+ /// This mode requires that the input has more than one partition, and is
113+ /// partitioned by group key (like FinalPartitioned).
95114 SinglePartitioned ,
96115}
97116
0 commit comments