Partial Aggregation in Aggregate Functions

aggregated input(s). As with normal aggregates, <literal>finalfunc_extra</literal> is only really useful if the aggregate is polymorphic; then the extra dummy argument(s) are needed to connect the final function's result type to the aggregate's input type(s). </para> <para> Currently, ordered-set aggregates cannot be used as window functions, and therefore there is no need for them to support moving-aggregate mode. </para> </sect2> <sect2 id="xaggr-partial-aggregates"> <title>Partial Aggregation</title> <indexterm> <primary>aggregate function</primary> <secondary>partial aggregation</secondary> </indexterm> <para> Optionally, an aggregate function can support <firstterm>partial aggregation</firstterm>. The idea of partial aggregation is to run the aggregate's state transition function over different subsets of the input data independently, and then to combine the state values resulting from those subsets to produce the same state value that would have resulted from scanning all the input in a single operation. This mode can be used for parallel aggregation by having different worker processes scan different portions of a table. Each worker produces a partial state value, and at the end those state values are combined to produce a final state value. (In the future this mode might also be used for purposes such as combining aggregations over local and remote tables; but that is not implemented yet.) </para> <para> To support partial aggregation, the aggregate definition must provide a <firstterm>combine function</firstterm>, which takes two values of the aggregate's state type (representing the results of aggregating over two subsets of the input rows) and produces a new value of the state type, representing what the state would have been after aggregating over the combination of those sets of rows. It is unspecified what the relative order of the input rows from the two sets would have been. 
This means that it's usually impossible to define a useful combine function for aggregates that are sensitive to input row order. </para> <para> As simple examples, <literal>MAX</literal> and <literal>MIN</literal> aggregates can be made to support partial aggregation by specifying the combine function as the same greater-of-two or lesser-of-two comparison function that is used as their transition function. <literal>SUM</literal> aggregates just need an addition function as their combine function. (Again, this is the same as their transition function, unless the state value is wider than the input data type.) </para> <para> The combine function is treated much like a transition function that happens to take a value of the state type, not of the underlying input type, as its second argument. In particular, the rules for dealing with null values and strict functions are similar. Also, if the aggregate definition specifies a non-null <literal>initcond</literal>, keep in mind that it will be used not only as the initial state for each partial aggregation run, but also as the initial state for the combine function, which will be called to combine each partial result into that state. </para> <para> If the aggregate's state type is declared as <type>internal</type>, it is the combine function's responsibility to ensure that its result is allocated in the correct memory context for aggregate state values. This means in particular that when the first input is <literal>NULL</literal> it's invalid to simply return the second input, as that value will be in the wrong context and will not have sufficient lifespan. </para> <para> When the aggregate's state type is declared as <type>internal</type>,

Partial aggregation allows an aggregate function to process different subsets of its input independently and then combine the resulting state values to produce the same state value that a single pass over all the input would have produced. It requires a combine function that takes two state values and returns a new state value. This mode is used for parallel aggregation, and in the future it might also be used to combine aggregations over local and remote tables.
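The combine-function contract described in this section can be illustrated with a small Python model. This is a sketch under stated assumptions, not PostgreSQL code: Python `None` stands in for SQL NULL, and every name in it (`strict`, `greater`, `AvgState`, `avg_transfn`, `avg_combinefn`, `avg_finalfn`, `run_partial`, `run_combine`) is hypothetical. It shows a MAX-style aggregate whose greater-of-two function serves as both transition and combine function, and an AVG-style aggregate whose state (a count plus a running sum) is wider than its input, so its combine function differs from its transition function. Each simulated worker aggregates its own subset of rows, and the partial states are then combined starting from the same `initcond`.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional

# Hypothetical model of partial aggregation; none of these names are
# PostgreSQL APIs. None plays the role of SQL NULL.


def strict(fn):
    """Simplified model of strict-function NULL handling: a NULL second
    argument leaves the state unchanged, and a NULL state is replaced by
    the first non-NULL argument."""
    def wrapped(state, value):
        if value is None:
            return state
        if state is None:
            return value
        return fn(state, value)
    return wrapped


# MAX-style aggregate: greater-of-two is both transition and combine function.
greater = strict(lambda a, b: a if a >= b else b)


@dataclass(frozen=True)
class AvgState:
    """Hypothetical AVG state, wider than the input: a count and a sum."""
    count: int
    total: float


AVG_INITCOND = AvgState(0, 0.0)


def avg_transfn(state: AvgState, value: float) -> AvgState:
    # Transition function: fold one input row into the state.
    return AvgState(state.count + 1, state.total + value)


def avg_combinefn(a: AvgState, b: AvgState) -> AvgState:
    # Combine function: merge two partial states. Addition is commutative
    # and associative, so the relative order of the two row sets is irrelevant.
    return AvgState(a.count + b.count, a.total + b.total)


def avg_finalfn(state: AvgState) -> Optional[float]:
    return None if state.count == 0 else state.total / state.count


def run_partial(transfn, initcond, rows):
    # One worker: run the transition function over its own subset of rows,
    # starting from initcond.
    return reduce(transfn, rows, initcond)


def run_combine(combinefn, initcond, partials):
    # Leader: initcond is also the starting state here, and each partial
    # result is combined into it in turn.
    return reduce(combinefn, partials, initcond)


rows = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
subsets = [rows[0:2], rows[2:5], rows[5:], []]  # the last worker saw no rows

avg_partials = [run_partial(avg_transfn, AVG_INITCOND, s) for s in subsets]
avg_state = run_combine(avg_combinefn, AVG_INITCOND, avg_partials)
print(avg_finalfn(avg_state))  # 18.0

max_partials = [run_partial(greater, None, s) for s in subsets]
print(run_combine(greater, None, max_partials))  # 42.0
```

Starting the combine step from `AVG_INITCOND` is harmless here only because `AvgState(0, 0.0)` is an identity for `avg_combinefn`. A non-null `initcond` that is not neutral under combination would be folded in once per partial run plus once more in the combine step, so the combined result would differ from a single-pass run.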