Aggregate Functions: Handling Nulls, Ordering, and Distinct Values

expression that does not itself contain an aggregate expression or a window function call. The optional <replaceable>order_by_clause</replaceable> and <replaceable>filter_clause</replaceable> are described below. </para> <para> The first form of aggregate expression invokes the aggregate once for each input row. The second form is the same as the first, since <literal>ALL</literal> is the default. The third form invokes the aggregate once for each distinct value of the expression (or distinct set of values, for multiple expressions) found in the input rows. The fourth form invokes the aggregate once for each input row; since no particular input value is specified, it is generally only useful for the <function>count(*)</function> aggregate function. The last form is used with <firstterm>ordered-set</firstterm> aggregate functions, which are described below. </para> <para> Most aggregate functions ignore null inputs, so that rows in which one or more of the expression(s) yield null are discarded. This can be assumed to be true, unless otherwise specified, for all built-in aggregates. </para> <para> For example, <literal>count(*)</literal> yields the total number of input rows; <literal>count(f1)</literal> yields the number of input rows in which <literal>f1</literal> is non-null, since <function>count</function> ignores nulls; and <literal>count(distinct f1)</literal> yields the number of distinct non-null values of <literal>f1</literal>. </para> <para> Ordinarily, the input rows are fed to the aggregate function in an unspecified order. In many cases this does not matter; for example, <function>min</function> produces the same result no matter what order it receives the inputs in. However, some aggregate functions (such as <function>array_agg</function> and <function>string_agg</function>) produce results that depend on the ordering of the input rows. When using such an aggregate, the optional <replaceable>order_by_clause</replaceable> can be used to specify the desired ordering. The <replaceable>order_by_clause</replaceable> has the same syntax as for a query-level <literal>ORDER BY</literal> clause, as described in <xref linkend="queries-order"/>, except that its expressions are always just expressions and cannot be output-column names or numbers. For example: <programlisting> WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) ) SELECT array_agg(v ORDER BY v DESC) FROM vals; array_agg ------------- {4,3,3,2,1} </programlisting> Since <type>jsonb</type> only keeps the last matching key, ordering of its keys can be significant: <programlisting> WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') ) SELECT jsonb_object_agg(k, v ORDER BY v) FROM vals; jsonb_object_agg ---------------------------- {"key0": "1", "key1": "3"} </programlisting> </para> <para> When dealing with multiple-argument aggregate functions, note that the <literal>ORDER BY</literal> clause goes after all the aggregate arguments. For example, write this: <programlisting> SELECT string_agg(a, ',' ORDER BY a) FROM table; </programlisting> not this: <programlisting> SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect </programlisting> The latter is syntactically valid, but it represents a call of a single-argument aggregate function with two <literal>ORDER BY</literal> keys (the second one being rather useless since it's a constant). </para> <para> If <literal>DISTINCT</literal> is specified with an <replaceable>order_by_clause</replaceable>, <literal>ORDER BY</literal> expressions can only reference columns in the <literal>DISTINCT</literal> list. For example: <programlisting> WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) ) SELECT array_agg(DISTINCT v ORDER BY v DESC) FROM vals; array_agg

This section elaborates on how aggregate functions handle null inputs (generally ignoring them), and how to control the order in which input rows are processed using the ORDER BY clause, especially for order-dependent aggregates like array_agg and string_agg. It emphasizes the correct placement of the ORDER BY clause in multi-argument aggregates. Additionally, it explains the behavior of DISTINCT in aggregate functions and the constraints on ORDER BY expressions when used with DISTINCT.