ORDER BY Clause in Aggregate Functions: DISTINCT and WITHIN GROUP

</para> <para> When dealing with multiple-argument aggregate functions, note that the <literal>ORDER BY</literal> clause goes after all the aggregate arguments. For example, write this: <programlisting> SELECT string_agg(a, ',' ORDER BY a) FROM table; </programlisting> not this: <programlisting> SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect </programlisting> The latter is syntactically valid, but it represents a call of a single-argument aggregate function with two <literal>ORDER BY</literal> keys (the second one being rather useless since it's a constant). </para> <para> If <literal>DISTINCT</literal> is specified with an <replaceable>order_by_clause</replaceable>, <literal>ORDER BY</literal> expressions can only reference columns in the <literal>DISTINCT</literal> list. For example: <programlisting> WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) ) SELECT array_agg(DISTINCT v ORDER BY v DESC) FROM vals; array_agg ----------- {4,3,2,1} </programlisting> </para> <para> Placing <literal>ORDER BY</literal> within the aggregate's regular argument list, as described so far, is used when ordering the input rows for general-purpose and statistical aggregates, for which ordering is optional. There is a subclass of aggregate functions called <firstterm>ordered-set aggregates</firstterm> for which an <replaceable>order_by_clause</replaceable> is <emphasis>required</emphasis>, usually because the aggregate's computation is only sensible in terms of a specific ordering of its input rows. Typical examples of ordered-set aggregates include rank and percentile calculations. For an ordered-set aggregate, the <replaceable>order_by_clause</replaceable> is written inside <literal>WITHIN GROUP (...)</literal>, as shown in the final syntax alternative above. The expressions in the <replaceable>order_by_clause</replaceable> are evaluated once per input row just like regular aggregate arguments, sorted as per the <replaceable>order_by_clause</replaceable>'s requirements, and fed to the aggregate function as input arguments. (This is unlike the case for a non-<literal>WITHIN GROUP</literal> <replaceable>order_by_clause</replaceable>, which is not treated as argument(s) to the aggregate function.) The argument expressions preceding <literal>WITHIN GROUP</literal>, if any, are called <firstterm>direct arguments</firstterm> to distinguish them from the <firstterm>aggregated arguments</firstterm> listed in the <replaceable>order_by_clause</replaceable>. Unlike regular aggregate arguments, direct arguments are evaluated only once per aggregate call, not once per input row. This means that they can contain variables only if those variables are grouped by <literal>GROUP BY</literal>; this restriction is the same as if the direct arguments were not inside an aggregate expression at all. Direct arguments are typically used for things like percentile fractions, which only make sense as a single value per aggregation calculation. The direct argument list can be empty; in this case, write just <literal>()</literal> not <literal>(*)</literal>. (<productname>PostgreSQL</productname> will actually accept either spelling, but only the first way conforms to the SQL standard.) </para> <para> <indexterm> <primary>median</primary> <seealso>percentile</seealso> </indexterm> An example of an ordered-set aggregate call is: <programlisting> SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY income) FROM households; percentile_cont ----------------- 50489 </programlisting> which obtains the 50th percentile, or median, value of the <structfield>income</structfield> column from table <structname>households</structname>. Here, <literal>0.5</literal> is a direct argument; it would make no sense for the percentile

This section explains the usage of the ORDER BY clause with DISTINCT in aggregate functions, where ORDER BY expressions can only reference columns in the DISTINCT list. It then introduces ordered-set aggregates, which require an ORDER BY clause within WITHIN GROUP. The expressions in this clause are evaluated per input row and sorted. Arguments preceding WITHIN GROUP are called direct arguments, evaluated once per aggregate call, while those within are aggregated arguments. An example of an ordered-set aggregate call using percentile_cont is provided.