Home Explore Blog CI



postgresql

7th chunk of `doc/src/sgml/xaggr.sgml`
7511c62e9a5c2ec93173d8c8031d94e2e8865c1560e6103e0000000100000fa1

CREATE FUNCTION ordered_set_transition(internal, anyelement)
  RETURNS internal ...;
CREATE FUNCTION percentile_disc_final(internal, float8, anyelement)
  RETURNS anyelement ...;

CREATE AGGREGATE percentile_disc (float8 ORDER BY anyelement)
(
    sfunc = ordered_set_transition,
    stype = internal,
    finalfunc = percentile_disc_final,
    finalfunc_extra
);
</programlisting>

   This aggregate takes a <type>float8</type> direct argument (the percentile
   fraction) and an aggregated input that can be of any sortable data type.
   It could be used to obtain a median household income like this:

<programlisting>
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY income) FROM households;
 percentile_disc
-----------------
           50489
</programlisting>

   Here, <literal>0.5</literal> is a direct argument; it would make no sense
   for the percentile fraction to be a value varying across rows.
  </para>

  <para>
   Unlike the case for normal aggregates, the sorting of input rows for
   an ordered-set aggregate is <emphasis>not</emphasis> done behind the scenes,
   but is the responsibility of the aggregate's support functions.
   The typical implementation approach is to keep a reference to
   a <quote>tuplesort</quote> object in the aggregate's state value, feed the
   incoming rows into that object, and then complete the sorting and
   read out the data in the final function.  This design allows the
   final function to perform special operations such as injecting
   additional <quote>hypothetical</quote> rows into the data to be sorted.
   While normal aggregates can often be implemented with support
   functions written in <application>PL/pgSQL</application> or another
   PL language, ordered-set aggregates generally have to be written in
   C, since their state values aren't definable as any SQL data type.
   (In the above example, notice that the state value is declared as
   type <type>internal</type> &mdash; this is typical.)
   Also, because the final function performs the sort, it is not possible
   to continue adding input rows by executing the transition function again
   later.  This means the final function is not <literal>READ_ONLY</literal>;
   it must be declared in <link linkend="sql-createaggregate"><command>CREATE AGGREGATE</command></link>
   as <literal>READ_WRITE</literal>, or as <literal>SHAREABLE</literal> if
   it's possible for additional final-function calls to make use of the
   already-sorted state.
  </para>

  <para>
   The state transition function for an ordered-set aggregate receives
   the current state value plus the aggregated input values for
   each row, and returns the updated state value.  This is the
   same definition as for normal aggregates, but note that the direct
   arguments (if any) are not provided.  The final function receives
   the last state value, the values of the direct arguments if any,
   and (if <literal>finalfunc_extra</literal> is specified) null values
   corresponding to the aggregated input(s).  As with normal
   aggregates, <literal>finalfunc_extra</literal> is only really useful if the
   aggregate is polymorphic; then the extra dummy argument(s) are needed
   to connect the final function's result type to the aggregate's input
   type(s).
  </para>

  <para>
   Currently, ordered-set aggregates cannot be used as window functions,
   and therefore there is no need for them to support moving-aggregate mode.
  </para>

 </sect2>

  <sect2 id="xaggr-partial-aggregates">
  <title>Partial Aggregation</title>

  <indexterm>
   <primary>aggregate function</primary>
   <secondary>partial aggregation</secondary>
  </indexterm>

  <para>
   Optionally, an aggregate function can support <firstterm>partial
   aggregation</firstterm>.  The idea of partial aggregation is to run the aggregate's
   state transition function over different subsets of the input data
   independently, and then to combine the state values resulting from those
   subsets to produce the same

Title: Ordered-Set Aggregates and Partial Aggregation
Summary
Ordered-set aggregates require manual sorting of input rows and are typically implemented in C, with the final function performing the sort and possibly injecting hypothetical rows, and they cannot be used as window functions, whereas partial aggregation allows an aggregate function to run over different subsets of the input data independently and then combine the state values to produce the same result as running the aggregate over the entire dataset