Using Aggregate Functions with Subqueries, GROUP BY, and HAVING

cities) that reading occurred in, we might try: <programlisting> SELECT city FROM weather WHERE temp_lo = max(temp_lo); <lineannotation>-- WRONG</lineannotation> </programlisting> but this will not work since the aggregate <function>max</function> cannot be used in the <literal>WHERE</literal> clause. (This restriction exists because the <literal>WHERE</literal> clause determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are computed.) However, as is often the case the query can be restated to accomplish the desired result, here by using a <firstterm>subquery</firstterm>: <programlisting> SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather); </programlisting> <screen> city --------------- San Francisco (1 row) </screen> This is OK because the subquery is an independent computation that computes its own aggregate separately from what is happening in the outer query. </para> <para> <indexterm><primary>GROUP BY</primary></indexterm> <indexterm><primary>HAVING</primary></indexterm> Aggregates are also very useful in combination with <literal>GROUP BY</literal> clauses. For example, we can get the number of readings and the maximum low temperature observed in each city with: <programlisting> SELECT city, count(*), max(temp_lo) FROM weather GROUP BY city; </programlisting> <screen> city | count | max ---------------+-------+----- Hayward | 1 | 37 San Francisco | 2 | 46 (2 rows) </screen> which gives us one output row per city. Each aggregate result is computed over the table rows matching that city. We can filter these grouped rows using <literal>HAVING</literal>: <programlisting> SELECT city, count(*), max(temp_lo) FROM weather GROUP BY city HAVING max(temp_lo) < 40; </programlisting> <screen> city | count | max ---------+-------+----- Hayward | 1 | 37 (1 row) </screen> which gives us the same results for only the cities that have all <structfield>temp_lo</structfield> values below 40. Finally, if we only care about cities whose names begin with <quote><literal>S</literal></quote>, we might do: <programlisting> SELECT city, count(*), max(temp_lo) FROM weather WHERE city LIKE 'S%' -- <co id="co.tutorial-agg-like"/> GROUP BY city; </programlisting> <screen> city | count | max ---------------+-------+----- San Francisco | 2 | 46 (1 row) </screen> <calloutlist> <callout arearefs="co.tutorial-agg-like"> <para> The <literal>LIKE</literal> operator does pattern matching and is explained in <xref linkend="functions-matching"/>. </para> </callout> </calloutlist> </para> <para> It is important to understand the interaction between aggregates and <acronym>SQL</acronym>'s <literal>WHERE</literal> and <literal>HAVING</literal> clauses. The fundamental difference between <literal>WHERE</literal> and <literal>HAVING</literal> is this: <literal>WHERE</literal> selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas <literal>HAVING</literal> selects group rows after groups and aggregates are computed. Thus, the <literal>WHERE</literal> clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates. On the other hand, the <literal>HAVING</literal> clause always

This section explains how to use aggregate functions in combination with subqueries, GROUP BY clauses, and HAVING clauses to filter and analyze data, including how to compute aggregate values for each group, filter groups based on conditions, and understand the difference between WHERE and HAVING clauses in SQL queries.