SQL Grouping and Aggregate Functions

aggregate function that computes a single value over the entire group. More information about the available aggregate functions can be found in <xref linkend="functions-aggregate"/>. </para> <tip> <para> Grouping without aggregate expressions effectively calculates the set of distinct values in a column. This can also be achieved using the <literal>DISTINCT</literal> clause (see <xref linkend="queries-distinct"/>). </para> </tip> <para> Here is another example: it calculates the total sales for each product (rather than the total sales of all products): <programlisting> SELECT product_id, p.name, (sum(s.units) * p.price) AS sales FROM products p LEFT JOIN sales s USING (product_id) GROUP BY product_id, p.name, p.price; </programlisting> In this example, the columns <literal>product_id</literal>, <literal>p.name</literal>, and <literal>p.price</literal> must be in the <literal>GROUP BY</literal> clause since they are referenced in the query select list (but see below). The column <literal>s.units</literal> does not have to be in the <literal>GROUP BY</literal> list since it is only used in an aggregate expression (<literal>sum(...)</literal>), which represents the sales of a product. For each product, the query returns a summary row about all sales of the product. </para> <indexterm><primary>functional dependency</primary></indexterm> <para> If the products table is set up so that, say, <literal>product_id</literal> is the primary key, then it would be enough to group by <literal>product_id</literal> in the above example, since name and price would be <firstterm>functionally dependent</firstterm> on the product ID, and so there would be no ambiguity about which name and price value to return for each product ID group. </para> <para> In strict SQL, <literal>GROUP BY</literal> can only group by columns of the source table but <productname>PostgreSQL</productname> extends this to also allow <literal>GROUP BY</literal> to group by columns in the select list. Grouping by value expressions instead of simple column names is also allowed. </para> <indexterm> <primary>HAVING</primary> </indexterm> <para> If a table has been grouped using <literal>GROUP BY</literal>, but only certain groups are of interest, the <literal>HAVING</literal> clause can be used, much like a <literal>WHERE</literal> clause, to eliminate groups from the result. The syntax is: <synopsis> SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY ... HAVING <replaceable>boolean_expression</replaceable> </synopsis> Expressions in the <literal>HAVING</literal> clause can refer both to grouped expressions and to ungrouped expressions (which necessarily involve an aggregate function). </para> <para> Example: <screen> <prompt>=></prompt> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3;</userinput> x | sum ---+----- a | 4 b | 5 (2 rows) <prompt>=></prompt> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c';</userinput> x | sum ---+----- a | 4 b | 5 (2 rows) </screen> </para> <para> Again, a more realistic example: <programlisting> SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit FROM products p LEFT JOIN sales s USING (product_id) WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks' GROUP BY product_id, p.name, p.price, p.cost HAVING sum(p.price * s.units) > 5000; </programlisting> In the example above, the <literal>WHERE</literal> clause is selecting rows by a column that is not grouped (the expression is only true for sales during the last four weeks), while the <literal>HAVING</literal> clause restricts the output to groups with total gross sales over 5000. Note that

This section delves deeper into SQL's GROUP BY clause and its interaction with aggregate functions. It explains that GROUP BY is used to group rows with identical values in specified columns, allowing for aggregate calculations on these groups. The text provides examples of using GROUP BY with aggregate functions like SUM, and discusses how non-grouped columns can only be used in aggregate expressions. It also introduces the concept of functional dependency in grouping and mentions PostgreSQL's extension of the GROUP BY clause. The HAVING clause is explained as a way to filter grouped results, similar to the WHERE clause but applied after grouping. Several examples demonstrate the practical use of GROUP BY and HAVING in complex queries, including calculating sales and profits for products.