Self Joins and Aggregate Functions in SQL

right-table match, empty (null) values are substituted for the right-table columns. </para> <formalpara> <title>Exercise:</title> <para> There are also right outer joins and full outer joins. Try to find out what those do. </para> </formalpara> <indexterm><primary>join</primary><secondary>self</secondary></indexterm> <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm> <para> We can also join a table against itself. This is called a <firstterm>self join</firstterm>. As an example, suppose we wish to find all the weather records that are in the temperature range of other weather records. So we need to compare the <structfield>temp_lo</structfield> and <structfield>temp_hi</structfield> columns of each <structname>weather</structname> row to the <structfield>temp_lo</structfield> and <structfield>temp_hi</structfield> columns of all other <structname>weather</structname> rows. We can do this with the following query: <programlisting> SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high, w2.city, w2.temp_lo AS low, w2.temp_hi AS high FROM weather w1 JOIN weather w2 ON w1.temp_lo < w2.temp_lo AND w1.temp_hi > w2.temp_hi; </programlisting> <screen> city | low | high | city | low | high ---------------+-----+------+---------------+-----+------ San Francisco | 43 | 57 | San Francisco | 46 | 50 Hayward | 37 | 54 | San Francisco | 46 | 50 (2 rows) </screen> Here we have relabeled the weather table as <literal>w1</literal> and <literal>w2</literal> to be able to distinguish the left and right side of the join. You can also use these kinds of aliases in other queries to save some typing, e.g.: <programlisting> SELECT * FROM weather w JOIN cities c ON w.city = c.name; </programlisting> You will encounter this style of abbreviating quite frequently. </para> </sect1> <sect1 id="tutorial-agg"> <title>Aggregate Functions</title> <indexterm zone="tutorial-agg"> <primary>aggregate function</primary> </indexterm> <para> Like most other relational database products, <productname>PostgreSQL</productname> supports <firstterm>aggregate functions</firstterm>. An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the <function>count</function>, <function>sum</function>, <function>avg</function> (average), <function>max</function> (maximum) and <function>min</function> (minimum) over a set of rows. </para> <para> As an example, we can find the highest low-temperature reading anywhere with: <programlisting> SELECT max(temp_lo) FROM weather; </programlisting> <screen> max ----- 46 (1 row) </screen> </para> <para> <indexterm><primary>subquery</primary></indexterm> If we wanted to know what city (or cities) that reading occurred in, we might try: <programlisting> SELECT city FROM weather WHERE temp_lo = max(temp_lo); <lineannotation>-- WRONG</lineannotation> </programlisting> but this will not work since the aggregate <function>max</function> cannot be used in the <literal>WHERE</literal> clause. (This restriction exists because the <literal>WHERE</literal> clause determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are computed.) However, as is often the case the query can be restated to accomplish the desired result, here by using a <firstterm>subquery</firstterm>: <programlisting> SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather); </programlisting> <screen> city --------------- San Francisco (1 row) </screen> This is OK because the subquery is an independent computation that computes its own aggregate separately

This section covers self joins, which allow a table to be joined with itself, and aggregate functions, which compute a single result from multiple input rows, such as count, sum, average, max, and min, and also discusses the use of subqueries to filter data based on aggregate calculations.