Optimizing Many-to-Many Table Queries with Integer Aggregator

<function>int_array_enum(integer[])</function> that returns <type>setof integer</type>. It is essentially the reverse operation of the aggregator: given an array of integers, expand it into a set of rows. This is a wrapper around <function>unnest</function>, which does the same thing for any array type. </para> </sect2> <sect2 id="intagg-samples"> <title>Sample Uses</title> <para> Many database systems have the notion of a many to many table. Such a table usually sits between two indexed tables, for example: <programlisting> CREATE TABLE left_table (id INT PRIMARY KEY, ...); CREATE TABLE right_table (id INT PRIMARY KEY, ...); CREATE TABLE many_to_many(id_left INT REFERENCES left_table, id_right INT REFERENCES right_table); </programlisting> It is typically used like this: <programlisting> SELECT right_table.* FROM right_table JOIN many_to_many ON (right_table.id = many_to_many.id_right) WHERE many_to_many.id_left = <replaceable>item</replaceable>; </programlisting> This will return all the items in the right hand table for an entry in the left hand table. This is a very common construct in SQL. </para> <para> Now, this methodology can be cumbersome with a very large number of entries in the <structname>many_to_many</structname> table. Often, a join like this would result in an index scan and a fetch for each right hand entry in the table for a particular left hand entry. If you have a very dynamic system, there is not much you can do. However, if you have some data which is fairly static, you can create a summary table with the aggregator. <programlisting> CREATE TABLE summary AS SELECT id_left, int_array_aggregate(id_right) AS rights FROM many_to_many GROUP BY id_left; </programlisting> This will create a table with one row per left item, and an array of right items. Now this is pretty useless without some way of using the array; that's why there is an array enumerator. You can do <programlisting> SELECT id_left, int_array_enum(rights) FROM summary WHERE id_left = <replaceable>item</replaceable>; </programlisting> The above query using <function>int_array_enum</function> produces the same results as <programlisting> SELECT id_left, id_right FROM many_to_many WHERE id_left = <replaceable>item</replaceable>; </programlisting> The difference is that the query against the summary table has to get only one row from the table, whereas the direct query against <structname>many_to_many</structname> must index scan and fetch a row for each entry. </para> <para> On one system, an <command>EXPLAIN</command> showed a query with a cost of 8488 was reduced to a cost of 329. The original query was a join involving the <structname>many_to_many</structname> table, which was replaced by: <programlisting> SELECT id_right, count(id_right) FROM ( SELECT id_left, int_array_enum(rights) AS id_right FROM summary JOIN (SELECT id FROM left_table WHERE id = <replaceable>item</replaceable>) AS lefts ON (summary.id_left = lefts.id) ) AS list GROUP BY id_right ORDER BY count DESC; </programlisting> </para> </sect2> </sect1>

The integer aggregator and enumerator functions can be used to optimize queries involving many-to-many tables by creating a summary table with aggregated array values, reducing the need for index scans and fetches, and resulting in significant performance improvements.