Covering Indexes with INCLUDE for Index-Only Scans

find out whether it's visible, so no performance advantage is gained over a standard index scan. Even in the successful case, this approach trades visibility map accesses for heap accesses; but since the visibility map is four orders of magnitude smaller than the heap it describes, far less physical I/O is needed to access it. In most situations the visibility map remains cached in memory all the time. </para> <para> In short, while an index-only scan is possible given the two fundamental requirements, it will be a win only if a significant fraction of the table's heap pages have their all-visible map bits set. But tables in which a large fraction of the rows are unchanging are common enough to make this type of scan very useful in practice. </para> <para> <indexterm> <primary><literal>INCLUDE</literal></primary> <secondary>in index definitions</secondary> </indexterm> To make effective use of the index-only scan feature, you might choose to create a <firstterm>covering index</firstterm>, which is an index specifically designed to include the columns needed by a particular type of query that you run frequently. Since queries typically need to retrieve more columns than just the ones they search on, <productname>PostgreSQL</productname> allows you to create an index in which some columns are just <quote>payload</quote> and are not part of the search key. This is done by adding an <literal>INCLUDE</literal> clause listing the extra columns. For example, if you commonly run queries like <programlisting> SELECT y FROM tab WHERE x = 'key'; </programlisting> the traditional approach to speeding up such queries would be to create an index on <literal>x</literal> only. However, an index defined as <programlisting> CREATE INDEX tab_x_y ON tab(x) INCLUDE (y); </programlisting> could handle these queries as index-only scans, because <literal>y</literal> can be obtained from the index without visiting the heap. </para> <para> Because column <literal>y</literal> is not part of the index's search key, it does not have to be of a data type that the index can handle; it's merely stored in the index and is not interpreted by the index machinery. Also, if the index is a unique index, that is <programlisting> CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y); </programlisting> the uniqueness condition applies to just column <literal>x</literal>, not to the combination of <literal>x</literal> and <literal>y</literal>. (An <literal>INCLUDE</literal> clause can also be written in <literal>UNIQUE</literal> and <literal>PRIMARY KEY</literal> constraints, providing alternative syntax for setting up an index like this.) </para> <para> It's wise to be conservative about adding non-key payload columns to an index, especially wide columns. If an index tuple exceeds the maximum size allowed for the index type, data insertion will fail. In any case, non-key columns duplicate data from the index's table and bloat the size of the index, thus potentially slowing searches. And remember that there is little point in including payload columns in an index unless the table changes slowly enough that an index-only scan is likely to not need to access the heap. If the heap tuple must be visited anyway, it costs nothing more to get the column's value from there. Other restrictions are that expressions are not currently supported as included columns, and that only B-tree, GiST and SP-GiST indexes currently support included columns. </para> <para> Before <productname>PostgreSQL</productname> had the <literal>INCLUDE</literal> feature, people sometimes made covering indexes by writing the payload columns as ordinary index columns, that is writing <programlisting> CREATE INDEX tab_x_y ON tab(x, y); </programlisting> even though they had no intention of ever using <literal>y</literal>

To effectively utilize index-only scans, create covering indexes that include columns needed by frequent queries. PostgreSQL allows adding columns as 'payload' using the INCLUDE clause, so they are not part of the search key but are stored in the index. This lets queries retrieve data without heap access. However, be cautious about adding wide or numerous non-key columns, as it can increase index size and slow searches. INCLUDE is beneficial only when the table changes slowly enough to avoid heap access during scans. Only B-tree, GiST, and SP-GiST indexes support included columns. Prior to the INCLUDE feature, covering indexes were created by listing payload columns as ordinary index columns, but this is generally less efficient.