Home Explore Blog CI



postgresql

18th chunk of `doc/src/sgml/indices.sgml`
45596a09291b852c30aa6b5b7efb89de957051d9738498440000000100000fb1
 find out whether it's
   visible, so no performance advantage is gained over a standard index
   scan.  Even in the successful case, this approach trades visibility map
   accesses for heap accesses; but since the visibility map is four orders
   of magnitude smaller than the heap it describes, far less physical I/O is
   needed to access it.  In most situations the visibility map remains
   cached in memory all the time.
  </para>

  <para>
   In short, while an index-only scan is possible given the two fundamental
   requirements, it will be a win only if a significant fraction of the
   table's heap pages have their all-visible map bits set.  But tables in
   which a large fraction of the rows are unchanging are common enough to
   make this type of scan very useful in practice.
  </para>

  <para>
   <indexterm>
    <primary><literal>INCLUDE</literal></primary>
    <secondary>in index definitions</secondary>
   </indexterm>
   To make effective use of the index-only scan feature, you might choose to
   create a <firstterm>covering index</firstterm>, which is an index
   specifically designed to include the columns needed by a particular
   type of query that you run frequently.  Since queries typically need to
   retrieve more columns than just the ones they search
   on, <productname>PostgreSQL</productname> allows you to create an index
   in which some columns are just <quote>payload</quote> and are not part
   of the search key.  This is done by adding an <literal>INCLUDE</literal>
   clause listing the extra columns.  For example, if you commonly run
   queries like
<programlisting>
SELECT y FROM tab WHERE x = 'key';
</programlisting>
   the traditional approach to speeding up such queries would be to create
   an index on <literal>x</literal> only.  However, an index defined as
<programlisting>
CREATE INDEX tab_x_y ON tab(x) INCLUDE (y);
</programlisting>
   could handle these queries as index-only scans,
   because <literal>y</literal> can be obtained from the index without
   visiting the heap.
  </para>

  <para>
   Because column <literal>y</literal> is not part of the index's search
   key, it does not have to be of a data type that the index can handle;
   it's merely stored in the index and is not interpreted by the index
   machinery.  Also, if the index is a unique index, that is
<programlisting>
CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
</programlisting>
   the uniqueness condition applies to just column <literal>x</literal>,
   not to the combination of <literal>x</literal> and <literal>y</literal>.
   (An <literal>INCLUDE</literal> clause can also be written
   in <literal>UNIQUE</literal> and <literal>PRIMARY KEY</literal>
   constraints, providing alternative syntax for setting up an index like
   this.)
  </para>

  <para>
   It's wise to be conservative about adding non-key payload columns to an
   index, especially wide columns.  If an index tuple exceeds the
   maximum size allowed for the index type, data insertion will fail.
   In any case, non-key columns duplicate data from the index's table
   and bloat the size of the index, thus potentially slowing searches.
   And remember that there is little point in including payload columns in an
   index unless the table changes slowly enough that an index-only scan is
   likely to not need to access the heap.  If the heap tuple must be visited
   anyway, it costs nothing more to get the column's value from there.
   Other restrictions are that expressions are not currently supported as
   included columns, and that only B-tree, GiST and SP-GiST indexes currently
   support included columns.
  </para>

  <para>
   Before <productname>PostgreSQL</productname> had
   the <literal>INCLUDE</literal> feature, people sometimes made covering
   indexes by writing the payload columns as ordinary index columns,
   that is writing
<programlisting>
CREATE INDEX tab_x_y ON tab(x, y);
</programlisting>
   even though they had no intention of ever using <literal>y</literal>

Title: Covering Indexes with INCLUDE for Index-Only Scans
Summary
To effectively utilize index-only scans, create covering indexes that include columns needed by frequent queries. PostgreSQL allows adding columns as 'payload' using the INCLUDE clause, so they are not part of the search key but are stored in the index. This lets queries retrieve data without heap access. However, be cautious about adding wide or numerous non-key columns, as it can increase index size and slow searches. INCLUDE is beneficial only when the table changes slowly enough to avoid heap access during scans. Only B-tree, GiST, and SP-GiST indexes support included columns. Prior to the INCLUDE feature, covering indexes were created by listing payload columns as ordinary index columns, but this is generally less efficient.