find out whether it's
visible, so no performance advantage is gained over a standard index
scan. Even in the successful case, this approach trades visibility map
accesses for heap accesses; but since the visibility map is four orders
of magnitude smaller than the heap it describes, far less physical I/O is
needed to access it. In most situations the visibility map remains
cached in memory all the time.
</para>
<para>
In short, while an index-only scan is possible given the two fundamental
requirements, it will be a win only if a significant fraction of the
table's heap pages have their all-visible map bits set. But tables in
which a large fraction of the rows are unchanging are common enough to
make this type of scan very useful in practice.
</para>
<para>
<indexterm>
<primary><literal>INCLUDE</literal></primary>
<secondary>in index definitions</secondary>
</indexterm>
To make effective use of the index-only scan feature, you might choose to
create a <firstterm>covering index</firstterm>, which is an index
specifically designed to include the columns needed by a particular
type of query that you run frequently. Since queries typically need to
retrieve more columns than just the ones they search
on, <productname>PostgreSQL</productname> allows you to create an index
in which some columns are just <quote>payload</quote> and are not part
of the search key. This is done by adding an <literal>INCLUDE</literal>
clause listing the extra columns. For example, if you commonly run
queries like
<programlisting>
SELECT y FROM tab WHERE x = 'key';
</programlisting>
the traditional approach to speeding up such queries would be to create
an index on <literal>x</literal> only. However, an index defined as
<programlisting>
CREATE INDEX tab_x_y ON tab(x) INCLUDE (y);
</programlisting>
could handle these queries as index-only scans,
because <literal>y</literal> can be obtained from the index without
visiting the heap.
</para>
<para>
Because column <literal>y</literal> is not part of the index's search
key, it does not have to be of a data type that the index can handle;
it's merely stored in the index and is not interpreted by the index
machinery. Also, if the index is a unique index, that is
<programlisting>
CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
</programlisting>
the uniqueness condition applies to just column <literal>x</literal>,
not to the combination of <literal>x</literal> and <literal>y</literal>.
(An <literal>INCLUDE</literal> clause can also be written
in <literal>UNIQUE</literal> and <literal>PRIMARY KEY</literal>
constraints, providing alternative syntax for setting up an index like
this.)
</para>
<para>
It's wise to be conservative about adding non-key payload columns to an
index, especially wide columns. If an index tuple exceeds the
maximum size allowed for the index type, data insertion will fail.
In any case, non-key columns duplicate data from the index's table
and bloat the size of the index, thus potentially slowing searches.
And remember that there is little point in including payload columns in an
index unless the table changes slowly enough that an index-only scan is
likely to not need to access the heap. If the heap tuple must be visited
anyway, it costs nothing more to get the column's value from there.
Other restrictions are that expressions are not currently supported as
included columns, and that only B-tree, GiST and SP-GiST indexes currently
support included columns.
</para>
<para>
Before <productname>PostgreSQL</productname> had
the <literal>INCLUDE</literal> feature, people sometimes made covering
indexes by writing the payload columns as ordinary index columns,
that is writing
<programlisting>
CREATE INDEX tab_x_y ON tab(x, y);
</programlisting>
even though they had no intention of ever using <literal>y</literal>