Index-Only Scan Requirements and Visibility Considerations

supports <firstterm>index-only scans</firstterm>, which can answer queries from an index alone without any heap access. The basic idea is to return values directly out of each index entry instead of consulting the associated heap entry. There are two fundamental restrictions on when this method can be used: <orderedlist> <listitem> <para> The index type must support index-only scans. B-tree indexes always do. GiST and SP-GiST indexes support index-only scans for some operator classes but not others. Other index types have no support. The underlying requirement is that the index must physically store, or else be able to reconstruct, the original data value for each index entry. As a counterexample, GIN indexes cannot support index-only scans because each index entry typically holds only part of the original data value. </para> </listitem> <listitem> <para> The query must reference only columns stored in the index. For example, given an index on columns <literal>x</literal> and <literal>y</literal> of a table that also has a column <literal>z</literal>, these queries could use index-only scans: <programlisting> SELECT x, y FROM tab WHERE x = 'key'; SELECT x FROM tab WHERE x = 'key' AND y < 42; </programlisting> but these queries could not: <programlisting> SELECT x, z FROM tab WHERE x = 'key'; SELECT x FROM tab WHERE x = 'key' AND z < 42; </programlisting> (Expression indexes and partial indexes complicate this rule, as discussed below.) </para> </listitem> </orderedlist> </para> <para> If these two fundamental requirements are met, then all the data values required by the query are available from the index, so an index-only scan is physically possible. But there is an additional requirement for any table scan in <productname>PostgreSQL</productname>: it must verify that each retrieved row be <quote>visible</quote> to the query's MVCC snapshot, as discussed in <xref linkend="mvcc"/>. Visibility information is not stored in index entries, only in heap entries; so at first glance it would seem that every row retrieval would require a heap access anyway. And this is indeed the case, if the table row has been modified recently. However, for seldom-changing data there is a way around this problem. <productname>PostgreSQL</productname> tracks, for each page in a table's heap, whether all rows stored in that page are old enough to be visible to all current and future transactions. This information is stored in a bit in the table's <firstterm>visibility map</firstterm>. An index-only scan, after finding a candidate index entry, checks the visibility map bit for the corresponding heap page. If it's set, the row is known visible and so the data can be returned with no further work. If it's not set, the heap entry must be visited to find out whether it's visible, so no performance advantage is gained over a standard index scan. Even in the successful case, this approach trades visibility map accesses for heap accesses; but since the visibility map is four orders of magnitude smaller than the heap it describes, far less physical I/O is needed to access it. In most situations the visibility map remains cached in memory all the time. </para> <para> In short, while an index-only scan is possible given the two fundamental requirements, it will be a win only if a significant fraction of the table's heap pages have their all-visible map bits set. But tables in which a large fraction of the rows are unchanging are common enough to make this type of scan very useful in practice. </para> <para> <indexterm> <primary><literal>INCLUDE</literal></primary> <secondary>in index definitions</secondary> </indexterm> To make effective use of the index-only scan feature, you

Index-only scans in PostgreSQL require the index type to support them (e.g., B-tree indexes) and the query to reference only columns stored in the index. Additionally, PostgreSQL verifies row visibility using MVCC snapshots. While visibility information is stored in heap entries, PostgreSQL uses a visibility map to track pages where all rows are old enough to be visible to all transactions. An index-only scan checks this map, and if the corresponding bit is set, the data can be returned without accessing the heap. This method provides a performance advantage if a significant portion of the table's heap pages have their all-visible map bits set.