Home Explore Blog CI



postgresql

17th chunk of `doc/src/sgml/indices.sgml`
9114df1849eb4bc58199afa3f50e9b935a40f49d8921c3550000000100000fa2

   supports <firstterm>index-only scans</firstterm>, which can answer
   queries from an index alone without any heap access.  The basic idea is
   to return values directly out of each index entry instead of consulting
   the associated heap entry.  There are two fundamental restrictions on
   when this method can be used:

   <orderedlist>
    <listitem>
     <para>
      The index type must support index-only scans.  B-tree indexes always
      do.  GiST and SP-GiST indexes support index-only scans for some
      operator classes but not others.  Other index types have no support.
      The underlying requirement is that the index must physically store, or
      else be able to reconstruct, the original data value for each index
      entry.  As a counterexample, GIN indexes cannot support index-only
      scans because each index entry typically holds only part of the
      original data value.
     </para>
    </listitem>

    <listitem>
     <para>
      The query must reference only columns stored in the index.  For
      example, given an index on columns <literal>x</literal>
      and <literal>y</literal> of a table that also has a
      column <literal>z</literal>, these queries could use index-only scans:
<programlisting>
SELECT x, y FROM tab WHERE x = 'key';
SELECT x FROM tab WHERE x = 'key' AND y &lt; 42;
</programlisting>
      but these queries could not:
<programlisting>
SELECT x, z FROM tab WHERE x = 'key';
SELECT x FROM tab WHERE x = 'key' AND z &lt; 42;
</programlisting>
      (Expression indexes and partial indexes complicate this rule,
      as discussed below.)
     </para>
    </listitem>
   </orderedlist>
  </para>

  <para>
   If these two fundamental requirements are met, then all the data values
   required by the query are available from the index, so an index-only scan
   is physically possible.  But there is an additional requirement for any
   table scan in <productname>PostgreSQL</productname>: it must verify that
   each retrieved row be <quote>visible</quote> to the query's MVCC
   snapshot, as discussed in <xref linkend="mvcc"/>.  Visibility information
   is not stored in index entries, only in heap entries; so at first glance
   it would seem that every row retrieval would require a heap access
   anyway.  And this is indeed the case, if the table row has been modified
   recently.  However, for seldom-changing data there is a way around this
   problem.  <productname>PostgreSQL</productname> tracks, for each page in
   a table's heap, whether all rows stored in that page are old enough to be
   visible to all current and future transactions.  This information is
   stored in a bit in the table's <firstterm>visibility map</firstterm>.  An
   index-only scan, after finding a candidate index entry, checks the
   visibility map bit for the corresponding heap page.  If it's set, the row
   is known visible and so the data can be returned with no further work.
   If it's not set, the heap entry must be visited to find out whether it's
   visible, so no performance advantage is gained over a standard index
   scan.  Even in the successful case, this approach trades visibility map
   accesses for heap accesses; but since the visibility map is four orders
   of magnitude smaller than the heap it describes, far less physical I/O is
   needed to access it.  In most situations the visibility map remains
   cached in memory all the time.
  </para>

  <para>
   In short, while an index-only scan is possible given the two fundamental
   requirements, it will be a win only if a significant fraction of the
   table's heap pages have their all-visible map bits set.  But tables in
   which a large fraction of the rows are unchanging are common enough to
   make this type of scan very useful in practice.
  </para>

  <para>
   <indexterm>
    <primary><literal>INCLUDE</literal></primary>
    <secondary>in index definitions</secondary>
   </indexterm>
   To make effective use of the index-only scan feature, you

Title: Index-Only Scan Requirements and Visibility Considerations
Summary
Index-only scans in PostgreSQL require the index type to support them (e.g., B-tree indexes) and the query to reference only columns stored in the index. Additionally, PostgreSQL verifies row visibility using MVCC snapshots. While visibility information is stored in heap entries, PostgreSQL uses a visibility map to track pages where all rows are old enough to be visible to all transactions. An index-only scan checks this map, and if the corresponding bit is set, the data can be returned without accessing the heap. This method provides a performance advantage if a significant portion of the table's heap pages have their all-visible map bits set.