Home Explore Blog CI



postgresql

25th chunk of `doc/src/sgml/indexam.sgml`
8f341ddeac5a18a0105dcf128fce096b9a654eb902f8771d0000000100000c32
 <parameter>indexCorrelation</parameter> should be set to the correlation (ranging between
   -1.0 and 1.0) between the index order and the table order.  This is used
   to adjust the estimate for the cost of fetching rows from the parent
   table.
  </para>

  <para>
   The <parameter>indexPages</parameter> should be set to the number of leaf pages.
   This is used to estimate the number of workers for parallel index scan.
  </para>

  <para>
   When <parameter>loop_count</parameter> is greater than one, the returned numbers
   should be averages expected for any one scan of the index.
  </para>

  <procedure>
   <title>Cost Estimation</title>
   <para>
    A typical cost estimator will proceed as follows:
   </para>

   <step>
    <para>
     Estimate and return the fraction of parent-table rows that will be visited
     based on the given qual conditions.  In the absence of any index-type-specific
     knowledge, use the standard optimizer function <function>clauselist_selectivity()</function>:

<programlisting>
*indexSelectivity = clauselist_selectivity(root, path-&gt;indexquals,
                                           path-&gt;indexinfo-&gt;rel-&gt;relid,
                                           JOIN_INNER, NULL);
</programlisting>
    </para>
   </step>

   <step>
    <para>
     Estimate the number of index rows that will be visited during the
     scan.  For many index types this is the same as <parameter>indexSelectivity</parameter> times
     the number of rows in the index, but it might be more.  (Note that the
     index's size in pages and rows is available from the
     <literal>path-&gt;indexinfo</literal> struct.)
    </para>
   </step>

   <step>
    <para>
     Estimate the number of index pages that will be retrieved during the scan.
     This might be just <parameter>indexSelectivity</parameter> times the index's size in pages.
    </para>
   </step>

   <step>
    <para>
     Compute the index access cost.  A generic estimator might do this:

<programlisting>
/*
 * Our generic assumption is that the index pages will be read
 * sequentially, so they cost seq_page_cost each, not random_page_cost.
 * Also, we charge for evaluation of the indexquals at each index row.
 * All the costs are assumed to be paid incrementally during the scan.
 */
cost_qual_eval(&amp;index_qual_cost, path-&gt;indexquals, root);
*indexStartupCost = index_qual_cost.startup;
*indexTotalCost = seq_page_cost * numIndexPages +
    (cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
</programlisting>

     However, the above does not account for amortization of index reads
     across repeated index scans.
    </para>
   </step>

   <step>
    <para>
     Estimate the index correlation.  For a simple ordered index on a single
     field, this can be retrieved from pg_statistic.  If the correlation
     is not known, the conservative estimate is zero (no correlation).
    </para>
   </step>
  </procedure>

  <para>
   Examples of cost estimator functions can be found in
   <filename>src/backend/utils/adt/selfuncs.c</filename>.
  </para>
 </sect1>
</chapter>

Title: Cost Estimation Procedure: Calculating Selectivity, Visited Rows, Pages, Access Cost, and Correlation
Summary
This section details a typical cost estimation procedure for indexes. It covers estimating the fraction of visited parent-table rows using clauselist_selectivity(), estimating the number of index rows visited, estimating the number of index pages retrieved, and computing the index access cost, including the cost of evaluating index qualifications. The generic assumption is sequential reading. Also, it involves estimating the index correlation, which is normally based on data found in pg_statistic.