Cost Estimation Procedure: Calculating Selectivity, Visited Rows, Pages, Access Cost, and Correlation

   <parameter>indexCorrelation</parameter> should be set to the
   correlation (ranging between -1.0 and 1.0) between the index order and
   the table order.  This is used to adjust the estimate for the cost of
   fetching rows from the parent table.
  </para>

  <para>
   The <parameter>indexPages</parameter> should be set to the number of leaf pages.
   This is used to estimate the number of workers for parallel index scan.
  </para>

  <para>
   When <parameter>loop_count</parameter> is greater than one, the returned numbers
   should be averages expected for any one scan of the index.
  </para>

  <procedure>
   <title>Cost Estimation</title>
   <para>
    A typical cost estimator will proceed as follows:
   </para>

   <step>
    <para>
     Estimate and return the fraction of parent-table rows that will be visited
     based on the given qual conditions.  In the absence of any index-type-specific
     knowledge, use the standard optimizer function
     <function>clauselist_selectivity()</function>:

<programlisting>
*indexSelectivity = clauselist_selectivity(root, path->indexquals,
                                           path->indexinfo->rel->relid,
                                           JOIN_INNER, NULL);
</programlisting>
    </para>
   </step>

   <step>
    <para>
     Estimate the number of index rows that will be visited during the
     scan.  For many index types this is the same as
     <parameter>indexSelectivity</parameter> times the number of rows in the
     index, but it might be more.  (Note that the index's size in pages and
     rows is available from the <literal>path->indexinfo</literal> struct.)
    </para>
   </step>

   <step>
    <para>
     Estimate the number of index pages that will be retrieved during the scan.
     This might be just <parameter>indexSelectivity</parameter> times the
     index's size in pages.
    </para>
   </step>

   <step>
    <para>
     Compute the index access cost.  A generic estimator might do this:

<programlisting>
/*
 * Our generic assumption is that the index pages will be read
 * sequentially, so they cost seq_page_cost each, not random_page_cost.
 * Also, we charge for evaluation of the indexquals at each index row.
 * All the costs are assumed to be paid incrementally during the scan.
 */
cost_qual_eval(&index_qual_cost, path->indexquals, root);
*indexStartupCost = index_qual_cost.startup;
*indexTotalCost = seq_page_cost * numIndexPages +
    (cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
</programlisting>

     However, the above does not account for amortization of index reads
     across repeated index scans.
    </para>
   </step>

   <step>
    <para>
     Estimate the index correlation.  For a simple ordered index on a single
     field, this can be retrieved from pg_statistic.  If the correlation
     is not known, the conservative estimate is zero (no correlation).
    </para>
   </step>
  </procedure>

  <para>
   Examples of cost estimator functions can be found in
   <filename>src/backend/utils/adt/selfuncs.c</filename>.
  </para>

 </sect1>

</chapter>

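This section describes a typical cost estimation procedure for an index access method. The estimator first estimates the fraction of parent-table rows that will be visited, using clauselist_selectivity() when no index-type-specific knowledge is available. It then estimates the number of index rows visited and the number of index pages retrieved, and computes the index access cost, which includes the cost of evaluating the index quals at each index row. The generic assumption is that index pages are read sequentially, so each is charged seq_page_cost rather than random_page_cost. Finally, it estimates the index correlation, which for a simple ordered index on a single field can be retrieved from pg_statistic; if the correlation is not known, the conservative estimate is zero.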
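
As a rough illustration of the arithmetic in the procedure, the following standalone C sketch mirrors the generic estimator outside of PostgreSQL. The struct and function names (GenericIndexStats, GenericCostParams, GenericIndexCost, generic_index_cost) are hypothetical and are not part of the PostgreSQL API. The selectivity and qual costs are passed in as plain numbers in place of the results of clauselist_selectivity() and cost_qual_eval(), and the row and page estimates simply scale the index size by the selectivity, which a real index type may need to refine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the size fields available via path->indexinfo. */
typedef struct GenericIndexStats
{
    double pages;   /* index size in pages */
    double tuples;  /* index size in rows */
} GenericIndexStats;

/* Hypothetical bundle of planner cost parameters and qual evaluation costs. */
typedef struct GenericCostParams
{
    double seq_page_cost;
    double cpu_index_tuple_cost;
    double qual_startup;    /* stands in for index_qual_cost.startup */
    double qual_per_tuple;  /* stands in for index_qual_cost.per_tuple */
} GenericCostParams;

/* Hypothetical result struct collecting the estimator's outputs. */
typedef struct GenericIndexCost
{
    double selectivity;
    double num_index_tuples;
    double num_index_pages;
    double startup_cost;
    double total_cost;
    double correlation;
} GenericIndexCost;

GenericIndexCost
generic_index_cost(const GenericIndexStats *stats,
                   const GenericCostParams *params,
                   double selectivity,
                   int have_correlation,
                   double correlation)
{
    GenericIndexCost result;

    assert(stats != NULL && params != NULL);
    assert(stats->pages >= 0.0 && stats->tuples >= 0.0);

    /* Step 1: fraction of parent-table rows visited, clamped to [0, 1]. */
    if (selectivity < 0.0)
        selectivity = 0.0;
    if (selectivity > 1.0)
        selectivity = 1.0;
    result.selectivity = selectivity;

    /* Step 2: index rows visited (assumed equal to selectivity * rows). */
    result.num_index_tuples = selectivity * stats->tuples;

    /* Step 3: index pages retrieved (assumed equal to selectivity * pages). */
    result.num_index_pages = selectivity * stats->pages;

    /* Step 4: pages read sequentially, plus per-row CPU and qual costs. */
    result.startup_cost = params->qual_startup;
    result.total_cost =
        params->seq_page_cost * result.num_index_pages +
        (params->cpu_index_tuple_cost + params->qual_per_tuple) *
        result.num_index_tuples;

    /* Step 5: unknown correlation falls back to the conservative zero. */
    result.correlation = have_correlation ? correlation : 0.0;

    return result;
}
```

For example, with a 100-page, 10000-row index, a selectivity of 0.1, a seq_page_cost of 1.0, a cpu_index_tuple_cost of 0.005, and a per-tuple qual cost of 0.0025, the sketch visits about 1000 index rows and 10 pages, for a total cost of about 17.5. Like the generic estimator in the procedure, it does not account for amortization of index reads across repeated scans.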