More Tips for Index Experimentation and Troubleshooting

In the absence of real statistics, some default values are assumed, which are almost certain to be inaccurate. Examining an application's index usage without having run <command>ANALYZE</command> is therefore a lost cause. See <xref linkend="vacuum-for-statistics"/> and <xref linkend="autovacuum"/> for more information.
   </para>
  </listitem>

  <listitem>
   <para>
    Use real data for experimentation. Using test data for setting up indexes will tell you what indexes you need for the test data, but that is all.
   </para>

   <para>
    Very small test data sets are especially misleading. While a query selecting 1000 out of 100000 rows could be a candidate for an index, one selecting 1 out of 100 rows hardly will be: the 100 rows probably fit within a single disk page, and no plan can beat sequentially fetching one disk page.
   </para>

   <para>
    Also be careful when making up test data, which is often unavoidable when the application is not yet in production. Values that are very similar, completely random, or inserted in sorted order will skew the statistics away from the distribution that real data would have.
   </para>
  </listitem>

  <listitem>
   <para>
    When indexes are not used, it can be useful for testing to force their use. There are run-time parameters that can turn off various plan types (see <xref linkend="runtime-config-query-enable"/>). For instance, turning off sequential scans (<varname>enable_seqscan</varname>) and nested-loop joins (<varname>enable_nestloop</varname>), which are the most basic plans, will push the planner toward other plans. If the system still chooses a sequential scan or nested-loop join, then there is probably a more fundamental reason why the index is not being used; for example, the query condition does not match the index. (What kind of query can use what kind of index is explained in the previous sections.)
   </para>
  </listitem>

  <listitem>
   <para>
    If forcing index usage does use the index, then there are two possibilities: either the system is right and using the index is indeed not appropriate, or the cost estimates of the query plans do not reflect reality. You should therefore time your query with and without indexes. The <command>EXPLAIN ANALYZE</command> command can be useful here.
   </para>
  </listitem>

  <listitem>
   <para>
    If it turns out that the cost estimates are wrong, there are, again, two possibilities. The total cost is computed from the per-row costs of each plan node times the selectivity estimate of the plan node, so either of those can be at fault. The costs estimated for the plan nodes can be adjusted via run-time parameters (described in <xref linkend="runtime-config-query-constants"/>). An inaccurate selectivity estimate is due to insufficient statistics. It might be possible to improve this by tuning the statistics-gathering parameters, such as the per-column statistics target (see <xref linkend="sql-altertable"/>).
   </para>

   <para>
    If you do not succeed in adjusting the costs to be more appropriate, you might have to resort to forcing index usage explicitly. You might also want to contact the <productname>PostgreSQL</productname> developers to examine the issue.
   </para>
  </listitem>
 </itemizedlist>
</sect1>
</chapter>
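The cost reasoning in these tips (a one-page table that no plan can beat, and per-node costs that can be shifted with run-time parameters) can be made concrete with a little arithmetic. The following Python sketch uses the default values of seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_index_tuple_cost and cpu_operator_cost. The sequential-scan formula follows the one PostgreSQL documents for a filtered scan (pages times seq_page_cost, plus rows times cpu_tuple_cost and cpu_operator_cost). The index-scan formula is a deliberately pessimistic stand-in (one random page per matching row, no caching, no correlation), not the planner's real index costing, and the helper names and the assumption of about ten rows per page for the larger table are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    """Planner cost constants; defaults match PostgreSQL's default settings."""
    seq_page_cost: float = 1.0
    random_page_cost: float = 4.0
    cpu_tuple_cost: float = 0.01
    cpu_index_tuple_cost: float = 0.005
    cpu_operator_cost: float = 0.0025


DEFAULTS = CostParams()


def seq_scan_cost(pages: int, rows: int, p: CostParams = DEFAULTS) -> float:
    # Read every page in order, then process and filter every row.
    return pages * p.seq_page_cost + rows * (p.cpu_tuple_cost + p.cpu_operator_cost)


def naive_index_scan_cost(rows: int, selectivity: float, p: CostParams = DEFAULTS) -> float:
    # Hypothetical pessimistic model, NOT PostgreSQL's real index costing:
    # one random page for the index itself plus one random heap page per
    # matching row (no caching, no benefit from physical ordering).
    matched = rows * selectivity
    return (1 + matched) * p.random_page_cost + matched * (p.cpu_index_tuple_cost + p.cpu_tuple_cost)


def cheaper(pages: int, rows: int, selectivity: float, p: CostParams = DEFAULTS) -> str:
    """Return which of the two modelled plans has the lower estimated cost."""
    seq = seq_scan_cost(pages, rows, p)
    idx = naive_index_scan_cost(rows, selectivity, p)
    return "index" if idx < seq else "seq"


if __name__ == "__main__":
    # 1 of 100 rows; all 100 rows assumed to fit on a single page.
    print(cheaper(pages=1, rows=100, selectivity=0.01))            # seq
    # 1000 of 100000 rows, assuming roughly 10 rows per page.
    print(cheaper(pages=10_000, rows=100_000, selectivity=0.01))   # index
    # At 5% selectivity the default constants favour the sequential scan ...
    print(cheaper(pages=10_000, rows=100_000, selectivity=0.05))   # seq
    # ... until random_page_cost is lowered (e.g. SET random_page_cost = 1.1).
    print(cheaper(pages=10_000, rows=100_000, selectivity=0.05,
                  p=CostParams(random_page_cost=1.1)))             # index
```

Under these assumptions the one-page table always favours the sequential scan, while the 5% case flips once random_page_cost is lowered. This is the kind of shift the run-time cost parameters allow, and also why a changed setting should be checked against real timings.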

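Selectivity estimates come from the statistics that ANALYZE stores in pg_stats. The sketch below is modelled on the equality-selectivity example in the PostgreSQL documentation on how the planner uses statistics. A value found in the most-common-values (MCV) list gets its recorded frequency. Any other value shares what is left evenly among the remaining distinct values, capped at the frequency of the rarest MCV entry. The function names and all statistics in the demo are made up, and the real planner handles more cases (negative n_distinct, missing statistics, other operators). Raising a column's statistics target with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS lets ANALYZE keep a longer MCV list. That is one way a badly underestimated value might start getting an accurate estimate.

```python
def eq_selectivity(value, mcv: dict, n_distinct: float, null_frac: float = 0.0) -> float:
    """Estimated fraction of rows matching `column = value` (simplified sketch).

    mcv maps most-common values to their sampled frequencies (as in pg_stats
    most_common_vals / most_common_freqs); n_distinct is assumed to be a
    positive count of distinct values.
    """
    if value in mcv:
        # Listed in the MCV list: use the recorded frequency directly.
        return mcv[value]
    # Otherwise spread the non-null, non-MCV remainder evenly over the
    # remaining distinct values.
    selec = max(0.0, 1.0 - sum(mcv.values()) - null_frac)
    other_distinct = n_distinct - len(mcv)
    if other_distinct > 1:
        selec /= other_distinct
    # An unlisted value is assumed to be no more common than the rarest MCV.
    if mcv:
        selec = min(selec, min(mcv.values()))
    return selec


def estimated_rows(reltuples: float, value, mcv: dict, n_distinct: float,
                   null_frac: float = 0.0) -> float:
    # Row estimate = table size times estimated selectivity.
    return reltuples * eq_selectivity(value, mcv, n_distinct, null_frac)


if __name__ == "__main__":
    # Made-up statistics for a 100000-row column with 1000 distinct values,
    # in which "c" is assumed to make up 10% of the rows.
    short_mcv = {"a": 0.30, "b": 0.20}              # low statistics target
    longer_mcv = {"a": 0.30, "b": 0.20, "c": 0.10}  # a higher target may also capture "c"
    print(round(estimated_rows(100_000, "c", short_mcv, 1000)))   # 50
    print(round(estimated_rows(100_000, "c", longer_mcv, 1000)))  # 10000
```

With these invented numbers the estimate for "c" moves from about 50 rows to 10000. A shift of that size can change whether an index scan or a sequential scan looks cheaper.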
When test data is unavoidable, avoid values that are very similar, completely random, or inserted in sorted order, as these skew the statistics. If indexes aren't being used, test by turning off sequential scans and nested-loop joins (enable_seqscan, enable_nestloop) and examining the resulting plan. Compare query times with and without indexes using EXPLAIN ANALYZE. If cost estimates are wrong, adjust plan node costs via run-time parameters or improve selectivity estimates by tuning statistics-gathering parameters.