Future Development and Further Reading of GEQO

of more-fit candidates — that is, by using randomly-chosen portions of known low-cost join sequences to create new sequences for consideration. This process is repeated until a preset number of join sequences have been considered; then the best one found at any time during the search is used to generate the finished plan. </para> <para> This process is inherently nondeterministic, because of the randomized choices made during both the initial population selection and subsequent <quote>mutation</quote> of the best candidates. To avoid surprising changes of the selected plan, each run of the GEQO algorithm restarts its random number generator with the current <xref linkend="guc-geqo-seed"/> parameter setting. As long as <varname>geqo_seed</varname> and the other GEQO parameters are kept fixed, the same plan will be generated for a given query (and other planner inputs such as statistics). To experiment with different search paths, try changing <varname>geqo_seed</varname>. </para> </sect2> <sect2 id="geqo-future"> <title>Future Implementation Tasks for <productname>PostgreSQL</productname> <acronym>GEQO</acronym></title> <para> Work is still needed to improve the genetic algorithm parameter settings. In file <filename>src/backend/optimizer/geqo/geqo_main.c</filename>, routines <function>gimme_pool_size</function> and <function>gimme_number_generations</function>, we have to find a compromise for the parameter settings to satisfy two competing demands: <itemizedlist spacing="compact"> <listitem> <para> Optimality of the query plan </para> </listitem> <listitem> <para> Computing time </para> </listitem> </itemizedlist> </para> <para> In the current implementation, the fitness of each candidate join sequence is estimated by running the standard planner's join selection and cost estimation code from scratch. To the extent that different candidates use similar sub-sequences of joins, a great deal of work will be repeated. This could be made significantly faster by retaining cost estimates for sub-joins. The problem is to avoid expending unreasonable amounts of memory on retaining that state. </para> <para> At a more basic level, it is not clear that solving query optimization with a GA algorithm designed for TSP is appropriate. In the TSP case, the cost associated with any substring (partial tour) is independent of the rest of the tour, but this is certainly not true for query optimization. Thus it is questionable whether edge recombination crossover is the most effective mutation procedure. </para> </sect2> </sect1> <sect1 id="geqo-biblio"> <title>Further Reading</title> <para> The following resources contain additional information about genetic algorithms: <itemizedlist> <listitem> <para> <ulink url="http://www.faqs.org/faqs/ai-faq/genetic/part1/"> The Hitch-Hiker's Guide to Evolutionary Computation</ulink>, (FAQ for <ulink url="news://comp.ai.genetic"></ulink>) </para> </listitem> <listitem> <para> <ulink url="https://www.red3d.com/cwr/evolve.html"> Evolutionary Computation and its application to art and design</ulink>, by Craig Reynolds </para> </listitem> <listitem> <para> <xref linkend="elma04"/> </para> </listitem> <listitem> <para> <xref linkend="fong"/> </para> </listitem> </itemizedlist> </para> </sect1> </chapter>

The genetic algorithm used in GEQO requires further improvement, including optimizing parameter settings to balance query plan optimality and computing time, reducing repeated work in estimating candidate join sequences, and potentially reconsidering the use of a GA algorithm designed for TSP, with additional resources available for further reading on genetic algorithms and their applications.