Home Explore Blog CI



postgresql

4th chunk of `doc/src/sgml/geqo.sgml`
3f672ef95db5bd1a8d2a9b965746007860c94456f8c26f850000000100000eb9
 of more-fit candidates — that is, by using
    randomly-chosen portions of known low-cost join sequences to create
    new sequences for consideration.  This process is repeated until a
    preset number of join sequences have been considered; then the best
    one found at any time during the search is used to generate the finished
    plan.
   </para>

   <para>
    This process is inherently nondeterministic, because of the randomized
    choices made during both the initial population selection and subsequent
    <quote>mutation</quote> of the best candidates.  To avoid surprising changes
    of the selected plan, each run of the GEQO algorithm restarts its
    random number generator with the current <xref linkend="guc-geqo-seed"/>
    parameter setting.  As long as <varname>geqo_seed</varname> and the other
    GEQO parameters are kept fixed, the same plan will be generated for a
    given query (and other planner inputs such as statistics).  To experiment
    with different search paths, try changing <varname>geqo_seed</varname>.
   </para>

  </sect2>

  <sect2 id="geqo-future">
   <title>Future Implementation Tasks for
    <productname>PostgreSQL</productname> <acronym>GEQO</acronym></title>

     <para>
      Work is still needed to improve the genetic algorithm parameter
      settings.
      In file <filename>src/backend/optimizer/geqo/geqo_main.c</filename>,
      routines
      <function>gimme_pool_size</function> and <function>gimme_number_generations</function>,
      we have to find a compromise for the parameter settings
      to satisfy two competing demands:
      <itemizedlist spacing="compact">
       <listitem>
        <para>
         Optimality of the query plan
        </para>
       </listitem>
       <listitem>
        <para>
         Computing time
        </para>
       </listitem>
      </itemizedlist>
     </para>

     <para>
      In the current implementation, the fitness of each candidate join
      sequence is estimated by running the standard planner's join selection
      and cost estimation code from scratch.  To the extent that different
      candidates use similar sub-sequences of joins, a great deal of work
      will be repeated.  This could be made significantly faster by retaining
      cost estimates for sub-joins.  The problem is to avoid expending
      unreasonable amounts of memory on retaining that state.
     </para>

     <para>
      At a more basic level, it is not clear that solving query optimization
      with a GA algorithm designed for TSP is appropriate.  In the TSP case,
      the cost associated with any substring (partial tour) is independent
      of the rest of the tour, but this is certainly not true for query
      optimization.  Thus it is questionable whether edge recombination
      crossover is the most effective mutation procedure.
     </para>

   </sect2>
  </sect1>

 <sect1 id="geqo-biblio">
  <title>Further Reading</title>

  <para>
   The following resources contain additional information about
   genetic algorithms:

   <itemizedlist>
    <listitem>
     <para>
      <ulink url="http://www.faqs.org/faqs/ai-faq/genetic/part1/">
      The Hitch-Hiker's Guide to Evolutionary Computation</ulink>, (FAQ for <ulink
      url="news://comp.ai.genetic"></ulink>)
     </para>
    </listitem>

    <listitem>
     <para>
      <ulink url="https://www.red3d.com/cwr/evolve.html">
      Evolutionary Computation and its application to art and design</ulink>, by
      Craig Reynolds
     </para>
    </listitem>

    <listitem>
     <para>
      <xref linkend="elma04"/>
     </para>
    </listitem>

    <listitem>
     <para>
      <xref linkend="fong"/>
     </para>
    </listitem>
   </itemizedlist>
  </para>

 </sect1>
</chapter>

Title: Future Development and Further Reading of GEQO
Summary
The genetic algorithm used in GEQO requires further improvement, including optimizing parameter settings to balance query plan optimality and computing time, reducing repeated work in estimating candidate join sequences, and potentially reconsidering the use of a GA algorithm designed for TSP, with additional resources available for further reading on genetic algorithms and their applications.