Genetic Query Optimization in PostgreSQL

individual in the search space are represented by <firstterm>chromosomes</firstterm>, in essence a set of character strings. A <firstterm>gene</firstterm> is a subsection of a chromosome which encodes the value of a single parameter being optimized. Typical encodings for a gene could be <firstterm>binary</firstterm> or <firstterm>integer</firstterm>. </para> <para> Through simulation of the evolutionary operations <firstterm>recombination</firstterm>, <firstterm>mutation</firstterm>, and <firstterm>selection</firstterm> new generations of search points are found that show a higher average fitness than their ancestors. <xref linkend="geqo-figure"/> illustrates these steps. </para> <figure id="geqo-figure"> <title>Structure of a Genetic Algorithm</title> <mediaobject> <imageobject> <imagedata fileref="images/genetic-algorithm.svg" format="SVG" width="100%"/> </imageobject> </mediaobject> </figure> <para> According to the <systemitem class="resource">comp.ai.genetic</systemitem> <acronym>FAQ</acronym> it cannot be stressed too strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly non-random (better than random). </para> </sect1> <sect1 id="geqo-pg-intro"> <title>Genetic Query Optimization (<acronym>GEQO</acronym>) in PostgreSQL</title> <para> The <acronym>GEQO</acronym> module approaches the query optimization problem as though it were the well-known traveling salesman problem (<acronym>TSP</acronym>). Possible query plans are encoded as integer strings. Each string represents the join order from one relation of the query to the next. For example, the join tree <literallayout class="monospaced"> /\ /\ 2 /\ 3 4 1 </literallayout> is encoded by the integer string '4-1-3-2', which means, first join relation '4' and '1', then '3', and then '2', where 1, 2, 3, 4 are relation IDs within the <productname>PostgreSQL</productname> optimizer. </para> <para> Specific characteristics of the <acronym>GEQO</acronym> implementation in <productname>PostgreSQL</productname> are: <itemizedlist spacing="compact" mark="bullet"> <listitem> <para> Usage of a <firstterm>steady state</firstterm> <acronym>GA</acronym> (replacement of the least fit individuals in a population, not whole-generational replacement) allows fast convergence towards improved query plans. This is essential for query handling with reasonable time; </para> </listitem> <listitem> <para> Usage of <firstterm>edge recombination crossover</firstterm> which is especially suited to keep edge losses low for the solution of the <acronym>TSP</acronym> by means of a <acronym>GA</acronym>; </para> </listitem> <listitem> <para> Mutation as genetic operator is deprecated so that no repair mechanisms are needed to generate legal <acronym>TSP</acronym> tours. </para> </listitem> </itemizedlist> </para> <para> Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor algorithm. </para> <para> The <acronym>GEQO</acronym> module allows the <productname>PostgreSQL</productname> query optimizer to support large join queries effectively through non-exhaustive search. </para> <sect2 id="geqo-pg-intro-gen-possible-plans"> <title>Generating Possible Plans with <acronym>GEQO</acronym></title> <para> The <acronym>GEQO</acronym> planning process uses the standard planner code to generate plans for scans of individual relations. Then join plans are developed using the genetic approach. As shown above, each candidate join plan is represented by a sequence in which to join

The Genetic Query Optimization (GEQO) module in PostgreSQL uses a genetic algorithm to efficiently optimize large join queries by representing possible query plans as integer strings and using evolutionary operations to find improved plans.