Home Explore Blog CI



postgresql

23th chunk of `doc/src/sgml/perform.sgml`
37f0cd3c7631f881c13342766732e4dfc1f83a0065f4b6500000000100000fa0
 functionally dependent on
     column <structfield>a</structfield> if knowledge of the value of
     <structfield>a</structfield> is sufficient to determine the value
     of <structfield>b</structfield>, that is there are no two rows having the same value
     of <structfield>a</structfield> but different values of <structfield>b</structfield>.
     In a fully normalized database, functional dependencies should exist
     only on primary keys and superkeys. However, in practice many data sets
     are not fully normalized for various reasons; intentional
     denormalization for performance reasons is a common example.
     Even in a fully normalized database, there may be partial correlation
     between some columns, which can be expressed as partial functional
     dependency.
    </para>

    <para>
     The existence of functional dependencies directly affects the accuracy
     of estimates in certain queries.  If a query contains conditions on
     both the independent and the dependent column(s), the
     conditions on the dependent columns do not further reduce the result
     size; but without knowledge of the functional dependency, the query
     planner will assume that the conditions are independent, resulting
     in underestimating the result size.
    </para>

    <para>
     To inform the planner about functional dependencies, <command>ANALYZE</command>
     can collect measurements of cross-column dependency. Assessing the
     degree of dependency between all sets of columns would be prohibitively
     expensive, so data collection is limited to those groups of columns
     appearing together in a statistics object defined with
     the <literal>dependencies</literal> option.  It is advisable to create
     <literal>dependencies</literal> statistics only for column groups that are
     strongly correlated, to avoid unnecessary overhead in both
     <command>ANALYZE</command> and later query planning.
    </para>

    <para>
     Here is an example of collecting functional-dependency statistics:
<programlisting>
CREATE STATISTICS stts (dependencies) ON city, zip FROM zipcodes;

ANALYZE zipcodes;

SELECT stxname, stxkeys, stxddependencies
  FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid)
  WHERE stxname = 'stts';
 stxname | stxkeys |             stxddependencies
---------+---------+------------------------------------------
 stts    | 1 5     | {"1 => 5": 1.000000, "5 => 1": 0.423130}
(1 row)
</programlisting>
     Here it can be seen that column 1 (zip code) fully determines column
     5 (city) so the coefficient is 1.0, while city only determines zip code
     about 42% of the time, meaning that there are many cities (58%) that are
     represented by more than a single ZIP code.
    </para>

    <para>
     When computing the selectivity for a query involving functionally
     dependent columns, the planner adjusts the per-condition selectivity
     estimates using the dependency coefficients so as not to produce
     an underestimate.
    </para>

    <sect4 id="planner-stats-extended-functional-deps-limits">
     <title>Limitations of Functional Dependencies</title>

     <para>
      Functional dependencies are currently only applied when considering
      simple equality conditions that compare columns to constant values,
      and <literal>IN</literal> clauses with constant values.
      They are not used to improve estimates for equality conditions
      comparing two columns or comparing a column to an expression, nor for
      range clauses, <literal>LIKE</literal> or any other type of condition.
     </para>

     <para>
      When estimating with functional dependencies, the planner assumes that
      conditions on the involved columns are compatible and hence redundant.
      If they are incompatible, the correct estimate would be zero rows, but
      that possibility is not considered.  For example, given a query like
<programlisting>
SELECT * FROM zipcodes WHERE city = 'San

Title: Functional Dependencies in PostgreSQL
Summary
PostgreSQL supports functional dependency statistics to improve query estimates by accounting for dependencies between columns, which can be collected using the ANALYZE command and stored in statistics objects defined with the dependencies option, allowing the planner to adjust selectivity estimates and avoid underestimation.