Home Explore Blog CI



postgresql

4th chunk of `doc/src/sgml/ref/create_statistics.sgml`
084968fb7f0dda577984d25f0bb10e49ca4775c6a01cc40b0000000100000847
 correlated columns
   (containing identical data), and an MCV list on those columns:

<programlisting>
CREATE TABLE t2 (
    a   int,
    b   int
);

INSERT INTO t2 SELECT mod(i,100), mod(i,100)
                 FROM generate_series(1,1000000) s(i);

CREATE STATISTICS s2 (mcv) ON a, b FROM t2;

ANALYZE t2;

-- valid combination (found in MCV)
EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);

-- invalid combination (not found in MCV)
EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 2);
</programlisting>

   The MCV list gives the planner more detailed information about the
   specific values that commonly appear in the table, as well as an upper
   bound on the selectivities of combinations of values that do not appear
   in the table, allowing it to generate better estimates in both cases.
  </para>

  <para>
   Create table <structname>t3</structname> with a single timestamp column,
   and run queries using expressions on that column.  Without extended
   statistics, the planner has no information about the data distribution for
   the expressions, and uses default estimates.  The planner also does not
   realize that the value of the date truncated to the month is fully
   determined by the value of the date truncated to the day. Then expression
   and ndistinct statistics are built on those two expressions:

<programlisting>
CREATE TABLE t3 (
    a   timestamp
);

INSERT INTO t3 SELECT i FROM generate_series('2020-01-01'::timestamp,
                                             '2020-12-31'::timestamp,
                                             '1 minute'::interval) s(i);

ANALYZE t3;

-- the number of matching rows will be drastically underestimated:
EXPLAIN ANALYZE SELECT * FROM t3
  WHERE date_trunc('month', a) = '2020-01-01'::timestamp;

EXPLAIN ANALYZE SELECT * FROM t3
  WHERE date_trunc('day', a) BETWEEN '2020-01-01'::timestamp
                                 AND '2020-06-30'::timestamp;

EXPLAIN ANALYZE SELECT date_trunc('month', a), date_trunc('day', a)
   FROM t3 GROUP BY 1, 2;

-- build ndistinct statistics on the pair of expressions (per-expression

Title: Examples of MCV lists on Correlated Columns and Expression Statistics on Timestamps
Summary
This section provides examples for creating statistics. The first example demonstrates creating an MCV (Most Common Values) list on two perfectly correlated columns in table t2, showing how it aids the planner in estimating query costs for valid and invalid combinations of values. The second example shows how to build statistics on timestamp expressions, demonstrating how the planner can be informed about data distribution and relationships between expressions, such as date truncation, leading to improved query plans.