Examples of MCV lists on Correlated Columns and Expression Statistics on Timestamps

correlated columns (containing identical data), and an MCV list on those columns: <programlisting> CREATE TABLE t2 ( a int, b int ); INSERT INTO t2 SELECT mod(i,100), mod(i,100) FROM generate_series(1,1000000) s(i); CREATE STATISTICS s2 (mcv) ON a, b FROM t2; ANALYZE t2; -- valid combination (found in MCV) EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1); -- invalid combination (not found in MCV) EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 2); </programlisting> The MCV list gives the planner more detailed information about the specific values that commonly appear in the table, as well as an upper bound on the selectivities of combinations of values that do not appear in the table, allowing it to generate better estimates in both cases. </para> <para> Create table <structname>t3</structname> with a single timestamp column, and run queries using expressions on that column. Without extended statistics, the planner has no information about the data distribution for the expressions, and uses default estimates. The planner also does not realize that the value of the date truncated to the month is fully determined by the value of the date truncated to the day. Then expression and ndistinct statistics are built on those two expressions: <programlisting> CREATE TABLE t3 ( a timestamp ); INSERT INTO t3 SELECT i FROM generate_series('2020-01-01'::timestamp, '2020-12-31'::timestamp, '1 minute'::interval) s(i); ANALYZE t3; -- the number of matching rows will be drastically underestimated: EXPLAIN ANALYZE SELECT * FROM t3 WHERE date_trunc('month', a) = '2020-01-01'::timestamp; EXPLAIN ANALYZE SELECT * FROM t3 WHERE date_trunc('day', a) BETWEEN '2020-01-01'::timestamp AND '2020-06-30'::timestamp; EXPLAIN ANALYZE SELECT date_trunc('month', a), date_trunc('day', a) FROM t3 GROUP BY 1, 2; -- build ndistinct statistics on the pair of expressions (per-expression

This section provides examples for creating statistics. The first example demonstrates creating an MCV (Most Common Values) list on two perfectly correlated columns in table t2, showing how it aids the planner in estimating query costs for valid and invalid combinations of values. The second example shows how to build statistics on timestamp expressions, demonstrating how the planner can be informed about data distribution and relationships between expressions, such as date truncation, leading to improved query plans.