Home Explore Blog CI



postgresql

8th chunk of `doc/src/sgml/pgstatstatements.sgml`
6c82d63aa89bd2cde663e8e882e67a19582db463bc843b170000000100000fa3
 appearing in the query.
  </para>

  <note>
   <para>
    The following details about constant replacement and
    <structfield>queryid</structfield> only apply when <xref
    linkend="guc-compute-query-id"/> is enabled.  If you use an external
    module instead to compute <structfield>queryid</structfield>, you
    should refer to its documentation for details.
   </para>
  </note>

  <para>
   When a constant's value has been ignored for purposes of matching the query
   to other queries, the constant is replaced by a parameter symbol, such
   as <literal>$1</literal>, in the <structname>pg_stat_statements</structname>
   display.
   The rest of the query text is that of the first query that had the
   particular <structfield>queryid</structfield> hash value associated with the
   <structname>pg_stat_statements</structname> entry.
  </para>

  <para>
   Queries on which normalization can be applied may be observed with constant
   values in <structname>pg_stat_statements</structname>, especially when there
   is a high rate of entry deallocations. To reduce the likelihood of this
   happening, consider increasing <varname>pg_stat_statements.max</varname>.
   The <structname>pg_stat_statements_info</structname> view, discussed below
   in <xref linkend="pgstatstatements-pg-stat-statements-info"/>,
   provides statistics about entry deallocations.
  </para>

  <para>
   In some cases, queries with visibly different texts might get merged into a
   single <structname>pg_stat_statements</structname> entry; as explained above,
   this is expected to happen for semantically equivalent queries.
   In addition, if the only difference between queries is the number of elements
   in a list of constants, the list will get squashed down to a single element but shown
   with a commented-out list indicator:

<screen>
=# SELECT pg_stat_statements_reset();
=# SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5, 6, 7);
=# SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5, 6, 7, 8);
=# SELECT query, calls FROM pg_stat_statements
   WHERE query LIKE 'SELECT%';
-[ RECORD 1 ]------------------------------
query | SELECT * FROM test WHERE a IN ($1 /*, ... */)
calls | 2
</screen>

   In addition to these cases, there is a small chance of hash collisions
   causing unrelated queries to be merged into one entry.
   (This cannot happen for queries belonging to different users or databases,
   however.)
  </para>

  <para>
   Since the <structfield>queryid</structfield> hash value is computed on the
   post-parse-analysis representation of the queries, the opposite is
   also possible: queries with identical texts might appear as
   separate entries, if they have different meanings as a result of
   factors such as different <varname>search_path</varname> settings.
  </para>

  <para>
   Consumers of <structname>pg_stat_statements</structname> may wish to use
   <structfield>queryid</structfield> (perhaps in combination with
   <structfield>dbid</structfield> and <structfield>userid</structfield>) as a more stable
   and reliable identifier for each entry than its query text.
   However, it is important to understand that there are only limited
   guarantees around the stability of the <structfield>queryid</structfield> hash
   value.  Since the identifier is derived from the
   post-parse-analysis tree, its value is a function of, among other
   things, the internal object identifiers appearing in this representation.
   This has some counterintuitive implications.  For example,
   <filename>pg_stat_statements</filename> will consider two apparently-identical
   queries to be distinct, if they reference for example a function that was
   dropped and recreated between the executions of the two queries.
   Conversely, if a table is dropped and recreated between the
   executions of queries, two apparently-identical queries may be
   considered the same. However, if the alias for a table is different
   for otherwise-similar queries, these queries will be considered

Title: Query Normalization and Hashing in pg_stat_statements
Summary
The pg_stat_statements view normalizes and hashes queries to group semantically equivalent queries together, replacing constants with parameter symbols and considering factors like search path settings, but this process is not foolproof and can lead to unexpected results due to hash collisions, object identifier changes, and other factors, making queryid a more stable identifier than query text.