nature of the WAL records being replayed, a query that was
canceled may well succeed if it is executed again.
</para>
<para>
Keep in mind that the delay parameters are compared to the elapsed time
since the WAL data was received by the standby server. Thus, the grace
period allowed to any one query on the standby is never more than the
delay parameter, and could be considerably less if the standby has already
fallen behind as a result of waiting for previous queries to complete, or
as a result of being unable to keep up with a heavy update load.
</para>
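<para>
For example, a rough sense of how far WAL replay currently lags on the
standby, and therefore of how much of the grace period is already used up,
can be obtained with a query such as the following (the result is only an
approximation based on the last replayed transaction):
<programlisting>
SELECT now() - pg_last_xact_replay_timestamp() AS approximate_replay_lag;
</programlisting>
</para>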
<para>
The most common reason for conflict between standby queries and WAL replay
is <quote>early cleanup</quote>. Normally, <productname>PostgreSQL</productname> allows
cleanup of old row versions when no transactions still need to see them
to ensure correct visibility of data according to <acronym>MVCC</acronym> rules.
However, this rule can only be applied for transactions executing on the
primary. So it is possible that cleanup on the primary will remove row
versions that are still visible to a transaction on the standby.
</para>
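<para>
As an illustration, using a hypothetical table <structname>big_table</structname>,
a sequence like the following can produce such a conflict:
<programlisting>
-- On the standby: a long-running query holds a snapshot that still
-- needs to see older row versions.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM big_table;   -- still running ...

-- Meanwhile, on the primary: row versions are removed even though the
-- standby query above may still need to see them.
DELETE FROM big_table WHERE created_at &lt; now() - interval '1 year';
VACUUM big_table;
</programlisting>
When the resulting cleanup records are replayed on the standby, they conflict
with the still-running query there.
</para>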
<para>
Row version cleanup isn't the only potential cause of conflicts with
standby queries. All index-only scans (including those that run on
standbys) must use an <acronym>MVCC</acronym> snapshot that
<quote>agrees</quote> with the visibility map. Conflicts are therefore
required whenever <command>VACUUM</command> <link
linkend="vacuum-for-visibility-map">sets a page as all-visible in the
visibility map</link> and that page contains one or more rows
<emphasis>not</emphasis> visible to all standby queries. So even running
<command>VACUUM</command> against a table with no updated or deleted rows
requiring cleanup might lead to conflicts.
</para>
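<para>
To see which pages of a table are currently marked all-visible, the
<literal>pg_visibility</literal> extension can be used, as sketched below
(<literal>my_table</literal> is only an example name):
<programlisting>
CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- One row per page of the table, showing whether each page is marked
-- all-visible (and all-frozen) in the visibility map.
SELECT blkno, all_visible, all_frozen
FROM pg_visibility_map('my_table');
</programlisting>
</para>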
<para>
Users should be aware that tables that are regularly and heavily updated
on the primary server will quickly cause cancellation of longer-running
queries on the standby. In such cases, setting a finite value for
<varname>max_standby_archive_delay</varname> or
<varname>max_standby_streaming_delay</varname> can be considered similar to
setting <varname>statement_timeout</varname>.
</para>
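<para>
For example, a finite delay might be configured in the standby's
<filename>postgresql.conf</filename> as follows (the values shown are only
illustrative):
<programlisting>
# Allow a conflicting standby query to hold up WAL replay for at most
# 30 seconds before it is canceled -- roughly comparable to imposing a
# statement_timeout on conflicting queries.
max_standby_archive_delay = 30s
max_standby_streaming_delay = 30s
</programlisting>
</para>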
<para>
Remedial possibilities exist if the number of standby-query cancellations
is found to be unacceptable. The first option is to set the parameter
<varname>hot_standby_feedback</varname>, which prevents <command>VACUUM</command> from
removing recently-dead rows, so that cleanup conflicts do not occur.
If you do this, note that it will delay cleanup of dead rows on the primary,
which may result in undesirable table bloat. However, the cleanup
situation will be no worse than if the standby queries were running
directly on the primary server, and you are still getting the benefit of
off-loading execution onto the standby.
If standby servers connect and disconnect frequently, you
might want to make adjustments to handle periods when feedback from
<varname>hot_standby_feedback</varname> is not being provided.
For example, consider increasing <varname>max_standby_archive_delay</varname>
so that queries are not rapidly canceled by conflicts in WAL archive
files during disconnected periods. You should also consider increasing
<varname>max_standby_streaming_delay</varname> to avoid rapid cancellations
by newly-arrived streaming WAL entries after reconnection.
</para>
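<para>
A possible configuration along these lines, again with purely illustrative
values, might look like this in the standby's
<filename>postgresql.conf</filename>:
<programlisting>
hot_standby_feedback = on            # suppress cleanup conflicts while connected
max_standby_archive_delay = 10min    # tolerate conflicts from archived WAL after a disconnection
max_standby_streaming_delay = 10min  # tolerate newly-arrived streamed WAL after reconnection
</programlisting>
</para>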
<para>
The number of query cancels and the reason for them can be viewed using
the <structname>pg_stat_database_conflicts</structname> system view on the standby
server. The <structname>pg_stat_database</structname> system view also contains
summary information.
</para>
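<para>
For example, the following query on the standby shows, for each database,
how many queries have been canceled for each kind of conflict:
<programlisting>
SELECT datname, confl_tablespace, confl_lock, confl_snapshot,
       confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts;
</programlisting>
</para>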
<para>
Users can control whether a log message is produced when WAL replay is waiting
longer than <varname>deadlock_timeout</varname>