Optimizing WAL Performance with Commit Delay

will help smooth response times during the period immediately following each checkpoint. </para> <para> The <xref linkend="guc-commit-delay"/> parameter defines for how many microseconds a group commit leader process will sleep after acquiring a lock within <function>XLogFlush</function>, while group commit followers queue up behind the leader. This delay allows other server processes to add their commit records to the WAL buffers so that all of them will be flushed by the leader's eventual sync operation. No sleep will occur if <xref linkend="guc-fsync"/> is not enabled, or if fewer than <xref linkend="guc-commit-siblings"/> other sessions are currently in active transactions; this avoids sleeping when it's unlikely that any other session will commit soon. Note that on some platforms, the resolution of a sleep request is ten milliseconds, so that any nonzero <varname>commit_delay</varname> setting between 1 and 10000 microseconds would have the same effect. Note also that on some platforms, sleep operations may take slightly longer than requested by the parameter. </para> <para> Since the purpose of <varname>commit_delay</varname> is to allow the cost of each flush operation to be amortized across concurrently committing transactions (potentially at the expense of transaction latency), it is necessary to quantify that cost before the setting can be chosen intelligently. The higher that cost is, the more effective <varname>commit_delay</varname> is expected to be in increasing transaction throughput, up to a point. The <xref linkend="pgtestfsync"/> program can be used to measure the average time in microseconds that a single WAL flush operation takes. A value of half of the average time the program reports it takes to flush after a single 8kB write operation is often the most effective setting for <varname>commit_delay</varname>, so this value is recommended as the starting point to use when optimizing for a particular workload. 
While tuning <varname>commit_delay</varname> is particularly useful when the WAL is stored on high-latency rotating disks, benefits can be significant even on storage media with very fast sync times, such as solid-state drives or RAID arrays with a battery-backed write cache; but this should definitely be tested against a representative workload. Higher values of <varname>commit_siblings</varname> should be used in such cases, whereas smaller <varname>commit_siblings</varname> values are often helpful on higher latency media. Note that it is quite possible that a setting of <varname>commit_delay</varname> that is too high can increase transaction latency by so much that total transaction throughput suffers. </para> <para> When <varname>commit_delay</varname> is set to zero (the default), it is still possible for a form of group commit to occur, but each group will consist only of sessions that reach the point where they need to flush their commit records during the window in which the previous flush operation (if any) is occurring. At higher client counts a <quote>gangway effect</quote> tends to occur, so that the effects of group commit become significant even when <varname>commit_delay</varname> is zero, and thus explicitly setting <varname>commit_delay</varname> tends to help less. Setting <varname>commit_delay</varname> can only help when (1) there are some concurrently committing transactions, and (2) throughput is limited to some degree by commit rate; but with high rotational latency this setting can be effective in increasing transaction throughput with as few as two clients (that is, a single committing client with one sibling transaction). </para> <para> The <xref linkend="guc-wal-sync-method"/> parameter determines how <productname>PostgreSQL</productname> will ask the kernel to force <acronym>WAL</acronym>

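The commit_delay parameter in PostgreSQL makes a group commit leader sleep briefly before flushing, so that more concurrently committing transactions can share a single WAL flush. This amortizes the cost of each flush and can increase transaction throughput, potentially at the expense of transaction latency. The best setting depends on the cost of a flush operation, which can be measured with the pg_test_fsync program; half of the average flush time it reports for a single 8kB write is the recommended starting point. The benefit is usually greatest on high-latency storage media, and a value that is too high can reduce total throughput.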
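
The starting-point rule described above (half of the average time to flush after a single 8kB write) can be sketched as a small calculation. This is only an illustration: the helper name starting_commit_delay is hypothetical and not part of PostgreSQL, the 8000 microsecond figure is an invented example rather than a measurement, and the 100000 microsecond cap is an assumption about the upper bound of commit_delay that should be checked against the documentation for the server version in use.

```python
# Hypothetical helper, not part of PostgreSQL: derive a starting value for
# commit_delay from a measured average WAL flush time.

# Assumed upper bound of commit_delay, in microseconds.
MAX_COMMIT_DELAY_US = 100_000


def starting_commit_delay(avg_flush_us: float) -> int:
    """Return half of the average flush time, in whole microseconds.

    avg_flush_us is the average time, in microseconds, that one flush takes
    after a single 8kB write, for example as reported by pg_test_fsync.
    """
    if avg_flush_us < 0:
        raise ValueError("average flush time must be non-negative")
    half = int(avg_flush_us // 2)
    # Stay within the assumed valid range of the parameter.
    return min(half, MAX_COMMIT_DELAY_US)


if __name__ == "__main__":
    # Invented example figure: 8000 microseconds per flush.
    delay = starting_commit_delay(8000)
    print(f"commit_delay = {delay}")
```

The result is only a starting point. As noted above, the value should still be tested against a representative workload, since a setting that is too high can increase latency enough to reduce total throughput.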