Home Explore Blog CI



postgresql

31th chunk of `doc/src/sgml/perform.sgml`
56fc84d2b5675c2558aa6d40816d34ef7cf6b14b9f9278090000000100000fa0
 <varname>maintenance_work_mem</varname></title>

   <para>
    Temporarily increasing the <xref linkend="guc-maintenance-work-mem"/>
    configuration variable when loading large amounts of data can
    lead to improved performance.  This will help to speed up <command>CREATE
    INDEX</command> commands and <command>ALTER TABLE ADD FOREIGN KEY</command> commands.
    It won't do much for <command>COPY</command> itself, so this advice is
    only useful when you are using one or both of the above techniques.
   </para>
  </sect2>

  <sect2 id="populate-max-wal-size">
   <title>Increase <varname>max_wal_size</varname></title>

   <para>
    Temporarily increasing the <xref linkend="guc-max-wal-size"/>
    configuration variable can also
    make large data loads faster.  This is because loading a large
    amount of data into <productname>PostgreSQL</productname> will
    cause checkpoints to occur more often than the normal checkpoint
    frequency (specified by the <varname>checkpoint_timeout</varname>
    configuration variable). Whenever a checkpoint occurs, all dirty
    pages must be flushed to disk. By increasing
    <varname>max_wal_size</varname> temporarily during bulk
    data loads, the number of checkpoints that are required can be
    reduced.
   </para>
  </sect2>

  <sect2 id="populate-pitr">
   <title>Disable WAL Archival and Streaming Replication</title>

   <para>
    When loading large amounts of data into an installation that uses
    WAL archiving or streaming replication, it might be faster to take a
    new base backup after the load has completed than to process a large
    amount of incremental WAL data.  To prevent incremental WAL logging
    while loading, disable archiving and streaming replication, by setting
    <xref linkend="guc-wal-level"/> to <literal>minimal</literal>,
    <xref linkend="guc-archive-mode"/> to <literal>off</literal>, and
    <xref linkend="guc-max-wal-senders"/> to zero.
    But note that changing these settings requires a server restart,
    and makes any base backups taken before unavailable for archive
    recovery and standby server, which may lead to data loss.
   </para>

   <para>
    Aside from avoiding the time for the archiver or WAL sender to process the
    WAL data, doing this will actually make certain commands faster, because
    they do not to write WAL at all if <varname>wal_level</varname>
    is <literal>minimal</literal> and the current subtransaction (or top-level
    transaction) created or truncated the table or index they change.  (They
    can guarantee crash safety more cheaply by doing
    an <function>fsync</function> at the end than by writing WAL.)
   </para>
  </sect2>

  <sect2 id="populate-analyze">
   <title>Run <command>ANALYZE</command> Afterwards</title>

   <para>
    Whenever you have significantly altered the distribution of data
    within a table, running <link linkend="sql-analyze"><command>ANALYZE</command></link> is strongly recommended. This
    includes bulk loading large amounts of data into the table.  Running
    <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
    ensures that the planner has up-to-date statistics about the
    table.  With no statistics or obsolete statistics, the planner might
    make poor decisions during query planning, leading to poor
    performance on any tables with inaccurate or nonexistent
    statistics.  Note that if the autovacuum daemon is enabled, it might
    run <command>ANALYZE</command> automatically; see
    <xref linkend="vacuum-for-statistics"/>
    and <xref linkend="autovacuum"/> for more information.
   </para>
  </sect2>

  <sect2 id="populate-pg-dump">
   <title>Some Notes about <application>pg_dump</application></title>

   <para>
    Dump scripts generated by <application>pg_dump</application> automatically apply
    several, but not all, of the above guidelines.  To restore a
    <application>pg_dump</application> dump as quickly as possible, you need to
    do

Title: Best Practices for Optimizing Bulk Data Loading in PostgreSQL
Summary
To optimize bulk data loading in PostgreSQL, consider temporarily increasing maintenance_work_mem and max_wal_size, disabling WAL archiving and streaming replication, and running ANALYZE afterwards to update table statistics, as these techniques can significantly improve performance and reduce potential issues with data integrity and database performance.