Home Explore Blog CI



postgresql

29th chunk of `doc/src/sgml/perform.sgml`
fbf5de779d494113d0d60e5da6941e215babdd89de611d520000000100000fa4
   when the planner will <quote>flatten out</quote> subqueries, and the
   other controls when it will flatten out explicit joins.  Typically
   you would either set <varname>join_collapse_limit</varname> equal to
   <varname>from_collapse_limit</varname> (so that explicit joins and subqueries
   act similarly) or set <varname>join_collapse_limit</varname> to 1 (if you want
   to control join order with explicit joins).  But you might set them
   differently if you are trying to fine-tune the trade-off between planning
   time and run time.
  </para>
 </sect1>

 <sect1 id="populate">
  <title>Populating a Database</title>

  <para>
   One might need to insert a large amount of data when first populating
   a database. This section contains some suggestions on how to make
   this process as efficient as possible.
  </para>

  <sect2 id="disable-autocommit">
   <title>Disable Autocommit</title>

   <indexterm>
    <primary>autocommit</primary>
    <secondary>bulk-loading data</secondary>
   </indexterm>

   <para>
    When using multiple <command>INSERT</command>s, turn off autocommit and just do
    one commit at the end.  (In plain
    SQL, this means issuing <command>BEGIN</command> at the start and
    <command>COMMIT</command> at the end.  Some client libraries might
    do this behind your back, in which case you need to make sure the
    library does it when you want it done.)  If you allow each
    insertion to be committed separately,
    <productname>PostgreSQL</productname> is doing a lot of work for
    each row that is added.  An additional benefit of doing all
    insertions in one transaction is that if the insertion of one row
    were to fail then the insertion of all rows inserted up to that
    point would be rolled back, so you won't be stuck with partially
    loaded data.
   </para>
  </sect2>

  <sect2 id="populate-copy-from">
   <title>Use <command>COPY</command></title>

   <para>
    Use <link linkend="sql-copy"><command>COPY</command></link> to load
    all the rows in one command, instead of using a series of
    <command>INSERT</command> commands.  The <command>COPY</command>
    command is optimized for loading large numbers of rows; it is less
    flexible than <command>INSERT</command>, but incurs significantly
    less overhead for large data loads. Since <command>COPY</command>
    is a single command, there is no need to disable autocommit if you
    use this method to populate a table.
   </para>

   <para>
    If you cannot use <command>COPY</command>, it might help to use <link
    linkend="sql-prepare"><command>PREPARE</command></link> to create a
    prepared <command>INSERT</command> statement, and then use
    <command>EXECUTE</command> as many times as required.  This avoids
    some of the overhead of repeatedly parsing and planning
    <command>INSERT</command>. Different interfaces provide this facility
    in different ways; look for <quote>prepared statements</quote> in the interface
    documentation.
   </para>

   <para>
    Note that loading a large number of rows using
    <command>COPY</command> is almost always faster than using
    <command>INSERT</command>, even if <command>PREPARE</command> is used and
    multiple insertions are batched into a single transaction.
   </para>

   <para>
    <command>COPY</command> is fastest when used within the same
    transaction as an earlier <command>CREATE TABLE</command> or
    <command>TRUNCATE</command> command. In such cases no WAL
    needs to be written, because in case of an error, the files
    containing the newly loaded data will be removed anyway.
    However, this consideration only applies when
    <xref linkend="guc-wal-level"/> is <literal>minimal</literal>
    as all commands must write WAL otherwise.
   </para>

  </sect2>

  <sect2 id="populate-rm-indexes">
   <title>Remove Indexes</title>

   <para>
    If you are loading a freshly created table, the fastest method is to
    create the table, bulk load the table's data

Title: Optimizing Database Population
Summary
To efficiently populate a database, consider disabling autocommit for multiple insertions, using the COPY command for bulk loading, preparing INSERT statements, and removing indexes before loading data, as these techniques can significantly reduce overhead and improve performance when loading large amounts of data.