CLUSTER command: Resorting, temporary files, and maintenance

and speed up the query. </para> <para> <command>CLUSTER</command> can re-sort the table using either an index scan on the specified index, or (if the index is a b-tree) a sequential scan followed by sorting. It will attempt to choose the method that will be faster, based on planner cost parameters and available statistical information. </para> <para> While <command>CLUSTER</command> is running, the <xref linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog, pg_temp</literal>. </para> <para> When an index scan is used, a temporary copy of the table is created that contains the table data in the index order. Temporary copies of each index on the table are created as well. Therefore, you need free space on disk at least equal to the sum of the table size and the index sizes. </para> <para> When a sequential scan and sort is used, a temporary sort file is also created, so that the peak temporary space requirement is as much as double the table size, plus the index sizes. This method is often faster than the index scan method, but if the disk space requirement is intolerable, you can disable this choice by temporarily setting <xref linkend="guc-enable-sort"/> to <literal>off</literal>. </para> <para> It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a reasonably large value (but not more than the amount of RAM you can dedicate to the <command>CLUSTER</command> operation) before clustering. </para> <para> Because the planner records statistics about the ordering of tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the newly clustered table. Otherwise, the planner might make poor choices of query plans. </para> <para> Because <command>CLUSTER</command> remembers which indexes are clustered, one can cluster the tables one wants clustered manually the first time, then set up a periodic maintenance script that executes <command>CLUSTER</command> without any parameters, so

The CLUSTER command resorts tables using either index scans or sequential scans with sorting, choosing the faster method based on cost and statistics. During execution, the search path changes temporarily. Index scans require disk space equal to the table and index sizes for temporary copies. Sequential scans with sorting need temporary space for a sort file, potentially doubling the table size. Disabling sorting is possible to avoid high disk usage. Increasing maintenance_work_mem is recommended before clustering, and running ANALYZE afterward updates statistics for better query planning. CLUSTER remembers clustered indexes for automated maintenance.