Home Explore Blog CI



postgresql

4th chunk of `doc/src/sgml/tablesample-method.sgml`
32a18766971d416fa0330e34868efee08650635de272508c000000010000083c
 method; it is either a hash derived from the
   <literal>REPEATABLE</literal> value if one was given, or the result
   of <literal>random()</literal> if not.
  </para>

  <para>
   This function may adjust the fields <literal>node-&gt;use_bulkread</literal>
   and <literal>node-&gt;use_pagemode</literal>.
   If <literal>node-&gt;use_bulkread</literal> is <literal>true</literal>, which it is by
   default, the scan will use a buffer access strategy that encourages
   recycling buffers after use.  It might be reasonable to set this
   to <literal>false</literal> if the scan will visit only a small fraction of the
   table's pages.
   If <literal>node-&gt;use_pagemode</literal> is <literal>true</literal>, which it is by
   default, the scan will perform visibility checking in a single pass for
   all tuples on each visited page.  It might be reasonable to set this
   to <literal>false</literal> if the scan will select only a small fraction of the
   tuples on each visited page.  That will result in fewer tuple visibility
   checks being performed, though each one will be more expensive because it
   will require more locking.
  </para>

  <para>
   If the sampling method is
   marked <literal>repeatable_across_scans</literal>, it must be able to
   select the same set of tuples during a rescan as it did originally, that is
   a fresh call of <function>BeginSampleScan</function> must lead to selecting the
   same tuples as before (if the <literal>TABLESAMPLE</literal> parameters
   and seed don't change).
  </para>

  <para>
<programlisting>
BlockNumber
NextSampleBlock (SampleScanState *node, BlockNumber nblocks);
</programlisting>

   Returns the block number of the next page to be scanned, or
   <literal>InvalidBlockNumber</literal> if no pages remain to be scanned.
  </para>

  <para>
   This function can be omitted (set the pointer to NULL), in which case
   the core code will perform a sequential scan of the entire relation.
   Such a scan can use synchronized scanning, so that the sampling method
   cannot assume that the relation pages are visited in the same order on

Title: Optimization and Block Selection in Sample Scans
Summary
This section discusses further optimization options within `BeginSampleScan`, specifically adjusting `use_bulkread` and `use_pagemode` based on the expected fraction of table pages and tuples to be scanned. It also covers the `repeatable_across_scans` property, ensuring consistent tuple selection across rescans. Finally, it introduces `NextSampleBlock`, a function that allows sampling methods to specify the next block to scan, offering control over the scanning order, or defaulting to a sequential scan if omitted.