3rd chunk of `doc/src/sgml/tablesample-method.sgml`
 <structname>SampleScanState</structname> node has already been created, but
   its <structfield>tsm_state</structfield> field is NULL.
   The <function>InitSampleScan</function> function can palloc whatever internal
   state data is needed by the sampling method, and store a pointer to
   it in <literal>node-&gt;tsm_state</literal>.
   Information about the table to scan is accessible through other fields
   of the <structname>SampleScanState</structname> node (but note that the
   <literal>node-&gt;ss.ss_currentScanDesc</literal> scan descriptor is not set
   up yet).
   <literal>eflags</literal> contains flag bits describing the executor's
   operating mode for this plan node.
  </para>

  <para>
   When <literal>(eflags &amp; EXEC_FLAG_EXPLAIN_ONLY)</literal> is true,
   the scan will not actually be performed, so this function should only do
   the minimum required to make the node state valid for <command>EXPLAIN</command>
   and <function>EndSampleScan</function>.
  </para>
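
  <para>
   As a rough illustration of the behavior described above, the following
   sketch shows one way an <function>InitSampleScan</function> callback might
   allocate its private state and return early under
   <literal>EXEC_FLAG_EXPLAIN_ONLY</literal>.  Everything here is an
   assumption made so the example compiles outside the server tree: the
   reduced <structname>SampleScanState</structname> stand-in, the
   <literal>DemoSamplerState</literal> struct, the
   <function>demo_InitSampleScan</function> name, and the use of
   <function>calloc</function> where a real sampling method would call
   <function>palloc0</function>.
  </para>

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Stand-ins so this sketch compiles on its own.  A real sampling method
 * would include the PostgreSQL headers instead of defining these.
 */
#define EXEC_FLAG_EXPLAIN_ONLY 0x0001   /* assumed to mirror executor.h */

typedef struct SampleScanState
{
    void *tsm_state;                    /* private state of the sampling method */
} SampleScanState;

/* Hypothetical private state for an illustrative sampling method. */
typedef struct DemoSamplerState
{
    uint32_t next_block;                /* next block the method would visit */
    int      prepared;                  /* nonzero once full setup has run */
} DemoSamplerState;

/* Sketch of an InitSampleScan callback. */
void
demo_InitSampleScan(SampleScanState *node, int eflags)
{
    /* calloc() stands in for palloc0(); the memory is zero-filled. */
    DemoSamplerState *st = calloc(1, sizeof(DemoSamplerState));

    if (st == NULL)
        abort();                        /* palloc0() would raise an error instead */

    /* Store the state even for EXPLAIN so later callbacks see a valid node. */
    node->tsm_state = st;

    /* EXPLAIN without ANALYZE: no scan will run, so skip any further setup. */
    if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
        return;

    /* Setup that only matters when tuples will actually be fetched. */
    st->prepared = 1;
}
```

  <para>
   In this sketch the state is allocated in both modes so that an end-of-scan
   callback can treat the node uniformly; only the work that matters for a
   real scan is skipped in the <command>EXPLAIN</command>-only case.
  </para>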

  <para>
   This function can be omitted (set the pointer to NULL), in which case
   <function>BeginSampleScan</function> must perform all initialization needed
   by the sampling method.
  </para>

  <para>
<programlisting>
void
BeginSampleScan (SampleScanState *node,
                 Datum *params,
                 int nparams,
                 uint32 seed);
</programlisting>

   Begin execution of a sampling scan.
   This is called just before the first attempt to fetch a tuple, and
   may be called again if the scan needs to be restarted.
   Information about the table to scan is accessible through fields
   of the <structname>SampleScanState</structname> node (but note that the
   <literal>node-&gt;ss.ss_currentScanDesc</literal> scan descriptor is not set
   up yet).
   The <literal>params</literal> array, of length <literal>nparams</literal>, contains the
   values of the parameters supplied in the <literal>TABLESAMPLE</literal> clause.
   These will have the number and types specified in the sampling
   method's <literal>parameterTypes</literal> list, and are guaranteed
   to be non-null.
   <literal>seed</literal> contains a seed to use for any random numbers generated
   within the sampling method; it is either a hash derived from the
   <literal>REPEATABLE</literal> value if one was given, or the result
   of <literal>random()</literal> if not.
  </para>
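
  <para>
   The following is a minimal sketch, not taken from any shipped sampling
   method, of how a <function>BeginSampleScan</function> callback might read
   its parameters, record the seed, and reset its position so that a
   restarted scan begins again from the start.  It also allocates the state
   lazily, which is one possible approach when
   <function>InitSampleScan</function> is omitted.  The
   <literal>Datum</literal> and <function>DatumGetInt32</function>
   definitions, the reduced <structname>SampleScanState</structname>, the
   <literal>DemoSamplerState</literal> struct with its single integer
   <literal>stride</literal> parameter, and the use of
   <function>calloc</function> in place of <function>palloc0</function> are
   all stand-ins assumed for illustration.
  </para>

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Stand-ins so this sketch compiles on its own.  A real sampling method
 * would get these from the PostgreSQL headers.
 */
typedef uintptr_t Datum;
#define DatumGetInt32(d) ((int32_t) (d))

typedef struct SampleScanState
{
    void *tsm_state;                    /* private state of the sampling method */
} SampleScanState;

/* Hypothetical private state: one integer parameter plus scan position. */
typedef struct DemoSamplerState
{
    int32_t  stride;                    /* value supplied in the TABLESAMPLE clause */
    uint32_t seed;                      /* seed for any random numbers generated */
    uint32_t next_block;                /* scan position, reset on every (re)start */
} DemoSamplerState;

/* Sketch of a BeginSampleScan callback. */
void
demo_BeginSampleScan(SampleScanState *node,
                     Datum *params,
                     int nparams,
                     uint32_t seed)
{
    DemoSamplerState *st = node->tsm_state;

    /* Allocate only once: this callback can run again when the scan restarts. */
    if (st == NULL)
    {
        st = calloc(1, sizeof(DemoSamplerState));   /* palloc0() in a real method */
        if (st == NULL)
            abort();
        node->tsm_state = st;
    }

    /* Parameters arrive already checked for count, type, and non-nullness. */
    st->stride = (nparams > 0) ? DatumGetInt32(params[0]) : 1;

    /* Keep the seed so the same REPEATABLE value yields the same sample. */
    st->seed = seed;

    /* Reset the position so a restarted scan begins from the first block. */
    st->next_block = 0;
}
```

  <para>
   Because the position is reset on every call and the seed is stored rather
   than consumed, calling the function a second time with the same arguments
   leaves the state exactly as it was after the first call.
  </para>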

  <para>
   This function may adjust the fields <literal>node-&gt;use_bulkread</literal>
   and <literal>node-&gt;use_pagemode</literal>.
   If <literal>node-&gt;use_bulkread</literal> is <literal>true</literal>, which it is by
   default, the scan will use a buffer access strategy that encourages
   recycling buffers after use.  It might be reasonable to set this
   to <literal>false</literal> if the scan will visit only a small fraction of the
   table's pages.
   If <literal>node-&gt;use_pagemode</literal> is <literal>true</literal>, which it is by
   default, the scan will perform visibility checking in a single pass for
   all tuples on each visited page.  It might be reasonable to set this
   to <literal>false</literal> if the scan will select only a small fraction of the
   tuples on each visited page.  That will

Title: Initialization and Startup of Sample Scans
Summary
This section focuses on the `InitSampleScan` and `BeginSampleScan` functions, which are crucial for setting up and starting a sampling scan. `InitSampleScan` initializes the scan, allocating internal state if necessary, while `BeginSampleScan` is called before the first tuple fetch, providing access to parameters and a random seed. `BeginSampleScan` also allows adjustments to buffer access and visibility checking strategies for optimization.