Home Explore Blog CI



postgresql

2nd chunk of `doc/src/sgml/tablesample-method.sgml`
1fbf81325162e69f78f06f3f9817602658e46f28af8544640000000100000fa0
 <literal>true</literal>, the sampling method can deliver identical samples
     across successive scans in the same query (assuming unchanging
     parameters, seed value, and snapshot).
     When this is <literal>false</literal>, the planner will not select plans that
     would require scanning the sampled table more than once, since that
     might result in inconsistent query output.
    </para>
   </listitem>
  </varlistentry>
 </variablelist>

 <para>
  The <type>TsmRoutine</type> struct type is declared
  in <filename>src/include/access/tsmapi.h</filename>, which see for additional
  details.
 </para>

 <para>
  The table sampling methods included in the standard distribution are good
  references when trying to write your own.  Look into
  the <filename>src/backend/access/tablesample</filename> subdirectory of the source
  tree for the built-in sampling methods, and into the <filename>contrib</filename>
  subdirectory for add-on methods.
 </para>

 <sect1 id="tablesample-support-functions">
  <title>Sampling Method Support Functions</title>

  <para>
   The TSM handler function returns a palloc'd <type>TsmRoutine</type> struct
   containing pointers to the support functions described below.  Most of
   the functions are required, but some are optional, and those pointers can
   be NULL.
  </para>

  <para>
<programlisting>
void
SampleScanGetSampleSize (PlannerInfo *root,
                         RelOptInfo *baserel,
                         List *paramexprs,
                         BlockNumber *pages,
                         double *tuples);
</programlisting>

   This function is called during planning.  It must estimate the number of
   relation pages that will be read during a sample scan, and the number of
   tuples that will be selected by the scan.  (For example, these might be
   determined by estimating the sampling fraction, and then multiplying
   the <literal>baserel-&gt;pages</literal> and <literal>baserel-&gt;tuples</literal>
   numbers by that, being sure to round the results to integral values.)
   The <literal>paramexprs</literal> list holds the expression(s) that are
   parameters to the <literal>TABLESAMPLE</literal> clause.  It is recommended to
   use <function>estimate_expression_value()</function> to try to reduce these
   expressions to constants, if their values are needed for estimation
   purposes; but the function must provide size estimates even if they cannot
   be reduced, and it should not fail even if the values appear invalid
   (remember that they're only estimates of what the run-time values will be).
   The <literal>pages</literal> and <literal>tuples</literal> parameters are outputs.
  </para>

  <para>
<programlisting>
void
InitSampleScan (SampleScanState *node,
                int eflags);
</programlisting>

   Initialize for execution of a SampleScan plan node.
   This is called during executor startup.
   It should perform any initialization needed before processing can start.
   The <structname>SampleScanState</structname> node has already been created, but
   its <structfield>tsm_state</structfield> field is NULL.
   The <function>InitSampleScan</function> function can palloc whatever internal
   state data is needed by the sampling method, and store a pointer to
   it in <literal>node-&gt;tsm_state</literal>.
   Information about the table to scan is accessible through other fields
   of the <structname>SampleScanState</structname> node (but note that the
   <literal>node-&gt;ss.ss_currentScanDesc</literal> scan descriptor is not set
   up yet).
   <literal>eflags</literal> contains flag bits describing the executor's
   operating mode for this plan node.
  </para>

  <para>
   When <literal>(eflags &amp; EXEC_FLAG_EXPLAIN_ONLY)</literal> is true,
   the scan will not actually be performed, so this function should only do
   the minimum required to make the node state valid for <command>EXPLAIN</command>
   and <function>EndSampleScan</function>.
  </para>

  <para>
   This

Title: Sampling Method Support Functions
Summary
This section details the support functions required for implementing a custom table sampling method in PostgreSQL. These functions, pointed to by the `TsmRoutine` struct, handle tasks such as estimating sample size during planning (`SampleScanGetSampleSize`) and initializing the sample scan during executor startup (`InitSampleScan`). The functions receive information about the table and execution environment, and allow the sampling method to allocate and initialize internal state data.