Sampling Method Support Functions

<literal>true</literal>, the sampling method can deliver identical samples
across successive scans in the same query (assuming unchanging
parameters, seed value, and snapshot).
When this is <literal>false</literal>, the planner will not select plans
that would require scanning the sampled table more than once, since
that might result in inconsistent query output.
     </para>
    </listitem>
   </varlistentry>
  </variablelist>

  <para>
   The <type>TsmRoutine</type> struct type is declared
   in <filename>src/include/access/tsmapi.h</filename>, which see for additional
   details.
  </para>

  <para>
   The table sampling methods included in the standard distribution are good
   references when trying to write your own.  Look into
   the <filename>src/backend/access/tablesample</filename> subdirectory of the source
   tree for the built-in sampling methods, and into the <filename>contrib</filename>
   subdirectory for add-on methods.
  </para>

 <sect1 id="tablesample-support-functions">
  <title>Sampling Method Support Functions</title>

  <para>
   The TSM handler function returns a palloc'd <type>TsmRoutine</type> struct
   containing pointers to the support functions described below.  Most of
   the functions are required, but some are optional, and those pointers can
   be NULL.
  </para>

  <para>
<programlisting>
void
SampleScanGetSampleSize (PlannerInfo *root,
                         RelOptInfo *baserel,
                         List *paramexprs,
                         BlockNumber *pages,
                         double *tuples);
</programlisting>

   This function is called during planning.  It must estimate the number of
   relation pages that will be read during a sample scan, and the number of
   tuples that will be selected by the scan.  (For example, these might be
   determined by estimating the sampling fraction, and then multiplying
   the <literal>baserel-&gt;pages</literal> and <literal>baserel-&gt;tuples</literal>
   numbers by that, being sure to round the results to integral values.)
   The <literal>paramexprs</literal> list holds the expression(s) that are
   parameters to the <literal>TABLESAMPLE</literal> clause.
   It is recommended to use <function>estimate_expression_value()</function> to try
   to reduce these expressions to constants, if their values are needed for
   estimation purposes; but the function must provide size estimates even
   if they cannot be reduced, and it should not fail even if the values
   appear invalid (remember that they're only estimates of what the
   run-time values will be).  The <literal>pages</literal>
   and <literal>tuples</literal> parameters are outputs.
  </para>

  <para>
<programlisting>
void
InitSampleScan (SampleScanState *node,
                int eflags);
</programlisting>

   Initialize for execution of a SampleScan plan node.
   This is called during executor startup.
   It should perform any initialization needed before processing can start.
   The <structname>SampleScanState</structname> node has already been created, but
   its <structfield>tsm_state</structfield> field is NULL.
   The <function>InitSampleScan</function> function can palloc whatever internal
   state data is needed by the sampling method, and store a pointer to
   it in <literal>node-&gt;tsm_state</literal>.
   Information about the table to scan is accessible through other fields
   of the <structname>SampleScanState</structname> node (but note that
   the <literal>node-&gt;ss.ss_currentScanDesc</literal> scan descriptor is not set
   up yet).
   <literal>eflags</literal> contains flag bits describing the executor's
   operating mode for this plan node.
  </para>

  <para>
   When <literal>(eflags &amp; EXEC_FLAG_EXPLAIN_ONLY)</literal> is true,
   the scan will not actually be performed, so this function should only do
   the minimum required to make the node state valid
   for <command>EXPLAIN</command> and <function>EndSampleScan</function>.
  </para>

  <para>
   This

This section describes the support functions used to implement a custom table sampling method in PostgreSQL. The TSM handler function returns a `TsmRoutine` struct holding pointers to these functions; most are required, and the pointers for optional ones can be NULL. The two covered here are `SampleScanGetSampleSize`, which estimates the pages read and tuples selected during planning, and `InitSampleScan`, which runs at executor startup and can allocate the method's internal state and store it in `node->tsm_state`.
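
The size-estimation arithmetic described for `SampleScanGetSampleSize` can be sketched in standalone C. This is a minimal illustration under stated assumptions, not the code of any built-in sampling method: `estimate_sample_size` and `round_nonneg` are hypothetical helpers, `BlockNumber` is redeclared locally as a stand-in for the PostgreSQL type so the block compiles without the server headers, and the sampling fraction is assumed to have been estimated already (for example from `paramexprs`). A real method might read every page and scale only the tuple count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's BlockNumber so the sketch compiles on its own. */
typedef uint32_t BlockNumber;

/* Round a non-negative value to the nearest integral value. */
static double
round_nonneg(double x)
{
    return (double) (uint64_t) (x + 0.5);
}

/*
 * Hypothetical helper (not part of the TSM API): scale the relation's page
 * and tuple counts by an estimated sampling fraction, writing the results
 * to the two output parameters.
 */
static void
estimate_sample_size(BlockNumber rel_pages, double rel_tuples,
                     double fraction,
                     BlockNumber *pages, double *tuples)
{
    assert(pages != NULL && tuples != NULL);

    /*
     * Planning-time values are only estimates of the run-time values, so
     * clamp anything out of range (including NaN) rather than failing.
     */
    if (!(fraction >= 0.0))
        fraction = 0.0;
    else if (fraction > 1.0)
        fraction = 1.0;

    /* Treat a negative or NaN tuple count as zero for this sketch. */
    if (!(rel_tuples >= 0.0))
        rel_tuples = 0.0;

    /* Multiply by the fraction and round both results to integral values. */
    *pages = (BlockNumber) round_nonneg((double) rel_pages * fraction);
    *tuples = round_nonneg(rel_tuples * fraction);
}
```

With 1000 pages, 50000 tuples, and a fraction of 0.25, the helper yields 250 pages and 12500 tuples. An out-of-range fraction such as 7.5 is clamped to 1.0, so the estimate falls back to the whole relation instead of raising an error.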