Home Explore Blog CI



postgresql

1st chunk of `doc/src/sgml/tablesample-method.sgml`
e974a9c044fe9f5fffe68958c75ba4dc3e212f3c4e6db5570000000100000fae
<!-- doc/src/sgml/tablesample-method.sgml -->

<chapter id="tablesample-method">
 <title>Writing a Table Sampling Method</title>

 <indexterm zone="tablesample-method">
  <primary>table sampling method</primary>
 </indexterm>

 <indexterm zone="tablesample-method">
  <primary><literal>TABLESAMPLE</literal> method</primary>
 </indexterm>

 <para>
  <productname>PostgreSQL</productname>'s implementation of the <literal>TABLESAMPLE</literal>
  clause supports custom table sampling methods, in addition to
  the <literal>BERNOULLI</literal> and <literal>SYSTEM</literal> methods that are required
  by the SQL standard.  The sampling method determines which rows of the
  table will be selected when the <literal>TABLESAMPLE</literal> clause is used.
 </para>

 <para>
  At the SQL level, a table sampling method is represented by a single SQL
  function, typically implemented in C, having the signature
<programlisting>
method_name(internal) RETURNS tsm_handler
</programlisting>
  The name of the function is the same method name appearing in the
  <literal>TABLESAMPLE</literal> clause.  The <type>internal</type> argument is a dummy
  (always having value zero) that simply serves to prevent this function from
  being called directly from an SQL command.
  The result of the function must be a palloc'd struct of
  type <type>TsmRoutine</type>, which contains pointers to support functions for
  the sampling method.  These support functions are plain C functions and
  are not visible or callable at the SQL level.  The support functions are
  described in <xref linkend="tablesample-support-functions"/>.
 </para>

 <para>
  In addition to function pointers, the <type>TsmRoutine</type> struct must
  provide these additional fields:
 </para>

 <variablelist>
  <varlistentry>
   <term><literal>List *parameterTypes</literal></term>
   <listitem>
    <para>
     This is an OID list containing the data type OIDs of the parameter(s)
     that will be accepted by the <literal>TABLESAMPLE</literal> clause when this
     sampling method is used.  For example, for the built-in methods, this
     list contains a single item with value <literal>FLOAT4OID</literal>, which
     represents the sampling percentage.  Custom sampling methods can have
     more or different parameters.
    </para>
   </listitem>
  </varlistentry>

  <varlistentry>
   <term><literal>bool repeatable_across_queries</literal></term>
   <listitem>
    <para>
     If <literal>true</literal>, the sampling method can deliver identical samples
     across successive queries, if the same parameters
     and <literal>REPEATABLE</literal> seed value are supplied each time and the
     table contents have not changed.  When this is <literal>false</literal>,
     the <literal>REPEATABLE</literal> clause is not accepted for use with the
     sampling method.
    </para>
   </listitem>
  </varlistentry>

  <varlistentry>
   <term><literal>bool repeatable_across_scans</literal></term>
   <listitem>
    <para>
     If <literal>true</literal>, the sampling method can deliver identical samples
     across successive scans in the same query (assuming unchanging
     parameters, seed value, and snapshot).
     When this is <literal>false</literal>, the planner will not select plans that
     would require scanning the sampled table more than once, since that
     might result in inconsistent query output.
    </para>
   </listitem>
  </varlistentry>
 </variablelist>

 <para>
  The <type>TsmRoutine</type> struct type is declared
  in <filename>src/include/access/tsmapi.h</filename>, which see for additional
  details.
 </para>

 <para>
  The table sampling methods included in the standard distribution are good
  references when trying to write your own.  Look into
  the <filename>src/backend/access/tablesample</filename> subdirectory of the source
  tree for the built-in sampling methods, and into the <filename>contrib</filename>
  subdirectory for add-on methods.
 </para>

 <sect1 id="tablesample-support-functions">

Title: Writing a Table Sampling Method
Summary
This chapter describes how to implement custom table sampling methods in PostgreSQL. A table sampling method is represented by a SQL function that returns a `TsmRoutine` struct, which contains pointers to support functions for the sampling method. The `TsmRoutine` struct also includes information about the parameters accepted by the method and whether the method can deliver identical samples across successive queries and scans. The standard distribution includes several table sampling methods that can be used as references when writing your own.