Home Explore Blog CI



postgresql

5th chunk of `doc/src/sgml/tablesample-method.sgml`
a3a73dd85860aff304d77408cfda97ac176e9a24a826c5d40000000100000c22
 it
   will require more locking.
  </para>

  <para>
   If the sampling method is
   marked <literal>repeatable_across_scans</literal>, it must be able to
   select the same set of tuples during a rescan as it did originally, that is
   a fresh call of <function>BeginSampleScan</function> must lead to selecting the
   same tuples as before (if the <literal>TABLESAMPLE</literal> parameters
   and seed don't change).
  </para>

  <para>
<programlisting>
BlockNumber
NextSampleBlock (SampleScanState *node, BlockNumber nblocks);
</programlisting>

   Returns the block number of the next page to be scanned, or
   <literal>InvalidBlockNumber</literal> if no pages remain to be scanned.
  </para>

  <para>
   This function can be omitted (set the pointer to NULL), in which case
   the core code will perform a sequential scan of the entire relation.
   Such a scan can use synchronized scanning, so that the sampling method
   cannot assume that the relation pages are visited in the same order on
   each scan.
  </para>

  <para>
<programlisting>
OffsetNumber
NextSampleTuple (SampleScanState *node,
                 BlockNumber blockno,
                 OffsetNumber maxoffset);
</programlisting>

   Returns the offset number of the next tuple to be sampled on the
   specified page, or <literal>InvalidOffsetNumber</literal> if no tuples remain to
   be sampled.  <literal>maxoffset</literal> is the largest offset number in use
   on the page.
  </para>

  <note>
   <para>
    <function>NextSampleTuple</function> is not explicitly told which of the offset
    numbers in the range <literal>1 .. maxoffset</literal> actually contain valid
    tuples.  This is not normally a problem since the core code ignores
    requests to sample missing or invisible tuples; that should not result in
    any bias in the sample.  However, if necessary, the function can use
    <literal>node-&gt;donetuples</literal> to examine how many of the tuples
    it returned were valid and visible.
   </para>
  </note>

  <note>
   <para>
    <function>NextSampleTuple</function> must <emphasis>not</emphasis> assume
    that <literal>blockno</literal> is the same page number returned by the most
    recent <function>NextSampleBlock</function> call.  It was returned by some
    previous <function>NextSampleBlock</function> call, but the core code is allowed
    to call <function>NextSampleBlock</function> in advance of actually scanning
    pages, so as to support prefetching.  It is OK to assume that once
    sampling of a given page begins, successive <function>NextSampleTuple</function>
    calls all refer to the same page until <literal>InvalidOffsetNumber</literal> is
    returned.
   </para>
  </note>

  <para>
<programlisting>
void
EndSampleScan (SampleScanState *node);
</programlisting>

   End the scan and release resources.  It is normally not important
   to release palloc'd memory, but any externally-visible resources
   should be cleaned up.
   This function can be omitted (set the pointer to NULL) in the common
   case where no such resources exist.
  </para>

 </sect1>

</chapter>

Title: More sampling functions: NextSampleTuple and EndSampleScan
Summary
This section details the `NextSampleTuple` and `EndSampleScan` functions. `NextSampleTuple` determines the next tuple to sample on a given page, returning its offset number. It's also informed that it shouldn't assume the given block number is the same as that returned by the most recent `NextSampleBlock` call, to allow for prefetching. Finally, `EndSampleScan` cleans up resources allocated by the scan. It can be omitted if no resources are allocated.