More sampling functions: NextSampleTuple and EndSampleScan

it will require more locking.
  </para>

  <para>
   If the sampling method is marked
   <literal>repeatable_across_scans</literal>, it must be able to
   select the same set of tuples during a rescan as it did originally;
   that is, a fresh call of <function>BeginSampleScan</function> must
   lead to selecting the same tuples as before (if the
   <literal>TABLESAMPLE</literal> parameters and seed don't change).
  </para>

  <para>
<programlisting>
BlockNumber
NextSampleBlock (SampleScanState *node, BlockNumber nblocks);
</programlisting>

   Returns the block number of the next page to be scanned, or
   <literal>InvalidBlockNumber</literal> if no pages remain to be scanned.
  </para>

  <para>
   This function can be omitted (set the pointer to NULL), in which case
   the core code will perform a sequential scan of the entire relation.
   Such a scan can use synchronized scanning, so the sampling method
   cannot assume that the relation pages are visited in the same order on
   each scan.
  </para>

  <para>
<programlisting>
OffsetNumber
NextSampleTuple (SampleScanState *node,
                 BlockNumber blockno,
                 OffsetNumber maxoffset);
</programlisting>

   Returns the offset number of the next tuple to be sampled on the
   specified page, or <literal>InvalidOffsetNumber</literal> if no tuples
   remain to be sampled.  <literal>maxoffset</literal> is the largest
   offset number in use on the page.
  </para>

  <note>
   <para>
    <function>NextSampleTuple</function> is not explicitly told which of
    the offset numbers in the range <literal>1 .. maxoffset</literal>
    actually contain valid tuples.  This is not normally a problem since
    the core code ignores requests to sample missing or invisible tuples;
    that should not result in any bias in the sample.  However, if
    necessary, the function can use <literal>node->donetuples</literal>
    to examine how many of the tuples it returned were valid and visible.
   </para>
  </note>

  <note>
   <para>
    <function>NextSampleTuple</function> must <emphasis>not</emphasis>
    assume that <literal>blockno</literal> is the same page number
    returned by the most recent <function>NextSampleBlock</function>
    call.  It was returned by some previous
    <function>NextSampleBlock</function> call, but the core code is
    allowed to call <function>NextSampleBlock</function> in advance of
    actually scanning pages, so as to support prefetching.  It is OK to
    assume that once sampling of a given page begins, successive
    <function>NextSampleTuple</function> calls all refer to the same page
    until <literal>InvalidOffsetNumber</literal> is returned.
   </para>
  </note>

  <para>
<programlisting>
void
EndSampleScan (SampleScanState *node);
</programlisting>

   End the scan and release resources.  It is normally not important to
   release palloc'd memory, but any externally-visible resources should
   be cleaned up.  This function can be omitted (set the pointer to NULL)
   in the common case where no such resources exist.
  </para>

 </sect1>

</chapter>

This section details the `NextSampleTuple` and `EndSampleScan` functions. `NextSampleTuple` returns the offset number of the next tuple to sample on a given page, or `InvalidOffsetNumber` when no tuples remain to be sampled on that page. It must not assume that the given block number is the one returned by the most recent `NextSampleBlock` call, because the core code may call `NextSampleBlock` ahead of actually scanning pages in order to support prefetching. `EndSampleScan` ends the scan and releases any externally visible resources; it can be omitted (pointer set to NULL) when no such resources exist.
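
The contract described above can be illustrated with a small stand-alone sketch. It does not use the PostgreSQL headers: the type definitions below are simplified stand-ins, and `DemoSamplerData`, `demo_nextsampletuple`, `demo_endsamplescan`, `demo_sample_page`, the `step` field, and the `log` file handle are all hypothetical names introduced for illustration. The point is only that per-page progress is tracked in private state and reset when `InvalidOffsetNumber` is returned, so the callback never needs to compare `blockno` against the last `NextSampleBlock` result.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for PostgreSQL types (assumption: not the real headers). */
typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;
#define InvalidOffsetNumber ((OffsetNumber) 0)
#define FirstOffsetNumber   ((OffsetNumber) 1)

/* Hypothetical private state of an illustrative sampling method. */
typedef struct DemoSamplerData
{
    OffsetNumber lt;    /* last offset returned on the current page,
                         * or InvalidOffsetNumber between pages */
    uint32_t     step;  /* hypothetical: take every step-th offset (>= 1) */
    FILE        *log;   /* hypothetical externally visible resource */
} DemoSamplerData;

/* Stand-in for the scan state: only the private-state pointer is modeled. */
typedef struct SampleScanState
{
    void *tsm_state;
} SampleScanState;

/*
 * Return the next offset to sample on the current page, or
 * InvalidOffsetNumber once the page is exhausted.  blockno is deliberately
 * not compared with anything: with prefetching it need not match the most
 * recent NextSampleBlock result.
 */
static OffsetNumber
demo_nextsampletuple(SampleScanState *node, BlockNumber blockno,
                     OffsetNumber maxoffset)
{
    DemoSamplerData *s = (DemoSamplerData *) node->tsm_state;
    uint32_t         next;   /* wider than OffsetNumber to avoid wraparound */

    (void) blockno;
    assert(s->step >= 1);

    if (s->lt == InvalidOffsetNumber)
        next = FirstOffsetNumber;          /* first call for this page */
    else
        next = (uint32_t) s->lt + s->step; /* continue on the same page */

    if (next > maxoffset)
    {
        s->lt = InvalidOffsetNumber;       /* reset for the next page */
        return InvalidOffsetNumber;
    }

    s->lt = (OffsetNumber) next;
    return s->lt;
}

/* Release the externally visible resource; memory is left to the caller. */
static void
demo_endsamplescan(SampleScanState *node)
{
    DemoSamplerData *s = (DemoSamplerData *) node->tsm_state;

    if (s->log != NULL)
    {
        fclose(s->log);
        s->log = NULL;
    }
}

/*
 * Hypothetical driver standing in for the core code: call the tuple
 * callback repeatedly for one page and record up to cap offsets.
 */
static int
demo_sample_page(SampleScanState *node, BlockNumber blockno,
                 OffsetNumber maxoffset, OffsetNumber *out, int cap)
{
    OffsetNumber off;
    int          n = 0;

    while ((off = demo_nextsampletuple(node, blockno, maxoffset))
           != InvalidOffsetNumber)
    {
        if (n < cap)
            out[n++] = off;
    }
    return n;
}
```

In this sketch, a state with `step` set to 3 and a page with `maxoffset` of 10 yields offsets 1, 4, 7, and 10 before `InvalidOffsetNumber` is returned. A real sampling method would also need to rely on the core code skipping offsets that hold no valid or visible tuple, as the note on `NextSampleTuple` explains.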