the rows chosen may be correlated with other data such as IDs from
a sequence or the physical row ordering, which may skew performance
measurements.
</para>
<para>
To avoid this, you may wish to use the <function>permute</function>
function, or some other additional step with similar effect, to shuffle
the selected rows and remove such correlations.
</para>
</note>
<para>
Hash functions <literal>hash</literal>, <literal>hash_murmur2</literal> and
<literal>hash_fnv1a</literal> accept an input value and an optional seed parameter.
If no seed is provided, the value of <literal>:default_seed</literal>
is used, which is initialized randomly unless set by the command-line
<option>-D</option> option.
</para>
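<para>
As an illustration of the seeded-hash idea, the following Python sketch
implements 64-bit FNV-1a with the seed standing in for the offset basis.
This is only a model of the concept; <application>pgbench</application>'s
<literal>hash_fnv1a</literal> operates on 64-bit integers internally and
may combine the seed differently.
</para>

```python
# Illustrative sketch of a seeded 64-bit FNV-1a hash; NOT pgbench's
# exact internals. The seed replaces the standard FNV offset basis,
# so different seeds select different members of the hash family.
FNV64_OFFSET_BASIS = 0xcbf29ce484222325
FNV64_PRIME = 0x100000001b3
MASK64 = (1 << 64) - 1

def fnv1a(data: bytes, seed: int = FNV64_OFFSET_BASIS) -> int:
    """64-bit FNV-1a over a byte string, starting from the given seed."""
    h = seed & MASK64
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h

def hash_fnv1a_int(value: int, seed: int = FNV64_OFFSET_BASIS) -> int:
    # Hash a 64-bit integer via its little-endian byte encoding
    # (an assumption for this sketch, not pgbench's documented layout).
    return fnv1a((value & MASK64).to_bytes(8, "little"), seed)
```

The same input hashed with the same seed always yields the same value,
while hashing with distinct seeds produces independent-looking streams,
which is the purpose of the optional seed parameter.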
<para>
<function>permute</function> accepts an input value, a size, and an optional
seed parameter. It generates a pseudorandom permutation of integers in
the range <literal>[0, size)</literal>, and returns the index of the input
value in the permuted values. The permutation chosen is parameterized by
the seed, which defaults to <literal>:default_seed</literal>. Unlike the
hash functions, <function>permute</function> ensures
that there are no collisions or holes in the output values. Input values
outside the interval are interpreted modulo the size. The function raises
an error if the size is not positive. <function>permute</function> can be
used to scatter the distribution of non-uniform random functions such as
<literal>random_zipfian</literal> or <literal>random_exponential</literal>
so that values drawn more often are not trivially correlated. For
instance, the following <application>pgbench</application> script
simulates a possible real world workload typical for social media and
blogging platforms where a few accounts generate excessive load:
<programlisting>
\set size 1000000
\set r random_zipfian(1, :size, 1.07)
\set k 1 + permute(:r, :size)
</programlisting>
In some cases several distinct distributions are needed that don't correlate
with each other, and this is when the optional seed parameter comes in handy:
<programlisting>
\set k1 1 + permute(:r, :size, :default_seed + 123)
\set k2 1 + permute(:r, :size, :default_seed + 321)
</programlisting>
Similar behavior can also be approximated with <function>hash</function>:
<programlisting>
\set size 1000000
\set r random_zipfian(1, 100 * :size, 1.07)
\set k 1 + abs(hash(:r)) % :size
</programlisting>
However, since <function>hash</function> generates collisions, some values
will not be reachable and others will be more frequent than expected from
the original distribution.
</para>
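<para>
The no-collision, no-hole property that distinguishes
<function>permute</function> from the hash-based approximation can be
sketched in Python. The seeded shuffle below only models the documented
semantics; it is not <application>pgbench</application>'s actual
permutation algorithm.
</para>

```python
import random

def permute(value: int, size: int, seed: int = 0) -> int:
    # Model of permute's semantics, not pgbench's algorithm: build the
    # seed-parameterized permutation of [0, size) and return the index
    # of the input value within it.
    if size <= 0:
        raise ValueError("size must be positive")
    perm = list(range(size))
    random.Random(seed).shuffle(perm)  # deterministic for a given seed
    return perm.index(value % size)    # out-of-range inputs wrap modulo size

size = 10
outputs = [permute(v, size, seed=42) for v in range(size)]
# Every output appears exactly once: a bijection on [0, size), unlike
# abs(hash(:r)) % size, which can map two inputs to the same bucket
# and leave other buckets unreached.
assert sorted(outputs) == list(range(size))
```

Because the mapping is a bijection, scattering a skewed distribution such
as <literal>random_zipfian</literal> through it preserves each value's
frequency while breaking the correlation between hot values and low key
numbers.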
<para>
As an example, the full definition of the built-in TPC-B-like
transaction is:
<programlisting>
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
</programlisting>
This script allows each iteration of the transaction to reference
different, randomly-chosen rows. (This example also shows why it's
important for each client session to have its own variables —
otherwise they'd not be independently touching different rows.)
</para>
</refsect2>
<refsect2>
<title>Per-Transaction Logging</title>
<para>
With the <option>-l</option> option (but without
the <option>--aggregate-interval</option> option),
<application>pgbench</application> writes information about each transaction
to