pgbench Random Number Generation Functions: Zipfian, Square Root, and Distribution Details

        <para>
         Computes a Zipfian-distributed random integer in <literal>[lb, ub]</literal>,
         see below.
        </para>
        <para>
         <literal>random_zipfian(1, 10, 1.5)</literal>
         <returnvalue>an integer between 1 and 10</returnvalue>
        </para></entry>
       </row>

       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <function>sqrt</function> ( <replaceable>number</replaceable> )
         <returnvalue>double</returnvalue>
        </para>
        <para>
         Square root
        </para>
        <para>
         <literal>sqrt(2.0)</literal>
         <returnvalue>1.414213562</returnvalue>
        </para></entry>
       </row>
      </tbody>
     </tgroup>
    </table>

   <para>
    The <literal>random</literal> function generates values using a uniform
    distribution, that is, all values within the specified range are drawn
    with equal probability. The <literal>random_exponential</literal>,
    <literal>random_gaussian</literal> and <literal>random_zipfian</literal>
    functions require an additional double parameter which determines the precise
    shape of the distribution.
   </para>

   <itemizedlist>
    <listitem>
     <para>
      For an exponential distribution, <replaceable>parameter</replaceable>
      controls the distribution by truncating a quickly-decreasing
      exponential distribution at <replaceable>parameter</replaceable>, and then
      projecting onto integers between the bounds.
      To be precise, with
<literallayout>
f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1 - exp(-parameter))
</literallayout>
      then value <replaceable>i</replaceable> between <replaceable>min</replaceable> and
      <replaceable>max</replaceable> inclusive is drawn with probability:
      <literal>f(i) - f(i + 1)</literal>.
     </para>

     <para>
      Intuitively, the larger the <replaceable>parameter</replaceable>, the more
      frequently values close to <replaceable>min</replaceable> are accessed, and the
      less frequently values close to <replaceable>max</replaceable> are accessed.
      The closer to 0 <replaceable>parameter</replaceable> is, the flatter (more
      uniform) the access distribution.
      A crude approximation of the distribution is that the most frequent 1%
      of values in the range, close to <replaceable>min</replaceable>, are drawn
      <replaceable>parameter</replaceable>% of the time.
      The <replaceable>parameter</replaceable> value must be strictly positive.
     </para>
    </listitem>

    <listitem>
     <para>
      For a Gaussian distribution, the interval is mapped onto a standard
      normal distribution (the classical bell-shaped Gaussian curve) truncated
      at <literal>-parameter</literal> on the left and <literal>+parameter</literal>
      on the right.
      Values in the middle of the interval are more likely to be drawn.
      To be precise, if <literal>PHI(x)</literal> is the cumulative distribution
      function of the standard normal distribution, with mean <literal>mu</literal>
      defined as <literal>(max + min) / 2.0</literal>, with
<literallayout>
f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) / (2.0 * PHI(parameter) - 1)
</literallayout>
      then value <replaceable>i</replaceable> between <replaceable>min</replaceable> and
      <replaceable>max</replaceable> inclusive is drawn with probability:
      <literal>f(i + 0.5) - f(i - 0.5)</literal>.
      Intuitively, the larger the <replaceable>parameter</replaceable>, the more
      frequently values close to the middle of the interval are drawn, and the
      less frequently values close to the <replaceable>min</replaceable> and
      <replaceable>max</replaceable> bounds. About 67% of values are drawn from the
      middle <literal>1.0 / parameter</literal>, that is a relative
      <literal>0.5 / parameter</literal> around the mean, and 95% in the middle
      <literal>2.0 / parameter</literal>, that is a relative

This section documents the `random_zipfian()` and `sqrt()` functions in pgbench and explains in more depth how `random_exponential()`, `random_gaussian()` and `random_zipfian()` work. It describes the uniform distribution produced by `random()` and shows how the additional parameter shapes the exponential, Gaussian, and Zipfian distributions, giving the mathematical formulas and some intuition for choosing a parameter value.
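
To make the exponential and Gaussian formulas above more concrete, the following Python sketch evaluates them directly and returns the probability of each integer in the range. It is only an illustration of the documented formulas, not pgbench's actual implementation; the helper names `exponential_pmf` and `gaussian_pmf` are hypothetical and do not exist in pgbench, and `PHI` is computed here from `math.erf`.

```python
import math


def exponential_pmf(lo, hi, parameter):
    """Probability of each integer i in [lo, hi], computed as f(i) - f(i + 1).

    Hypothetical helper that mirrors the documented formula only.
    """
    if parameter <= 0.0:
        raise ValueError("parameter must be strictly positive")
    width = hi - lo + 1
    norm = 1.0 - math.exp(-parameter)

    def f(x):
        return math.exp(-parameter * (x - lo) / width) / norm

    return {i: f(i) - f(i + 1) for i in range(lo, hi + 1)}


def gaussian_pmf(lo, hi, parameter):
    """Probability of each integer i in [lo, hi], computed as f(i + 0.5) - f(i - 0.5).

    Hypothetical helper that mirrors the documented formula only.
    """
    if parameter <= 0.0:
        # Guard added for this sketch so the normalisation below is non-zero.
        raise ValueError("parameter must be positive")
    width = hi - lo + 1
    mu = (hi + lo) / 2.0

    def phi(x):
        # Cumulative distribution function of the standard normal distribution.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    norm = 2.0 * phi(parameter) - 1.0

    def f(x):
        return phi(2.0 * parameter * (x - mu) / width) / norm

    return {i: f(i + 0.5) - f(i - 0.5) for i in range(lo, hi + 1)}


if __name__ == "__main__":
    exp_p = exponential_pmf(1, 100, 5.0)
    gauss_p = gaussian_pmf(1, 100, 4.0)
    print("exponential: P(1) =", exp_p[1], " P(100) =", exp_p[100])
    print("gaussian:    P(1) =", gauss_p[1], " P(50) =", gauss_p[50])
    print("totals:", sum(exp_p.values()), sum(gauss_p.values()))
```

With a range of 1 to 100 and an exponential parameter of 5.0, the value 1 (the most frequent 1% of the range) comes out at a probability of about 0.049, which agrees with the crude "drawn parameter% of the time" approximation. In both cases the probabilities telescope, so they sum to 1 up to floating-point rounding.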