Home Explore Blog CI



postgresql

1st chunk of `doc/src/sgml/bloom.sgml`
c961d78f5bab9a599d20809224f537d8dc7701562fbeb4180000000100000fa0
<!-- doc/src/sgml/bloom.sgml -->

<sect1 id="bloom" xreflabel="bloom">
 <title>bloom &mdash; bloom filter index access method</title>

 <indexterm zone="bloom">
  <primary>bloom</primary>
 </indexterm>

 <para>
  <literal>bloom</literal> provides an index access method based on
  <ulink url="https://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</ulink>.
 </para>

 <para>
  A Bloom filter is a space-efficient data structure that is used to test
  whether an element is a member of a set.  In the case of an index access
  method, it allows fast exclusion of non-matching tuples via signatures
  whose size is determined at index creation.
 </para>

 <para>
  A signature is a lossy representation of the indexed attribute(s), and as
  such is prone to reporting false positives; that is, it may be reported
  that an element is in the set, when it is not.  So index search results
  must always be rechecked using the actual attribute values from the heap
  entry.  Larger signatures reduce the odds of a false positive and thus
  reduce the number of useless heap visits, but of course also make the index
  larger and hence slower to scan.
 </para>

 <para>
  This type of index is most useful when a table has many attributes and
  queries test arbitrary combinations of them.  A traditional btree index is
  faster than a bloom index, but it can require many btree indexes to support
  all possible queries where one needs only a single bloom index.  Note
  however that bloom indexes only support equality queries, whereas btree
  indexes can also perform inequality and range searches.
 </para>

 <sect2 id="bloom-parameters">
  <title>Parameters</title>

  <para>
   A <literal>bloom</literal> index accepts the following parameters in its
   <literal>WITH</literal> clause:
  </para>

   <variablelist>
   <varlistentry>
    <term><literal>length</literal></term>
    <listitem>
     <para>
      Length of each signature (index entry) in bits. It is rounded up to the
      nearest multiple of <literal>16</literal>. The default is
      <literal>80</literal> bits and the maximum is <literal>4096</literal>.
     </para>
    </listitem>
   </varlistentry>
   </variablelist>
   <variablelist>
   <varlistentry>
    <term><literal>col1 &mdash; col32</literal></term>
    <listitem>
     <para>
      Number of bits generated for each index column. Each parameter's name
      refers to the number of the index column that it controls.  The default
      is <literal>2</literal> bits and the maximum is <literal>4095</literal>.
      Parameters for index columns not actually used are ignored.
     </para>
    </listitem>
   </varlistentry>
   </variablelist>
 </sect2>

 <sect2 id="bloom-examples">
  <title>Examples</title>

  <para>
   This is an example of creating a bloom index:
  </para>

<programlisting>
CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
       WITH (length=80, col1=2, col2=2, col3=4);
</programlisting>

  <para>
   The index is created with a signature length of 80 bits, with attributes
   i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits.  We could
   have omitted the <literal>length</literal>, <literal>col1</literal>,
   and <literal>col2</literal> specifications since those have the default values.
  </para>

  <para>
   Here is a more complete example of bloom index definition and usage, as
   well as a comparison with equivalent btree indexes.  The bloom index is
   considerably smaller than the btree index, and can perform better.
  </para>

<programlisting>
=# CREATE TABLE tbloom AS
   SELECT
     (random() * 1000000)::int as i1,
     (random() * 1000000)::int as i2,
     (random() * 1000000)::int as i3,
     (random() * 1000000)::int as i4,
     (random() * 1000000)::int as i5,
     (random() * 1000000)::int as i6
   FROM
  generate_series(1,10000000);
SELECT 10000000
</programlisting>

  <para>
   A sequential scan over this large table takes a long time:
<programlisting>
=# EXPLAIN ANALYZE SELECT * FROM tbloom

Title: Bloom Index Access Method
Summary
The bloom index access method is based on Bloom filters, a space-efficient data structure for testing set membership, allowing fast exclusion of non-matching tuples, but requiring rechecks for false positives and supporting only equality queries.