Home Explore Blog CI



postgresql

2nd chunk of `doc/src/sgml/bloom.sgml`
d6b48d6971e0cb1fe80e1273f0a207a04e94127a19ae90610000000100000e86
 signature length of 80 bits, with attributes
   i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits.  We could
   have omitted the <literal>length</literal>, <literal>col1</literal>,
   and <literal>col2</literal> specifications since those have the default values.
  </para>

  <para>
   Here is a more complete example of bloom index definition and usage, as
   well as a comparison with equivalent btree indexes.  The bloom index is
   considerably smaller than the btree index, and can perform better.
  </para>

<programlisting>
=# CREATE TABLE tbloom AS
   SELECT
     (random() * 1000000)::int as i1,
     (random() * 1000000)::int as i2,
     (random() * 1000000)::int as i3,
     (random() * 1000000)::int as i4,
     (random() * 1000000)::int as i5,
     (random() * 1000000)::int as i6
   FROM
  generate_series(1,10000000);
SELECT 10000000
</programlisting>

  <para>
   A sequential scan over this large table takes a long time:
<programlisting>
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                              QUERY PLAN
-------------------------------------------------------------------&zwsp;-----------------------------------
 Seq Scan on tbloom  (cost=0.00..213744.00 rows=250 width=24) (actual time=357.059..357.059 rows=0.00 loops=1)
   Filter: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Filter: 10000000
   Buffers: shared hit=63744
 Planning Time: 0.346 ms
 Execution Time: 357.076 ms
(6 rows)
</programlisting>
  </para>

  <para>
   Even with the btree index defined the result will still be a
   sequential scan:
<programlisting>
=# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
 pg_size_pretty
----------------
 386 MB
(1 row)
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                              QUERY PLAN
-------------------------------------------------------------------&zwsp;-----------------------------------
 Seq Scan on tbloom  (cost=0.00..213744.00 rows=2 width=24) (actual time=351.016..351.017 rows=0.00 loops=1)
   Filter: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Filter: 10000000
   Buffers: shared hit=63744
 Planning Time: 0.138 ms
 Execution Time: 351.035 ms
(6 rows)
</programlisting>
  </para>

  <para>
   Having the bloom index defined on the table is better than btree in
   handling this type of search:
<programlisting>
=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
 pg_size_pretty
----------------
 153 MB
(1 row)
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                     QUERY PLAN
-------------------------------------------------------------------&zwsp;--------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=1792.00..1799.69 rows=2 width=24) (actual time=22.605..22.606 rows=0.00 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 2300
   Heap Blocks: exact=2256
   Buffers: shared hit=21864
   -&gt;  Bitmap Index Scan on bloomidx  (cost=0.00..178436.00 rows=1 width=0) (actual time=20.005..20.005 rows=2300.00 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
         Index Searches: 1
         Buffers: shared hit=19608
 Planning Time: 0.099 ms
 Execution Time: 22.632 ms
(11 rows)
</programlisting>
  </para>

  <para>
   Now, the main problem with the btree search is that btree is inefficient
   when the search conditions do not constrain the leading index column(s).
   A better

Title: Bloom Index Performance Comparison
Summary
A bloom index outperforms a btree index in handling searches with arbitrary combinations of attributes, as it can efficiently exclude non-matching tuples, resulting in faster query execution times and smaller index sizes, even when the search conditions do not constrain the leading index columns.