Bloom Index Performance Comparison

signature length of 80 bits, with attributes i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits. We could have omitted the <literal>length</literal>, <literal>col1</literal>, and <literal>col2</literal> specifications since those have the default values. </para> <para> Here is a more complete example of bloom index definition and usage, as well as a comparison with equivalent btree indexes. The bloom index is considerably smaller than the btree index, and can perform better. </para> <programlisting> =# CREATE TABLE tbloom AS SELECT (random() * 1000000)::int as i1, (random() * 1000000)::int as i2, (random() * 1000000)::int as i3, (random() * 1000000)::int as i4, (random() * 1000000)::int as i5, (random() * 1000000)::int as i6 FROM generate_series(1,10000000); SELECT 10000000 </programlisting> <para> A sequential scan over this large table takes a long time: <programlisting> =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; QUERY PLAN -------------------------------------------------------------------&zwsp;----------------------------------- Seq Scan on tbloom (cost=0.00..213744.00 rows=250 width=24) (actual time=357.059..357.059 rows=0.00 loops=1) Filter: ((i2 = 898732) AND (i5 = 123451)) Rows Removed by Filter: 10000000 Buffers: shared hit=63744 Planning Time: 0.346 ms Execution Time: 357.076 ms (6 rows) </programlisting> </para> <para> Even with the btree index defined the result will still be a sequential scan: <programlisting> =# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6); CREATE INDEX =# SELECT pg_size_pretty(pg_relation_size('btreeidx')); pg_size_pretty ---------------- 386 MB (1 row) =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; QUERY PLAN -------------------------------------------------------------------&zwsp;----------------------------------- Seq Scan on tbloom (cost=0.00..213744.00 rows=2 width=24) (actual time=351.016..351.017 rows=0.00 loops=1) Filter: ((i2 = 898732) AND (i5 = 123451)) Rows Removed by Filter: 10000000 Buffers: shared hit=63744 Planning Time: 0.138 ms Execution Time: 351.035 ms (6 rows) </programlisting> </para> <para> Having the bloom index defined on the table is better than btree in handling this type of search: <programlisting> =# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6); CREATE INDEX =# SELECT pg_size_pretty(pg_relation_size('bloomidx')); pg_size_pretty ---------------- 153 MB (1 row) =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; QUERY PLAN -------------------------------------------------------------------&zwsp;-------------------------------------------------- Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=22.605..22.606 rows=0.00 loops=1) Recheck Cond: ((i2 = 898732) AND (i5 = 123451)) Rows Removed by Index Recheck: 2300 Heap Blocks: exact=2256 Buffers: shared hit=21864 -> Bitmap Index Scan on bloomidx (cost=0.00..178436.00 rows=1 width=0) (actual time=20.005..20.005 rows=2300.00 loops=1) Index Cond: ((i2 = 898732) AND (i5 = 123451)) Index Searches: 1 Buffers: shared hit=19608 Planning Time: 0.099 ms Execution Time: 22.632 ms (11 rows) </programlisting> </para> <para> Now, the main problem with the btree search is that btree is inefficient when the search conditions do not constrain the leading index column(s). A better

A bloom index outperforms a btree index in handling searches with arbitrary combinations of attributes, as it can efficiently exclude non-matching tuples, resulting in faster query execution times and smaller index sizes, even when the search conditions do not constrain the leading index columns.