Home Explore Blog CI



postgresql

6th chunk of `doc/src/sgml/intarray.sgml`
5bab33b4fc46b97e2a2b7f60b8604d1dd7b6bb37cbe7e3c60000000100000d8e
 <literal>gist__intbig_ops</literal> uses a larger signature and is more
   suitable for indexing large data sets (i.e., columns containing
   a large number of distinct array values).
   The implementation uses an RD-tree data structure with
   built-in lossy compression.
  </para>

  <para>
   <literal>gist__int_ops</literal> approximates an integer set as an array of
   integer ranges.  Its optional integer parameter <literal>numranges</literal>
   determines the maximum number of ranges in
   one index key.  The default value of <literal>numranges</literal> is 100.
   Valid values are between 1 and 253.  Using larger arrays as GiST index
   keys leads to a more precise search (scanning a smaller fraction of the index and
   fewer heap pages), at the cost of a larger index.
  </para>

  <para>
   <literal>gist__intbig_ops</literal> approximates an integer set as a bitmap
   signature.  Its optional integer parameter <literal>siglen</literal>
   determines the signature length in bytes.
   The default signature length is 16 bytes.  Valid values of signature length
   are between 1 and 2024 bytes.  Longer signatures lead to a more precise
   search (scanning a smaller fraction of the index and fewer heap pages), at
   the cost of a larger index.
  </para>

  <para>
   There is also a non-default GIN operator class
   <literal>gin__int_ops</literal>, which supports these operators as well
   as <literal>&lt;@</literal>.
  </para>

  <para>
   The choice between GiST and GIN indexing depends on the relative
   performance characteristics of GiST and GIN, which are discussed elsewhere.
  </para>
 </sect2>

 <sect2 id="intarray-example">
  <title>Example</title>

<programlisting>
-- a message can be in one or more <quote>sections</quote>
CREATE TABLE message (mid INT PRIMARY KEY, sections INT[], ...);

-- create specialized index with signature length of 32 bytes
CREATE INDEX message_rdtree_idx ON message USING GIST (sections gist__intbig_ops (siglen = 32));

-- select messages in section 1 OR 2 - OVERLAP operator
SELECT message.mid FROM message WHERE message.sections &amp;&amp; '{1,2}';

-- select messages in sections 1 AND 2 - CONTAINS operator
SELECT message.mid FROM message WHERE message.sections @&gt; '{1,2}';

-- the same, using QUERY operator
SELECT message.mid FROM message WHERE message.sections @@ '1&amp;2'::query_int;
</programlisting>
 </sect2>

 <sect2 id="intarray-benchmark">
  <title>Benchmark</title>

  <para>
   The source directory <filename>contrib/intarray/bench</filename> contains a
   benchmark test suite, which can be run against an installed
   <productname>PostgreSQL</productname> server.  (It also requires <filename>DBD::Pg</filename>
   to be installed.)  To run:
  </para>

<programlisting>
cd .../contrib/intarray/bench
createdb TEST
psql -c "CREATE EXTENSION intarray" TEST
./create_test.pl | psql TEST
./bench.pl
</programlisting>

  <para>
   The <filename>bench.pl</filename> script has numerous options, which
   are displayed when it is run without any arguments.
  </para>
 </sect2>

 <sect2 id="intarray-Authors">
  <title>Authors</title>

  <para>
   All work was done by Teodor Sigaev (<email>teodor@sigaev.ru</email>) and
   Oleg Bartunov (<email>oleg@sai.msu.su</email>). See
   <ulink url="http://www.sai.msu.su/~megera/postgres/gist/"></ulink> for
   additional information. Andrey Oktyabrski did a great work on adding new
   functions and operations.
  </para>
 </sect2>

</sect1>

Title: Integer Array Indexing and Operations in PostgreSQL
Summary
This section details the intarray module in PostgreSQL, which provides enhanced functionality for integer arrays. It describes two GiST index operator classes: gist__int_ops for small to medium datasets, and gist__intbig_ops for larger datasets. Both use RD-tree structures with lossy compression. gist__int_ops approximates integer sets as arrays of ranges, while gist__intbig_ops uses bitmap signatures. The section also covers a GIN operator class, gin__int_ops. It includes an example of creating and using these indexes, as well as information on running benchmark tests. The module was developed by Teodor Sigaev and Oleg Bartunov, with contributions from Andrey Oktyabrski.