postgresql

16th chunk of `doc/src/sgml/textsearch.sgml`
 like the title or an initial abstract, so they can be
    treated with more or less importance than words in the document body.
   </para>

   <para>
    Since a longer document has a greater chance of containing a query term,
    it is reasonable to take into account document size; e.g., a hundred-word
    document with five instances of a search word is probably more relevant
    than a thousand-word document with five instances.  Both ranking functions
    take an integer <replaceable>normalization</replaceable> option that
    specifies whether and how a document's length should impact its rank.
    The integer option controls several behaviors, so it is a bit mask:
    you can specify one or more behaviors using
    <literal>|</literal> (for example, <literal>2|4</literal>).

    <itemizedlist  spacing="compact" mark="bullet">
     <listitem>
      <para>
       0 (the default) ignores the document length
      </para>
     </listitem>
     <listitem>
      <para>
       1 divides the rank by 1 + the logarithm of the document length
      </para>
     </listitem>
     <listitem>
      <para>
       2 divides the rank by the document length
      </para>
     </listitem>
     <listitem>
      <para>
       4 divides the rank by the mean harmonic distance between extents
       (this is implemented only by <function>ts_rank_cd</function>)
      </para>
     </listitem>
     <listitem>
      <para>
       8 divides the rank by the number of unique words in the document
      </para>
     </listitem>
     <listitem>
      <para>
       16 divides the rank by 1 + the logarithm of the number
       of unique words in the document
      </para>
     </listitem>
     <listitem>
      <para>
       32 divides the rank by itself + 1
      </para>
     </listitem>
    </itemizedlist>

    If more than one flag bit is specified, the transformations are
    applied in the order listed.
   </para>
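
   <para>
    As a sketch of how the options combine, the following query computes
    three ranks for each match, reusing the <literal>apod</literal> table
    and query from the examples in this section.  The column aliases are
    illustrative only, and no output is shown because the values depend on
    the data:

<programlisting>
SELECT title,
       ts_rank_cd(textsearch, query)      AS rank_default,  -- same as option 0
       ts_rank_cd(textsearch, query, 2)   AS rank_by_length,
       ts_rank_cd(textsearch, query, 2|4) AS rank_by_length_and_extents
FROM apod, to_tsquery('neutrino|(dark &amp; matter)') query
WHERE query @@ textsearch
ORDER BY rank_by_length_and_extents DESC
LIMIT 10;
</programlisting>
   </para>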

   <para>
    It is important to note that the ranking functions do not use any global
    information, so it is impossible to produce a fair normalization to 1% or
    100% as sometimes desired.  Normalization option 32
    (<literal>rank/(rank+1)</literal>) can be applied to scale all ranks
    into the range zero to one, but this is just a cosmetic change:
    the mapping is monotonically increasing, so it will not affect the
    ordering of the search results.
   </para>
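
   <para>
    The effect of option 32 can be checked by hand.  As a minimal sketch,
    the query below applies <literal>rank/(rank+1)</literal> to three rank
    values copied from the unnormalized example output in this section; the
    <literal>VALUES</literal> list and column names are illustrative, not
    part of the <literal>apod</literal> data:

<programlisting>
-- Apply rank/(rank+1) to a few sample ranks; the order is unchanged.
SELECT r AS rank, round(r / (r + 1), 4) AS normalized
FROM (VALUES (3.1), (2.4), (2.01317)) AS t(r)
ORDER BY normalized DESC;
-- normalized: 0.7561, 0.7059, 0.6681
</programlisting>

    The rows come back in the same order as the raw ranks, and every
    normalized value lies between zero and one.
   </para>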

   <para>
    Here is an example that selects only the ten highest-ranked matches:

<screen>
SELECT title, ts_rank_cd(textsearch, query) AS rank
FROM apod, to_tsquery('neutrino|(dark &amp; matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC
LIMIT 10;
                     title                     |   rank
-----------------------------------------------+----------
 Neutrinos in the Sun                          |      3.1
 The Sudbury Neutrino Detector                 |      2.4
 A MACHO View of Galactic Dark Matter          |  2.01317
 Hot Gas and Dark Matter                       |  1.91171
 The Virgo Cluster: Hot Plasma and Dark Matter |  1.90953
 Rafting for Solar Neutrinos                   |      1.9
 NGC 4650A: Strange Galaxy and Dark Matter     |  1.85774
 Hot Gas and Dark Matter                       |   1.6123
 Ice Fishing for Cosmic Neutrinos              |      1.6
 Weak Lensing Distorts the Universe            | 0.818218
</screen>

    This is the same example using normalized ranking:

<screen>
SELECT title, ts_rank_cd(textsearch, query, 32 /* rank/(rank+1) */ ) AS rank
FROM apod, to_tsquery('neutrino|(dark &amp; matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC
LIMIT 10;
                     title                     |        rank
-----------------------------------------------+-------------------
 Neutrinos in the Sun                          | 0.756097569485493
 The Sudbury Neutrino Detector                 | 0.705882361190954
 A MACHO View of Galactic Dark Matter          | 0.668123210574724
 Hot Gas and Dark Matter                       |  0.65655958650282
 The Virgo Cluster: Hot

Title: Normalization options and examples for ts_rank and ts_rank_cd
Summary
This section details the normalization options of the `ts_rank` and `ts_rank_cd` functions, which adjust a rank according to document length and related factors. The options form a bit mask: 0 ignores document length, while others divide the rank by 1 + the logarithm of the document length, by the document length itself, by the mean harmonic distance between extents (`ts_rank_cd` only), or by the number of unique words. Because the ranking functions use no global information, a fair normalization to a fixed scale such as a percentage is impossible; option 32 only rescales ranks into the range zero to one. Finally, example queries use `ts_rank_cd` with and without normalization to retrieve the ten highest-ranked matches from a table.