like the title or an initial abstract, so they can be
treated with more or less importance than words in the document body.
</para>
<para>
Since a longer document has a greater chance of containing a query term,
it is reasonable to take document size into account; for example, a hundred-word
document with five instances of a search word is probably more relevant
than a thousand-word document with five instances. Both ranking functions
take an integer <replaceable>normalization</replaceable> option that
specifies whether and how a document's length should impact its rank.
The integer option controls several behaviors, so it is a bit mask:
you can specify one or more behaviors by combining their values with
<literal>|</literal> (for example, <literal>2|4</literal>).
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
0 (the default) ignores the document length
</para>
</listitem>
<listitem>
<para>
1 divides the rank by 1 + the logarithm of the document length
</para>
</listitem>
<listitem>
<para>
2 divides the rank by the document length
</para>
</listitem>
<listitem>
<para>
4 divides the rank by the mean harmonic distance between extents
(this is implemented only by <function>ts_rank_cd</function>)
</para>
</listitem>
<listitem>
<para>
8 divides the rank by the number of unique words in the document
</para>
</listitem>
<listitem>
<para>
16 divides the rank by 1 + the logarithm of the number
of unique words in the document
</para>
</listitem>
<listitem>
<para>
32 divides the rank by itself + 1
</para>
</listitem>
</itemizedlist>
If more than one flag bit is specified, the transformations are
applied in the order listed.
</para>
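<para>
As an illustrative sketch, assuming the same <literal>apod</literal> table
and <literal>textsearch</literal> column used by the examples in this
section, the following query shows the raw rank next to two normalized
variants so that the effect of the flags can be compared. The column
aliases are arbitrary names chosen for this illustration:
<programlisting>
SELECT title,
       ts_rank(textsearch, query)      AS rank_raw,      -- default 0: ignore document length
       ts_rank(textsearch, query, 1)   AS rank_log_len,  -- divide by 1 + log(document length)
       ts_rank(textsearch, query, 2|8) AS rank_len_uniq  -- divide by length, then by unique words
FROM apod, to_tsquery('neutrino') query
WHERE query @@ textsearch
ORDER BY rank_raw DESC
LIMIT 10;
</programlisting>
Ordering by one of the normalized columns instead of
<literal>rank_raw</literal> may change which documents appear first,
since these options scale each document's rank by a different amount.
</para>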
<para>
It is important to note that the ranking functions do not use any global
information, so it is impossible to produce a fair normalization to 1% or
100% as sometimes desired. Normalization option 32
(<literal>rank/(rank+1)</literal>) can be applied to scale all ranks
into the range zero to one, but of course this is just a cosmetic change;
it will not affect the ordering of the search results.
</para>
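<para>
The mapping performed by option 32 can be sketched with plain arithmetic.
The input values here are made up for illustration and do not come from
any real search:
<programlisting>
-- r / (r + 1) is increasing for r >= 0, so larger raw ranks stay larger
SELECT r AS raw_rank, r / (r + 1) AS normalized_rank
FROM (VALUES (0.5), (1.0), (3.0)) AS t(r)
ORDER BY r DESC;
</programlisting>
A raw rank of 3 maps to 0.75 and a raw rank of 1 maps to 0.5, so the
relative order of the rows is unchanged.
</para>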
<para>
Here is an example that selects only the ten highest-ranked matches:
<screen>
SELECT title, ts_rank_cd(textsearch, query) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC
LIMIT 10;
                     title                     |   rank
-----------------------------------------------+----------
 Neutrinos in the Sun                          |      3.1
 The Sudbury Neutrino Detector                 |      2.4
 A MACHO View of Galactic Dark Matter          |  2.01317
 Hot Gas and Dark Matter                       |  1.91171
 The Virgo Cluster: Hot Plasma and Dark Matter |  1.90953
 Rafting for Solar Neutrinos                   |      1.9
 NGC 4650A: Strange Galaxy and Dark Matter     |  1.85774
 Hot Gas and Dark Matter                       |   1.6123
 Ice Fishing for Cosmic Neutrinos              |      1.6
 Weak Lensing Distorts the Universe            | 0.818218
</screen>
This is the same example using normalized ranking:
<screen>
SELECT title, ts_rank_cd(textsearch, query, 32 /* rank/(rank+1) */ ) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC
LIMIT 10;
                     title                     |       rank
-----------------------------------------------+-------------------
 Neutrinos in the Sun                          | 0.756097569485493
 The Sudbury Neutrino Detector                 | 0.705882361190954
 A MACHO View of Galactic Dark Matter          | 0.668123210574724
 Hot Gas and Dark Matter                       |  0.65655958650282
The Virgo Cluster: Hot