ts_rank and ts_rank_cd ranking functions

<variablelist>

 <varlistentry>
  <term>
   <indexterm>
    <primary>ts_rank</primary>
   </indexterm>
   <literal>ts_rank(<optional> <replaceable class="parameter">weights</replaceable> <type>float4[]</type>, </optional> <replaceable class="parameter">vector</replaceable> <type>tsvector</type>, <replaceable class="parameter">query</replaceable> <type>tsquery</type> <optional>, <replaceable class="parameter">normalization</replaceable> <type>integer</type> </optional>) returns <type>float4</type></literal>
  </term>
  <listitem>
   <para>
    Ranks vectors based on the frequency of their matching lexemes.
   </para>
  </listitem>
 </varlistentry>

 <varlistentry>
  <term>
   <indexterm>
    <primary>ts_rank_cd</primary>
   </indexterm>
   <literal>ts_rank_cd(<optional> <replaceable class="parameter">weights</replaceable> <type>float4[]</type>, </optional> <replaceable class="parameter">vector</replaceable> <type>tsvector</type>, <replaceable class="parameter">query</replaceable> <type>tsquery</type> <optional>, <replaceable class="parameter">normalization</replaceable> <type>integer</type> </optional>) returns <type>float4</type></literal>
  </term>
  <listitem>
   <para>
    This function computes the <firstterm>cover density</firstterm>
    ranking for the given document vector and query, as described in
    Clarke, Cormack, and Tudhope's "Relevance Ranking for One to Three
    Term Queries" in the journal "Information Processing and Management",
    1999. Cover density is similar to <function>ts_rank</function> ranking
    except that the proximity of matching lexemes to each other is
    taken into consideration.
   </para>
   <para>
    This function requires lexeme positional information to perform
    its calculation. Therefore, it ignores any <quote>stripped</quote>
    lexemes in the <type>tsvector</type>. If there are no unstripped
    lexemes in the input, the result will be zero. (See <xref
    linkend="textsearch-manipulate-tsvector"/> for more information
    about the <function>strip</function> function and positional information
    in <type>tsvector</type>s.)
   </para>
  </listitem>
 </varlistentry>

</variablelist>
</para>

<para>
 For both these functions, the optional <replaceable
 class="parameter">weights</replaceable> argument offers the ability to
 weigh word instances more or less heavily depending on how they are
 labeled. The weight array specifies how heavily to weigh each category
 of word, in the order:
<synopsis>
{D-weight, C-weight, B-weight, A-weight}
</synopsis>
 If no <replaceable class="parameter">weights</replaceable> are provided,
 then these defaults are used:
<programlisting>
{0.1, 0.2, 0.4, 1.0}
</programlisting>
 Typically weights are used to mark words from special areas of the
 document, like the title or an initial abstract, so they can be
 treated with more or less importance than words in the document body.
</para>

<para>
 Since a longer document has a greater chance of containing a query term,
 it is reasonable to take document size into account; for example, a
 hundred-word document with five instances of a search word is probably
 more relevant than a thousand-word document with five instances. Both
 ranking functions take an integer <replaceable>normalization</replaceable>
 option that specifies whether and how a document's length should impact
 its rank. The integer option controls several behaviors, so it is a
 bit mask: you can specify one or more behaviors using
 <literal>|</literal> (for example, <literal>2|4</literal>).
 <itemizedlist spacing="compact" mark="bullet">
  <listitem>
   <para>
    0 (the default) ignores the document length
   </para>
  </listitem>
  <listitem>
   <para>
    1 divides the

This section describes the `ts_rank` and `ts_rank_cd` functions used for ranking search results in PostgreSQL. `ts_rank` ranks vectors based on the frequency of matching lexemes. `ts_rank_cd` computes the cover density ranking, taking into account the proximity of matching lexemes. Both functions accept optional weights to emphasize certain word categories. The section also explains the normalization option, which allows considering document length when calculating the rank.
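
The calls described above can be sketched as a single query. This is a minimal illustration and not taken from the original section: the `docs` rows, the column names, and the search terms are hypothetical, and `to_tsvector`, `to_tsquery`, `setweight`, and the `@@` match operator are standard PostgreSQL text search facilities that this section does not itself introduce. The example assumes the built-in `english` text search configuration is available.

```sql
-- Hypothetical sample documents; titles are labeled A and bodies D,
-- so title matches count more under the default weights.
WITH docs(title, body) AS (
    VALUES
        ('Fat cats', 'A fat cat sat on a mat and ate a fat rat'),
        ('Rats',     'Rats are small rodents; a cat may chase a rat')
),
indexed AS (
    SELECT title,
           setweight(to_tsvector('english', title), 'A') ||
           setweight(to_tsvector('english', body),  'D') AS vector
    FROM docs
)
SELECT title,
       -- Frequency-based rank with the default weights.
       ts_rank(vector, query) AS rank_default,
       -- Same call with the weights written out in {D, C, B, A} order.
       ts_rank('{0.1, 0.2, 0.4, 1.0}'::float4[], vector, query) AS rank_explicit,
       -- Cover density rank; needs positions, which to_tsvector keeps.
       ts_rank_cd(vector, query) AS rank_cd,
       -- Normalization 0 is the default: document length is ignored.
       ts_rank_cd(vector, query, 0) AS rank_cd_norm0
FROM indexed, to_tsquery('english', 'cat & rat') AS query
WHERE vector @@ query
ORDER BY rank_cd DESC;
```

Because the explicit array repeats the defaults, `rank_default` and `rank_explicit` should agree for every row, as should `rank_cd` and `rank_cd_norm0`. Changing the array or the normalization argument is then an easy way to see how each option shifts the ordering.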