   <variablelist>

     <varlistentry>

      <term>
       <indexterm>
        <primary>ts_rank</primary>
       </indexterm>

       <literal>ts_rank(<optional> <replaceable class="parameter">weights</replaceable> <type>float4[]</type>, </optional> <replaceable class="parameter">vector</replaceable> <type>tsvector</type>, <replaceable class="parameter">query</replaceable> <type>tsquery</type> <optional>, <replaceable class="parameter">normalization</replaceable> <type>integer</type> </optional>) returns <type>float4</type></literal>
      </term>

      <listitem>
       <para>
        Ranks vectors based on the frequency of their matching lexemes.
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>

      <term>
       <indexterm>
        <primary>ts_rank_cd</primary>
       </indexterm>

       <literal>ts_rank_cd(<optional> <replaceable class="parameter">weights</replaceable> <type>float4[]</type>, </optional> <replaceable class="parameter">vector</replaceable> <type>tsvector</type>, <replaceable class="parameter">query</replaceable> <type>tsquery</type> <optional>, <replaceable class="parameter">normalization</replaceable> <type>integer</type> </optional>) returns <type>float4</type></literal>
      </term>

      <listitem>
       <para>
        This function computes the <firstterm>cover density</firstterm>
        ranking for the given document vector and query, as described in
        Clarke, Cormack, and Tudhope's "Relevance Ranking for One to Three
        Term Queries" in the journal "Information Processing and Management",
        1999.  Cover density is similar to <function>ts_rank</function> ranking
        except that the proximity of matching lexemes to each other is
        taken into consideration.
       </para>

       <para>
        This function requires lexeme positional information to perform
        its calculation.  Therefore, it ignores any <quote>stripped</quote>
        lexemes in the <type>tsvector</type>.  If there are no unstripped
        lexemes in the input, the result will be zero.  (See <xref
        linkend="textsearch-manipulate-tsvector"/> for more information
        about the <function>strip</function> function and positional information
        in <type>tsvector</type>s.)
       </para>
      </listitem>
     </varlistentry>

    </variablelist>

   </para>
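
   <para>
    As a minimal sketch of how the two functions are called, the following
    query ranks a made-up one-line document against a two-term query.  The
    sample text and the use of the <literal>english</literal> configuration
    are assumptions chosen for illustration only.  The third column applies
    <function>strip</function> first, so per the description above
    <function>ts_rank_cd</function> has no positions to work with there:

<programlisting>
-- Hypothetical document and query, built inline so the example is self-contained.
SELECT ts_rank(doc, query)           AS rank,
       ts_rank_cd(doc, query)        AS rank_cd,
       -- strip() removes positions, so ts_rank_cd should return zero here
       ts_rank_cd(strip(doc), query) AS rank_cd_stripped
FROM (SELECT to_tsvector('english', 'the cat sat on the mat') AS doc,
             to_tsquery('english', 'cat &amp; mat')            AS query) AS s;
</programlisting>
   </para>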

   <para>
    For both of these functions,
    the optional <replaceable class="parameter">weights</replaceable>
    argument offers the ability to weigh word instances more or less
    heavily depending on how they are labeled.  The weight array specifies
    how heavily to weigh each category of word, in the order:

<synopsis>
{D-weight, C-weight, B-weight, A-weight}
</synopsis>

    If no <replaceable class="parameter">weights</replaceable> are provided,
    then these defaults are used:

<programlisting>
{0.1, 0.2, 0.4, 1.0}
</programlisting>

    Typically weights are used to mark words from special areas of the
    document, like the title or an initial abstract, so they can be
    treated with more or less importance than words in the document body.
   </para>
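
   <para>
    The effect of the weight array can be sketched with a small made-up
    document in which one word is labeled <literal>A</literal> (as if it
    came from a title) and another is labeled <literal>D</literal> (as if
    it came from the body).  The sample words and column names are
    assumptions for illustration.  With the default weights, the
    <literal>A</literal>-labeled match would be expected to rank above the
    <literal>D</literal>-labeled one, while an array of equal weights
    removes that preference:

<programlisting>
-- Hypothetical two-word document: 'cat' labeled A, 'dog' labeled D.
SELECT ts_rank(doc, to_tsquery('english', 'cat')) AS a_match_default,
       ts_rank(doc, to_tsquery('english', 'dog')) AS d_match_default,
       -- explicit weights in {D, C, B, A} order; here all categories count equally
       ts_rank('{1.0, 1.0, 1.0, 1.0}'::float4[], doc,
               to_tsquery('english', 'dog'))      AS d_match_equal_weights
FROM (SELECT setweight(to_tsvector('english', 'cat'), 'A') ||
             setweight(to_tsvector('english', 'dog'), 'D') AS doc) AS s;
</programlisting>
   </para>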

   <para>
    Since a longer document has a greater chance of containing a query term,
    it is reasonable to take document size into account.  For example, a
    hundred-word document with five instances of a search word is probably
    more relevant than a thousand-word document with five instances.  Both
    ranking functions
    take an integer <replaceable>normalization</replaceable> option that
    specifies whether and how a document's length should impact its rank.
    The integer option controls several behaviors, so it is a bit mask:
    you can specify one or more behaviors using
    <literal>|</literal> (for example, <literal>2|4</literal>).

    <itemizedlist  spacing="compact" mark="bullet">
     <listitem>
      <para>
       0 (the default) ignores the document length
      </para>
     </listitem>
     <listitem>
      <para>
       1 divides the
